Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43082

Build Index got stuck

    XMLWordPrintable

    Details

      Description

      Summary
      Build index ('gsi-223') got stuck. I waited for almost 2 hours after cbcollect but still the status had not changed

      Steps to Repro:
      1. Create initial nodes
      2020-12-03 04:49:07,898 | test | INFO | pool-1-thread-9 | [table_view:display:72] Rebalance Overview
      ---------------------------++-------------

      Nodes Services Status

      ---------------------------++-------------

      172.23.105.175 kv Cluster node
      172.23.106.233 ['n1ql'] <--- IN —
      172.23.106.236 ['n1ql'] <--- IN —
      172.23.106.238 ['kv,index'] <--- IN —
      172.23.106.250 ['kv,index'] <--- IN —
      172.23.106.251 ['kv,index'] <--- IN —

      ---------------------------++-------------
      2. Create 1000 collections with 1 item each
      ---------------------------------------++-----------------------------------------------------------

      Bucket Type Replicas Durability TTL Items RAM Quota RAM Used Disk Used

      ---------------------------------------++-----------------------------------------------------------

      4tkxCz8mOx_3oouJ_qJ-23-348000 couchbase 2 none 0 500 419430400 0 565201977
      EHVCoriHb-23-220000 couchbase 2 none 0 500 419430400 0 673038089

      ---------------------------------------++-----------------------------------------------------------
      3. Create 333 gsi indexes with 2 replicas each

      2020-12-03 04:53:05,470 | test | INFO | MainThread | [Collections:create_indexes_and_initialize_queries:145] Creating indexes with defer build

      4. Build indexes

      2020-12-03 05:39:26,381 | test | INFO | MainThread | [Collections:wait_for_indexes_to_go_online:106] Waiting for indexes to go online

      while building 'gsi-223' index, it got stuck. 

      Observations 
      from indexer.log (not sure if this is relevant)

      2020-12-03T05:39:04.177-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.106.250:9100.  Error = read tcp 172.23.106.251:54770->172.23.106.250:9100: use of closed network connection. Kill Pipe.
      2020-12-03T05:39:10.682-08:00 [Error] DATP[->dataport ":9103"] Accept() Error: accept tcp4 0.0.0.0:9103: use of closed network connection
      2020-12-03T05:39:10.682-08:00 [Error] receiving packet: read tcp4 172.23.106.251:9103->172.23.105.175:34024: use of closed network connection
      2020-12-03T05:39:10.682-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.105.175:34024" exit: read tcp4 172.23.106.251:9103->172.23.105.175:34024: use of closed network connection
      2020-12-03T05:39:10.699-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.106.238:9999 Topic INIT_STREAM_TOPIC_f61819a2ff21d3d08d62eafa926ba7ac KeyspaceIds [4tkxCz8mOx_3oouJ_qJ-23-348000:--23-367000:P6Dklm-23-367000]. Err genServer.closed
      

       

       

        Attachments

        1. bucket-1.png
          bucket-1.png
          122 kB
        2. bucket-2.png
          bucket-2.png
          121 kB
        3. bucket-3.png
          bucket-3.png
          109 kB
        4. Screenshot 2020-12-03 at 9.24.10 PM.png
          Screenshot 2020-12-03 at 9.24.10 PM.png
          101 kB
        5. Screenshot 2020-12-04 at 9.27.44 PM.png
          Screenshot 2020-12-04 at 9.27.44 PM.png
          108 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          Hide
          sumedh.basarkod Sumedh Basarkod added a comment -
          Show
          sumedh.basarkod Sumedh Basarkod added a comment - Reproduced on the second run as well.    http://supportal.couchbase.com/snapshot/2735d89938e7f12b8643e022057d9f60::0
          Hide
          varun.velamuri Varun Velamuri added a comment -

          Daniel Owen, This seems to be a KV issue. DCP backfilling has been suspended due to high memory because of which memcached was not sending any mutations to projector and this caused index build to remain stuck for long time.

          Taking node 172.23.106.238 as example, Projector asked memcached to open streams for collection: 4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000

          Projector opened connections with memcached for this connection with UUID: 11539340182236788755 for the below vbuckets

          2020-12-03T05:39:27.085-08:00 [Info] FEED[<=>INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5(127.0.0.1:8091)] ##12 vbmap {default,4tkxCz8mOx_3oouJ_qJ-23-348000} - [256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511]

          Even after 20 minutes, projector did not get mutations for some buckets - even though stream being messages were received.

          Taking vb:511 as example, stream being for this vb is received at: 2020-12-03T05:39:27.862-08:00

          2020-12-03T05:39:27.862-08:00 [Info] FEED[<=>INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5(127.0.0.1:8091)] ##12 backch *projector.controlStreamRequest {controlStreamRequest, SUCCESS, 4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000, 511, 459162c887c3, 0, ##12, 17356619170941278983}

          Even after 20 minutes, projector did not receive the mutations for vb:511 (The high seqno of this vb is "0")

          2020-12-03T06:00:08.891-08:00 [Info] KVDT[<-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000<-127.0.0.1:8091 #INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5] ##12 vbseqnos: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 998 998 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

          Looking at memcached.log for vb:511, the backfill seems to have been scheduled but not completed

           

          2020-12-03T05:39:27.861810-08:00 INFO 3647: (4tkxCz8mOx_3oouJ_qJ-23-348000) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3 - (vb:511) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:0, filter:1 sid:none
          2020-12-03T05:39:27.861823-08:00 INFO 3647: (4tkxCz8mOx_3oouJ_qJ-23-348000) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3 - (vb:511) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3" backfill:true, seqno:999
          2020-12-03T05:39:27.861846-08:00 INFO (4tkxCz8mOx_3oouJ_qJ-23-348000) Backfill for eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3 vb:511 is pending
          2020-12-03T05:39:27.861860-08:00 INFO 3647: (4tkxCz8mOx_3oouJ_qJ-23-348000) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3 - (vb:511) Scheduling backfill from 1 to 998, reschedule flag : False
          

          From stats.log, the stats of vb:511 seems to be in back-filling

          eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_buffer_bytes: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_buffer_items: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_disk_items: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_mem_items: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_sent: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_cursor_registered: true
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_end_seqno: 18446744073709551615
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_flags: 16
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_items_ready: false
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_read_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_read_seqno_unsnapshotted: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_sent_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_sent_snap_end_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_memory_phase: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_opaque: 4278260224
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_readyQ_items: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_ready_queue_memory: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_snap_end_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_snap_start_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_start_seqno: 0
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_state: backfilling
           eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_vb_uuid: 0

          From 2020-12-03T05:39:27.877559-08:00, memcached.log is flooded with below message 

           

          2020-12-03T05:39:27.877559-08:00 INFO (4tkxCz8mOx_3oouJ_qJ-23-348000) DCP backfilling task temporarily suspended because the current memory usage is too high
          

          Looking at the system memory, there seems to be lot of memory available on the node

          Assigning it to KV team for further analysis

          Show
          varun.velamuri Varun Velamuri added a comment - Daniel Owen , This seems to be a KV issue. DCP backfilling has been suspended due to high memory because of which memcached was not sending any mutations to projector and this caused index build to remain stuck for long time. Taking node 172.23.106.238 as example, Projector asked memcached to open streams for collection:  4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000 Projector opened connections with memcached for this connection with UUID: 11539340182236788755 for the below vbuckets 2020-12-03T05:39:27.085-08:00 [Info] FEED[<=>INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5(127.0.0.1:8091)] ##12 vbmap {default,4tkxCz8mOx_3oouJ_qJ-23-348000} - [256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511] Even after 20 minutes, projector did not get mutations for some buckets - even though stream being messages were received. Taking vb:511 as example, stream being for this vb is received at:  2020-12-03T05:39:27.862-08:00 2020 - 12 -03T05: 39 : 27.862 - 08 : 00 [Info] FEED[<=>INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5( 127.0 . 0.1 : 8091 )] ## 12 backch *projector.controlStreamRequest {controlStreamRequest, SUCCESS, 4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 , 511 , 459162c887c3, 0 , ## 12 , 17356619170941278983 } Even after 20 minutes, projector did not receive the mutations for vb:511 (The high seqno of this vb is "0") 2020 - 12 -03T06: 00 : 08.891 - 08 : 00 [Info] KVDT[<-4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 <- 127.0 . 0.1 : 8091 #INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5] ## 12 vbseqnos: [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 998 998 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 925 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] Looking at memcached.log for vb:511, the backfill seems to have been scheduled but not completed   2020 - 12 -03T05: 39 : 27.861810 - 08 : 00 INFO 3647 : (4tkxCz8mOx_3oouJ_qJ- 23 - 348000 ) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 -INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5- 11539340182236788755 / 3 - (vb: 511 ) Creating stream with start seqno 0 and end seqno 18446744073709551615 ; requested end seqno was 18446744073709551615 , collections-manifest uid: 0 , filter: 1 sid:none 2020 - 12 -03T05: 39 : 27.861823 - 08 : 00 INFO 3647 : (4tkxCz8mOx_3oouJ_qJ- 23 - 348000 ) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 -INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5- 11539340182236788755 / 3 - (vb: 511 ) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3" backfill: true , seqno: 999 2020 - 12 -03T05: 39 : 27.861846 - 08 : 00 INFO (4tkxCz8mOx_3oouJ_qJ- 23 - 348000 ) Backfill for eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 -INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5- 11539340182236788755 / 3 vb: 511 is pending 2020 - 12 -03T05: 39 : 27.861860 - 08 : 00 INFO 3647 : (4tkxCz8mOx_3oouJ_qJ- 23 - 348000 ) DCP (Producer) eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ- 23 - 348000 :Dpna7P90-LXavtWiBc4k- 23 - 383000 :qZd- 23 - 383000 -INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5- 11539340182236788755 / 3 - (vb: 511 ) Scheduling backfill from 1 to 998 , reschedule flag : False From stats.log, the stats of vb:511 seems to be in back-filling eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_buffer_bytes: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_buffer_items: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_disk_items: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_mem_items: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_backfill_sent: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_cursor_registered: true eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_end_seqno: 18446744073709551615 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_flags: 16 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_items_ready: false eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_read_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_read_seqno_unsnapshotted: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_sent_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_last_sent_snap_end_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_memory_phase: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_opaque: 4278260224 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_readyQ_items: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_ready_queue_memory: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_snap_end_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_snap_start_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_start_seqno: 0 eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_state: backfilling eq_dcpq:secidx:proj-4tkxCz8mOx_3oouJ_qJ-23-348000:Dpna7P90-LXavtWiBc4k-23-383000:qZd-23-383000-INIT_STREAM_TOPIC_c1fe2c74d6c39b30b4775a38041f10a5-11539340182236788755/3:stream_511_vb_uuid: 0 From  2020-12-03T05:39:27.877559-08:00 , memcached.log is flooded with below message    2020 - 12 -03T05: 39 : 27.877559 - 08 : 00 INFO (4tkxCz8mOx_3oouJ_qJ- 23 - 348000 ) DCP backfilling task temporarily suspended because the current memory usage is too high Looking at the system memory, there seems to be lot of memory available on the node Assigning it to KV team for further analysis
          Hide
          Balakumaran.Gopal Balakumaran Gopal added a comment -

          Daniel Owen - This is blocking our volume test development that Sumedh Basarkod is working on for scalable stats. Marking this CRITICAL. Can we prioritise this ?

          Show
          Balakumaran.Gopal Balakumaran Gopal added a comment - Daniel Owen - This is blocking our volume test development that Sumedh Basarkod is working on for scalable stats. Marking this CRITICAL. Can we prioritise this ?
          Hide
          owend Daniel Owen added a comment - - edited

          Hi Balakumaran Gopal re: our Slack discussion, given the test is still in development I think the sizing needs tuning.

          Focusing on 172.23.105.175 we can see that 3 out of the 4 buckets memory usage is above the HWM



          I believe you need to increase the memory quota.

          So assigning back to QE to address the test.

          Show
          owend Daniel Owen added a comment - - edited Hi Balakumaran Gopal re: our Slack discussion, given the test is still in development I think the sizing needs tuning. Focusing on 172.23.105.175 we can see that 3 out of the 4 buckets memory usage is above the HWM I believe you need to increase the memory quota. So assigning back to QE to address the test.
          Hide
          sumedh.basarkod Sumedh Basarkod added a comment -

          Working fine after increasing the memory quota. Closing the issue

          Show
          sumedh.basarkod Sumedh Basarkod added a comment - Working fine after increasing the memory quota. Closing the issue

            People

            Assignee:
            sumedh.basarkod Sumedh Basarkod
            Reporter:
            sumedh.basarkod Sumedh Basarkod
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Gerrit Reviews

                There are no open Gerrit changes

                  PagerDuty