Couchbase Server · MB-43008

[System Test] : index building for a partitioned primary index is stuck


    Details

      Description

      Build : 6.6.1-9198
      Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale : 3
      Iteration : 2nd (Day 1)

      This seems to be the same issue as MB-37689 / MB-38232.

      Partitioned primary index on CUSTOMER bucket is stuck at 100%

      [2020-11-29T18:13:55-08:00, sequoiatools/cbq:8e2e15] -e=http://172.23.96.252:8093 -u=Administrator -p=password -script=create primary index on `CUSTOMER` partition by hash(rating,result,claim) using GSI with {"num_partition":4}
      

      This index is on nodes 172.23.104.69, 172.23.96.253 & 172.23.99.11

      Some errors in the indexer logs on 172.23.104.69 around the time of index creation:

      [root@cnt7-qe1 logs]# zgrep "2020-11-29T18:1" indexer.log* | grep CUSTOMER | grep "\[Error"
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.104.5:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.104.5:9999
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.108.103:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.108.103:9999
      indexer.log.2.gz:2020-11-29T18:14:25.180-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.148:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.181-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.148:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.119:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.97.119:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.121:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.97.121:9999
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.20:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.20:9999
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.21:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.99.21:9999
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.25:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.25:9999
      

        Attachments

          Issue Links


            Activity

            Mihir Kamdar created issue -
            Mihir Kamdar made changes -
            Summary: "[System Test] : index building for a partitioned primary index is stuck on 1 indexer node" → "[System Test] : index building for a partitioned primary index is stuck"
            Jeelan Poola made changes -
            Assignee: Jeelan Poola [ jeelan.poola ] → Amit Kulkarni [ amit.kulkarni ]
            Amit Kulkarni added a comment -

            Checking if this is same as MB-36980 ...

            Mihir Kamdar added a comment -

            Amit Kulkarni Jeelan Poola any updates on this ticket? Is it okay to move it out of 6.6.1 ?

            Mihir Kamdar added a comment -

            Just spoke to Jeelan. Amit is still analyzing the defect. It looks like a bad issue, and we would want the fix to be included in 6.6.1.

            Amit Kulkarni added a comment -

            Mihir Kamdar, the observations below indicate a big impact, so we cannot move this out until the RCA is done. The RCA is still in progress.

            The observations:

            1. The indexer is blocked.

            2020-11-30T09:40:50.127-08:00 [Info] DATP[->dataport ":9105"] DATP -> Indexer 89.843679% blocked
            2020-11-30T09:40:50.127-08:00 [Info] DATP[->dataport ":9105"] DATP -> Indexer 99.191578% blocked
            2020-11-30T09:40:50.127-08:00 [Info] DATP[->dataport ":9105"] DATP -> Indexer 89.853399% blocked
            2020-11-30T09:50:54.638-08:00 [Info] DATP[->dataport ":9105"] DATP -> Indexer 89.930350% blocked
            

            2. Index snapshotting has been stuck/very slow for about 19 hours for all buckets, and num_docs_pending keeps increasing, which means mutations are still arriving in the KV store while the indexer is not processing them.

            At 2020-11-30T09:51:10:
            "CUSTOMER:o1_result_rating (replica 1):num_snapshots":22014
            "DISTRICT:o2_claims:num_snapshots":2140
            "HISTORY:o3_claims:num_snapshots":9162
            "default:#primary:num_snapshots":568257
             
            At 2020-11-29T15:06:03:
            "CUSTOMER:o1_result_rating (replica 1):num_snapshots":22014
            "DISTRICT:o2_claims:num_snapshots":2140
            "HISTORY:o3_claims:num_snapshots":9162
            "default:#primary:num_snapshots":568257
            

            3. The mutation queue size for the default bucket is too high.

            "default:mutation_queue_size":1133792
            

            4. There are some scan timeout errors, which could be due to session-consistent scans.

            Amit Kulkarni added a comment - edited

            Some more observations:

            The stuck index is on indexer 172.23.96.253.

            1. It looks like the flusher is waiting to flush mutations:

            3 @ 0x43485a 0x443ef5 0x442c8c 0x4ff836 0x465a11
            #       0x4ff835        github.com/couchbase/indexing/secondary/indexer.(*flusher).flushSingleVbucketUptoSeqno+0x365    goproj/src/github.com/couchbase/indexing/secondary/indexer/flusher.go:308
            

            2. From the logs, it looks like flush is not happening for 3 vbuckets (191, 189, 248). The same state persists for around 19 hours.

            2020-11-29T14:53:23.331-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 173667 Bucket default Vbucket 191 for 35000 ms. Last Dequeue 0.
            2020-11-29T14:53:23.364-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 186635 Bucket default Vbucket 189 for 35000 ms. Last Dequeue 0.
            2020-11-29T14:53:23.377-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 148443 Bucket default Vbucket 248 for 35000 ms. Last Dequeue 0.
            ...
            2020-11-30T09:51:48.685-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 148443 Bucket default Vbucket 248 for 61105000 ms. Last Dequeue 0.
            2020-11-30T09:51:48.961-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 186635 Bucket default Vbucket 189 for 61105000 ms. Last Dequeue 0.
            2020-11-30T09:51:49.495-08:00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 173667 Bucket default Vbucket 191 for 61105000 ms. Last Dequeue 0.
            

            3. Focusing on vbucket 191: it belongs to KV node 172.23.108.103. As per the kvdata logs, kvdata has already seen seqno 173667 for vbucket 191, which is the seqno the indexer is waiting for.

            2020-11-29T14:52:36.978-08:00 [Info] KVDT[<-default<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d] ##288 vbseqnos: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 178600 0 161146 174703 0 187001 214810 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 186393 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 148480 148274 175199 200724 147850 173997 187845 186973 187352 0 0 0 0 0 0 173717 173842 186635 160590 173667 149685 169891 181165 180896 181008 194135 149387 149665 149870 150749 193192 191726 169058 193596 157406 168843 192752 182023 149647 149850 168664 150703 172227 193448 192503 168743 169207 157475 149484 189661 180605 180819 204723 180149 149287 148678 157265 168226 168663 179858 180591 181055 149193 168675 148979 149473 180677 254131 168182 167680 203868 168405 168144 192085 148605 154020 148443 148879 179547 179945 179332 192233 167514 0 186877 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 204797 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
            

            4. At 2020-11-29T14:53:23, the flusher had been waiting 35 seconds for seqno 173667 (i.e. since 2020-11-29T14:52:48), but at that time there was no active vbucket stream as per the memcached logs. The following is the last occurrence of a vbucket stream for vb 191 for this indexer. Note that after this time we can see other indexers successfully creating vbucket streams for vb 191.

            2020-11-29T14:52:29.704357-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) Creating stream with start seqno 173247 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
            2020-11-29T14:52:29.704407-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3" backfill:true, seqno:175142
            2020-11-29T14:52:29.704441-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) Scheduling backfill from 173248 to 175141, reschedule flag : False
            2020-11-29T14:52:30.469143-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) ActiveStream::markDiskSnapshot: Sending disk snapshot with start 173247, end 175791, and high completed --, max visible 175791
            2020-11-29T14:52:30.469295-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) ActiveStream::registerCursor name "eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3", backfill:false, seqno:175792
            2020-11-29T14:52:32.281168-08:00 INFO 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) Backfill complete, 1079 items read from disk, 434 from memory, last seqno read: 175791, pendingBackfill : False
            2020-11-29T14:52:36.877757-08:00 WARNING 134: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3 - (vb:191) Stream closing, sent until seqno 174633 remaining items 0, reason: The stream closed early because the conn was disconnected
            

            This means that something went wrong on the indexer side.

            5. The indexer tried to open a mutation stream at 2020-11-29T14:52:36, but the projector on node 172.23.108.103 returned the error feed.feeder, because the previous request, sent at 2020-11-29T14:51:43, had not yet been responded to.

            2020-11-29T14:52:36.917-08:00 [Error] KVSender::sendMutationTopicRequest Projector 172.23.108.103:9999 Topic MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d default 
            2020-11-29T14:52:36.917-08:00 [Error] KVSender::openMutationStream MAINT_STREAM default Error Received feed.feeder from 172.23.108.103:9999
             
            prev req: 2020-11-29T14:51:43.594-08:00 [Info] KVSender::sendMutationTopicRequest Projector 172.23.108.103:9999 Topic MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d default
            

            6. Meanwhile, the previous request received the rollback, so a StreamEnd was received.

            2020-11-29T14:52:45.741-08:00 [Info] KVSender::openMutationStream MAINT_STREAM default Projector 172.23.99.25:9999 Rollback Received pool:"default" bucket:"default" vbnos:303 vbnos:304 vbnos:305 vbnos:306 vbnos:307 vbnos:308 vbnos:309 vbnos:310 vbnos:311 vbnos:315 vbnos:316 vbnos:623 vbnos:624 vbnos:625 vbnos:626 vbnos:627 vbnos:628 vbnos:629 vbnos:630 vbnos:631 vbnos:695 vbnos:696 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 seqnos:0 vbuuids:133585739715497 vbuuids:15950125255989 vbuuids:23175550451884 vbuuids:52741518790536 vbuuids:225866375618113 vbuuids:49139206490282 vbuuids:39216553385743 vbuuids:30148342475671 vbuuids:9819956268752 vbuuids:46902178522420 vbuuids:111148728589406 vbuuids:26729517547856 vbuuids:131238491713084 vbuuids:223399997626612 vbuuids:53381103950333 vbuuids:153017239541313 vbuuids:85507659899778 vbuuids:194725927067558 vbuuids:171303516790908 vbuuids:116485497368401 vbuuids:23362301620939 vbuuids:141866089159540 snapshots:<start:146776 end:146776 > snapshots:<start:216325 end:216325 > snapshots:<start:178079 end:178079 > snapshots:<start:175404 end:175404 > snapshots:<start:174288 end:174288 > snapshots:<start:147156 end:147156 > snapshots:<start:177598 end:177598 > snapshots:<start:175335 end:175335 > snapshots:<start:205062 end:205062 > snapshots:<start:146450 end:146450 > snapshots:<start:175374 end:175374 > snapshots:<start:185681 end:185681 > snapshots:<start:186475 end:186475 > snapshots:<start:185884 end:185884 > snapshots:<start:146512 end:146512 > snapshots:<start:148348 end:148348 > snapshots:<start:172879 end:172879 > snapshots:<start:146561 end:146561 > snapshots:<start:173788 end:173788 > snapshots:<start:199003 end:199003 
            > snapshots:<start:158756 end:158756 > snapshots:<start:179480 end:179480 >
            

            2020-11-29T14:52:36.933-08:00 [Info] TK StreamEnd MAINT_STREAM default 191 135392966901152 173667
            

            7. The restart TS for vb 191 is less than the seqno for which the flusher is waiting:

            2020-11-29T14:52:45.741-08:00 [Info] KVSender::openMutationStream restartTsList pool:"default" bucket:"default"
            'seqnos:173247'
            '<start:173247 end:173247 > '
            

            8. The indexer starts processing the rollback and goes into stream recovery. The recovery code finds an ongoing flush and decides to wait until the flush finishes; this ensures the high watermark timestamp for the bucket is greater than or equal to the flush timestamp. Looking at the flush TS below, seqno 173667 has been seen by the stream reader.

            2020-11-29T14:52:45.745-08:00 [Info] Indexer::startBucketStream Rollback from Projector For Stream MAINT_STREAM Bucket default SessionId 3
            2020-11-29T14:52:45.745-08:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket default State STREAM_PREPARE_RECOVERY SessionId 3
            2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::handleInitPrepRecovery MAINT_STREAM default
            2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::prepareRecovery StreamId MAINT_STREAM Bucket default
            2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::prepareRecovery Stream MAINT_STREAM Bucket default State Changed to PREPARE_RECOVERY
            2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::flushOrAbortInProgressTS Processing Flush TS bucket: default, vbuckets: 1024 Crc64: 2021842410562851897 snapType INMEM_SNAP -
                {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
                {  191     7b23a1b27da0     173667     173247     175791}
            

            9. As per the flushMonitor logs, we can see that the flush was triggered at around 2020-11-29T14:52:36.

            2020-11-29T14:53:46.881-08:00 [Warn] Timekeeper::flushMonitor Waiting for flush to finish for 70 seconds. Stream MAINT_STREAM Bucket default.
            

            10. At 2020-11-29T14:52:36, the indexer also received a flush abort request. Here we can see that the last flushed seqnos (LastFlushTs) are less than the seqnos for which the flusher is waiting (FlushTs).

            2020-11-29T14:52:36.856-08:00 [Info] Timekeeper::processFlushAbort Flush Abort Received MAINT_STREAM default
            FlushTs bucket: default, vbuckets: 1024 Crc64: 9616317055526481323 snapType INMEM_SNAP -
                {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
                {  189     129803b17ebc     186635     186359     188009}
                {  191     7b23a1b27da0     173667     173247     175791}
                {  248     b6b637351efe     148443     148009     149970}
             
            LastFlushTs bucket: default, vbuckets: 1024 Crc64: 9616317055526481323 snapType INMEM_SNAP -
                {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
                {  189     129803b17ebc     186630     186359     188009}
                {  191     7b23a1b27da0     173547     173247     175791}
                {  248     b6b637351efe     148389     148009     149970}
             
            2020-11-29T14:52:36.859-08:00 [Info] Timekeeper::processFlushAbort MAINT_STREAM default 3 Generate InitPrepRecovery
            2020-11-29T14:52:36.859-08:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket default SessionId 3. Cannot initiate another recovery while previous recovery in progress.
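Taken together, observations 1-10 describe a wait cycle: the flusher waits for seqno 173667, only the (now closed) DCP stream could deliver it, and recovery in turn waits on the flush. A minimal sketch of that dependency check, with illustrative field names and the vb 191 values from the logs (this models the observed state, not the indexer's actual types):

```go
package main

import "fmt"

// state captures the wait cycle for one vbucket (field names are
// illustrative, not the indexer's actual data structures).
type state struct {
	streamOpen           bool   // is the DCP stream for the vbucket still open?
	queuedUpTo           uint64 // highest seqno available to the flusher so far
	flusherWaitsFor      uint64 // seqno the flusher must dequeue to finish the flush
	recoveryWaitsOnFlush bool   // recovery blocks until the in-progress flush completes
}

// isStuck reports whether recovery can never make progress: the flusher is
// waiting for a seqno it does not have, only an open stream could deliver
// it, and recovery is itself waiting on that flush.
func isStuck(s state) bool {
	seqnoCanStillArrive := s.streamOpen
	flushCanFinish := s.queuedUpTo >= s.flusherWaitsFor || seqnoCanStillArrive
	return s.recoveryWaitsOnFlush && !flushCanFinish
}

func main() {
	// vb 191: flusher waits for 173667, last flushed seqno was 173547
	// (LastFlushTs), and the stream was closed at 14:52:36.
	vb191 := state{
		streamOpen:           false,
		queuedUpTo:           173547,
		flusherWaitsFor:      173667,
		recoveryWaitsOnFlush: true,
	}
	fmt.Println("vb 191 stuck:", isStuck(vb191))
}
```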
            

            Show
            amit.kulkarni Amit Kulkarni added a comment - - edited Some more observations: Index stuck is on indexer 172.23.96.253. 1. Looks like flusher is waiting to flush mutations 3 @ 0x43485a 0x443ef5 0x442c8c 0x4ff836 0x465a11 # 0x4ff835 github.com/couchbase/indexing/secondary/indexer.(*flusher).flushSingleVbucketUptoSeqno+ 0x365 goproj/src/github.com/couchbase/indexing/secondary/indexer/flusher.go: 308 2. From the logs, it looks like for 3 vbuckets (191, 189, 248), flush is not happening. The same state remains for around 19 hours. 2020 - 11 -29T14: 53 : 23.331 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 173667 Bucket default Vbucket 191 for 35000 ms. Last Dequeue 0 . 2020 - 11 -29T14: 53 : 23.364 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 186635 Bucket default Vbucket 189 for 35000 ms. Last Dequeue 0 . 2020 - 11 -29T14: 53 : 23.377 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 148443 Bucket default Vbucket 248 for 35000 ms. Last Dequeue 0 . ... 2020 - 11 -30T09: 51 : 48.685 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 148443 Bucket default Vbucket 248 for 61105000 ms. Last Dequeue 0 . 2020 - 11 -30T09: 51 : 48.961 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 186635 Bucket default Vbucket 189 for 61105000 ms. Last Dequeue 0 . 2020 - 11 -30T09: 51 : 49.495 - 08 : 00 [Warn] Indexer::MutationQueue Dequeue Waiting For Seqno 173667 Bucket default Vbucket 191 for 61105000 ms. Last Dequeue 0 . 3. Focussing on bucket 191, it belongs to kv node 172.23.108.103. As per the kvdata logs, for the vbucket 191, kvdata has already seen the seqno 173667 for which indexer is waiting. 2020 - 11 -29T14: 52 : 36.978 - 08 : 00 [Info] KVDT[<- default <- 127.0 . 
0.1 : 8091 #MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d] ## 288 vbseqnos: [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 178600 0 161146 174703 0 187001 214810 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 186393 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 148480 148274 175199 200724 147850 173997 187845 186973 187352 0 0 0 0 0 0 173717 173842 186635 160590 173667 149685 169891 181165 180896 181008 194135 149387 149665 149870 150749 193192 191726 169058 193596 157406 168843 192752 182023 149647 149850 168664 150703 172227 193448 192503 168743 169207 157475 149484 189661 180605 180819 204723 180149 149287 148678 157265 168226 168663 179858 180591 181055 149193 168675 148979 149473 180677 254131 168182 167680 203868 168405 168144 192085 148605 154020 148443 148879 179547 179945 179332 192233 167514 0 186877 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 204797 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] 4. At 2020-11-29T14:53:23, the flusher was waiting for 35 seconds for the seqno 173667 (i.e. since 2020-11-29T14:52:48), but at that time there isn't any active vbucket stream as per the memcached logs. Following is the last occurrence of vbucket stream for vb 191 for this indexer. Note that, after this time, wee can see other indexers successfully creating vbucket streams for vb 191. 2020 - 11 -29T14: 52 : 29.704357 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) Creating stream with start seqno 173247 and end seqno 18446744073709551615 ; requested end seqno was 18446744073709551615 , collections-manifest uid:none, sid:none 2020 - 11 -29T14: 52 : 29.704407 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3" backfill: true , seqno: 175142 2020 - 11 -29T14: 52 : 29.704441 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) Scheduling backfill from 173248 to 175141 , reschedule flag : False 2020 - 11 
-29T14: 52 : 30.469143 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) ActiveStream::markDiskSnapshot: Sending disk snapshot with start 173247 , end 175791 , and high completed --, max visible 175791 2020 - 11 -29T14: 52 : 30.469295 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) ActiveStream::registerCursor name "eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/3" , backfill: false , seqno: 175792 2020 - 11 -29T14: 52 : 32.281168 - 08 : 00 INFO 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) Backfill complete, 1079 items read from disk, 434 from memory, last seqno read: 175791 , pendingBackfill : False 2020 - 11 -29T14: 52 : 36.877757 - 08 : 00 WARNING 134 : ( default ) DCP (Producer) eq_dcpq:secidx:proj- default -MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d- 12886222343688388930 / 3 - (vb: 191 ) Stream closing, sent until seqno 174633 remaining items 0 , reason: The stream closed early because the conn was disconnected This means that something went wrong on the indexer side. 5. Indexer tried to open mutation stream at 2020-11-29T14:52:36, but projector on node 172.23.108.103 returned error feed.feeder - because previous request at 2020-11-29T14:51:43 hasn't yet responded. 2020 - 11 -29T14: 52 : 36.917 - 08 : 00 [Error] KVSender::sendMutationTopicRequest Projector 172.23 . 108.103 : 9999 Topic MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d default 2020 - 11 -29T14: 52 : 36.917 - 08 : 00 [Error] KVSender::openMutationStream MAINT_STREAM default Error Received feed.feeder from 172.23 . 
108.103:9999   prev req: 2020-11-29T14:51:43.594-08:00 [Info] KVSender::sendMutationTopicRequest Projector 172.23.108.103:9999 Topic MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d default

6. Meanwhile, the previous request receives the rollback, so a stream end is received.

2020-11-29T14:52:45.741-08:00 [Info] KVSender::openMutationStream MAINT_STREAM default Projector 172.23.99.25:9999 Rollback Received pool: "default" bucket: "default" vbnos: 303 vbnos: 304 vbnos: 305 vbnos: 306 vbnos: 307 vbnos: 308 vbnos: 309 vbnos: 310 vbnos: 311 vbnos: 315 vbnos: 316 vbnos: 623 vbnos: 624 vbnos: 625 vbnos: 626 vbnos: 627 vbnos: 628 vbnos: 629 vbnos: 630 vbnos: 631 vbnos: 695 vbnos: 696 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 seqnos: 0 vbuuids: 133585739715497 vbuuids: 15950125255989 vbuuids: 23175550451884 vbuuids: 52741518790536 vbuuids: 225866375618113 vbuuids: 49139206490282 vbuuids: 39216553385743 vbuuids: 30148342475671 vbuuids: 9819956268752 vbuuids: 46902178522420 vbuuids: 111148728589406 vbuuids: 26729517547856 vbuuids: 131238491713084 vbuuids: 223399997626612 vbuuids: 53381103950333 vbuuids: 153017239541313 vbuuids: 85507659899778 vbuuids: 194725927067558 vbuuids: 171303516790908 vbuuids: 116485497368401 vbuuids: 23362301620939 vbuuids: 141866089159540 snapshots:<start: 146776 end: 146776 > snapshots:<start: 216325 end: 216325 > snapshots:<start: 178079 end: 178079 > snapshots:<start: 175404 end: 175404 > snapshots:<start: 174288 end: 174288 > snapshots:<start: 147156 end: 147156 > snapshots:<start: 177598 end: 177598 > snapshots:<start: 175335 end: 175335 > snapshots:<start: 205062 end: 205062 > snapshots:<start: 146450 end: 146450 > snapshots:<start: 175374 end: 175374 > snapshots:<start: 185681 end: 185681 > snapshots:<start: 186475 end: 186475 > snapshots:<start: 185884 end: 185884 > snapshots:<start: 146512 end: 146512 > snapshots:<start: 148348 end: 148348 > snapshots:<start: 172879 end: 172879 > snapshots:<start: 146561 end: 146561 > snapshots:<start: 173788 end: 173788 > snapshots:<start: 199003 end: 199003 > snapshots:<start: 158756 end: 158756 > snapshots:<start: 179480 end: 179480 >
2020-11-29T14:52:36.933-08:00 [Info] TK StreamEnd MAINT_STREAM default 191 135392966901152 173667

7. The restart TS for vb 191 is less than the seqno for which the flusher is waiting ...

2020-11-29T14:52:45.741-08:00 [Info] KVSender::openMutationStream restartTsList pool: "default" bucket: "default" 'seqnos:173247' '<start:173247 end:173247 >'

8. The indexer starts processing the rollback and goes into stream recovery. The recovery code finds an ongoing flush and decides to wait until the flush finishes. This ensures that the high watermark timestamp for this bucket is greater than or equal to the flush timestamp. Looking at the flush TS below, seqno 173667 has been seen by the stream reader.

2020-11-29T14:52:45.745-08:00 [Info] Indexer::startBucketStream Rollback from Projector For Stream MAINT_STREAM Bucket default SessionId 3
2020-11-29T14:52:45.745-08:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket default State STREAM_PREPARE_RECOVERY SessionId 3
2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::handleInitPrepRecovery MAINT_STREAM default
2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::prepareRecovery StreamId MAINT_STREAM Bucket default
2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::prepareRecovery Stream MAINT_STREAM Bucket default State Changed to PREPARE_RECOVERY
2020-11-29T14:52:45.745-08:00 [Info] Timekeeper::flushOrAbortInProgressTS Processing Flush TS bucket: default, vbuckets: 1024 Crc64: 2021842410562851897 snapType INMEM_SNAP - {vbno, vbuuid, seqno, snapshot-start, snapshot-end} { 191 7b23a1b27da0 173667 173247 175791 }

9. We can see that the flush was triggered at around 2020-11-29T14:52:36, as per the flushMonitor logs.

2020-11-29T14:53:46.881-08:00 [Warn] Timekeeper::flushMonitor Waiting for flush to finish for 70 seconds. Stream MAINT_STREAM Bucket default.

10. At 2020-11-29T14:52:36, a flush abort request was also received. Here we can see that the seqnos in the Flush TS are less than the seqnos for which the flusher is waiting.

2020-11-29T14:52:36.856-08:00 [Info] Timekeeper::processFlushAbort Flush Abort Received MAINT_STREAM default FlushTs bucket: default, vbuckets: 1024 Crc64: 9616317055526481323 snapType INMEM_SNAP - {vbno, vbuuid, seqno, snapshot-start, snapshot-end} { 189 129803b17ebc 186635 186359 188009 } { 191 7b23a1b27da0 173667 173247 175791 } { 248 b6b637351efe 148443 148009 149970 }
LastFlushTs bucket: default, vbuckets: 1024 Crc64: 9616317055526481323 snapType INMEM_SNAP - {vbno, vbuuid, seqno, snapshot-start, snapshot-end} { 189 129803b17ebc 186630 186359 188009 } { 191 7b23a1b27da0 173547 173247 175791 } { 248 b6b637351efe 148389 148009 149970 }
2020-11-29T14:52:36.859-08:00 [Info] Timekeeper::processFlushAbort MAINT_STREAM default 3 Generate InitPrepRecovery
2020-11-29T14:52:36.859-08:00 [Info] Indexer::handleInitPrepRecovery StreamId MAINT_STREAM Bucket default SessionId 3. Cannot initiate another recovery while the previous recovery is in progress.
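The wait-until-flush-finishes behaviour in step 8, and the stuck condition in step 10, can be sketched roughly as follows. This is a hypothetical simplification for illustration only, not the actual indexer code; the function name `prepare_recovery` and the state strings are invented:

```python
# Hypothetical sketch of the invariant in step 8: before recovery can
# proceed, any in-progress flush must finish, so that the high-watermark
# timestamp (HWT) catches up to the flush timestamp.

def prepare_recovery(flush_in_progress, hwt_seqno, flush_ts_seqno):
    """Return the action the timekeeper would take (illustrative only)."""
    if flush_in_progress:
        # Recovery waits; flushMonitor logs a warning while waiting
        # ("Waiting for flush to finish for 70 seconds" in the ticket).
        return "WAIT_FOR_FLUSH"
    if hwt_seqno >= flush_ts_seqno:
        # Safe: everything in the flush TS is reflected in the HWT.
        return "START_RECOVERY"
    # The condition seen in this ticket: the flush was aborted with the
    # flush TS seqnos behind what the flusher is waiting for, so the
    # invariant never becomes true and recovery cannot proceed.
    return "STUCK"
```

For example, with the step 8 values (`hwt_seqno=173667`, `flush_ts_seqno=173247`) recovery can start, whereas the aborted-flush values from step 10 (`hwt_seqno=173547`, `flush_ts_seqno=173667`) leave the stream stuck.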
            amit.kulkarni Amit Kulkarni added a comment - - edited

            The behaviour here appears to match the description in MB-36980.

            jeelan.poola Jeelan Poola made changes -
            Labels system-test hard_bug system-test
            amit.kulkarni Amit Kulkarni added a comment - - edited

            On a possibly unrelated note, the indexer and projector were seeing the following error for about 8 minutes during the stream recovery time frame, starting around 2020-11-29T14:43.

            Invalid KV node no -1 for vb 303
            

            deepkaran.salooja Deepkaran Salooja added a comment -

            On node 172.23.96.253, vbucket 214 (bucket default) is the problem.

            1. As part of recovery, the stream is opened. StreamBegin is received.

            2020-11-29T14:52:29.801-08:00 [Info] TK StreamBegin MAINT_STREAM default 214 47148652259920 170669 3
            

            2. Shortly after, there is an abort in the flusher.

            2020-11-29T14:52:36.844-08:00 [Warn] Indexer::MutationQueue Dequeue Aborted For Seqno 172214 Bucket default Vbucket 214. Last Dequeue 0 Head Seqno 172223.
            

            3. Indexer bookkeeping complains that the last snapshot received for vbucket 214 was 172165-172214; however, the last seqno received was 172227.

            2020-11-29T14:52:36.850-08:00 [Warn] StreamState::alignSnapBoundary.  TS falls out of snapshot boundary. Bucket default StreamId MAINT_STREAM vbucket 214 Snapshot 172165-172214 Seqno 172227 Vbuuid 47148652259920 lastSnap 171660-172163
            2020-11-29T14:52:36.855-08:00 [Warn] StreamState::alignSnapBoundary.  TS falls out of snapshot boundary. Bucket default StreamId MAINT_STREAM vbucket 214 Snapshot 172165-172214 Seqno 172227 Vbuuid 47148652259920 lastSnap 171660-172163
            

            It is unexpected for a seqno to arrive outside the snapshot boundary. This leads to an abort, and the indexer runs into MB-36980.
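The invariant behind the `alignSnapBoundary` warnings above can be sketched as follows; this is a hypothetical illustration of the DCP snapshot rule being violated, not the indexer's actual check, and `seqno_within_snapshot` is an invented name:

```python
# Every mutation seqno must fall inside the most recent DCP snapshot
# marker received for its vbucket (hypothetical simplification).

def seqno_within_snapshot(seqno, snap_start, snap_end):
    """True if the seqno is covered by the snapshot [snap_start, snap_end]."""
    return snap_start <= seqno <= snap_end

# Values from the vb 214 log lines: the last snapshot marker was
# 172165-172214, but the last seqno received was 172227, i.e. outside
# the snapshot boundary -- the unexpected condition that triggered the
# abort in this ticket.
```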

            4. In the projector logs on 172.23.108.103, which hosts vbucket 214, the following errors are reported at the same time.

            2020-11-29T14:52:36.731-08:00 [Error] DCP[secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930] ##288 len(feed.nodeFeeds[master]) is "0". Master node for vb:622 is 172.23.106.100:11210
            2020-11-29T14:52:36.731-08:00 [Error] FEED[<=>MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d(127.0.0.1:8091)] ##288 StartVbStreams("default"): dcp.invalidVbucket
            

            The above errors lead to a cleanup of the running feed. We need to check whether something could go wrong because of this.

            5. The memcached logs do not show anything unexpected.

            2020-11-29T14:52:29.768263-08:00 INFO 382: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/2 - (vb:214) Creating stream with start seqno 170669 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none

            2020-11-29T14:52:36.896859-08:00 WARNING 382: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_b5498457cf95d0d23003751bba32455d-12886222343688388930/2 - (vb:214) Stream closing, sent until seqno 172562 remaining items 315, reason: The stream closed early because the conn was disconnected
            

            deepkaran.salooja Deepkaran Salooja added a comment -

            This behavior doesn't look related to any changes made in 6.6.1. At this point, though, the root cause of a mutation being recorded in indexer bookkeeping without the corresponding snapshot marker is not clear.

            lynn.straus Lynn Straus made changes -
            Link This issue blocks MB-40528 [ MB-40528 ]
            lynn.straus Lynn Straus made changes -
            Labels hard_bug system-test approved-for-6.6.1 hard_bug system-test
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9213 contains indexing commit bc72f9e with commit message:
            MB-43008 [BP 6.6.1] force recovery on flush abort
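Conceptually, the fix named in the commit message ("force recovery on flush abort") can be sketched as below. This is a hypothetical illustration inferred from the logs and the commit message, not the actual change; the state strings and the function `handle_flush_abort` are invented:

```python
# Hypothetical sketch of the behaviour change implied by the commit
# message. Pre-fix, a flush abort arriving while a recovery was already
# in progress could be ignored ("Cannot initiate another recovery while
# previous recovery in progress"), leaving the stream stuck.

def handle_flush_abort(stream_state, force_recovery_on_abort):
    """Return the next recovery action for the stream (illustrative only)."""
    if stream_state == "PREPARE_RECOVERY" and not force_recovery_on_abort:
        # Pre-fix behaviour seen in this ticket: the abort is dropped.
        return "PREPARE_RECOVERY"
    # Post-fix behaviour: always (re)initiate prepare-recovery on abort.
    return "INIT_PREP_RECOVERY"
```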

            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue is cloned by MB-43259 [ MB-43259 ]
            deepkaran.salooja Deepkaran Salooja added a comment -

            The defensive fix has been merged. Opened MB-43259 to track the underlying issue.

            deepkaran.salooja Deepkaran Salooja made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Link This issue relates to MB-36980 [ MB-36980 ]
            deepkaran.salooja Deepkaran Salooja made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            wayne Wayne Siu made changes -
            Link This issue relates to MB-43259 [ MB-43259 ]
            wayne Wayne Siu made changes -
            Assignee Amit Kulkarni [ amit.kulkarni ] Mihir Kamdar [ mihir.kamdar ]
            mihir.kamdar Mihir Kamdar added a comment -

            Latest logs from 6.6.1-9213:

            Centos1 cluster (172.23.108.103)
            Index Nodes : 172.23.104.137, 172.23.104.155, 172.23.104.70, 172.23.96.253, 172.23.96.95, 172.23.99.11
            Logs:
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.137.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.155.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.157.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.5.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.67.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.69.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.104.70.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.106.188.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.108.103.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.120.245.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.121.3.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.123.27.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.123.28.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.96.148.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.96.251.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.96.252.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.96.253.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.96.95.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.97.119.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.97.121.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.97.122.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.97.239.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.98.135.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.99.11.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.99.20.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.99.21.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607809734/collectinfo-2020-12-12T214857-ns_1%40172.23.99.25.zip

            Centos2 Cluster (172.23.97.74)
            Index Nodes : 172.23.106.134, 172.23.120.58, 172.23.120.81, 172.23.96.243, 172.23.97.105, 172.23.97.110
            Logs :
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.106.134.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.58.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.74.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.75.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.77.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.81.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.120.86.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.121.77.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.123.25.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.123.26.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.123.31.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.123.33.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.122.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.14.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.18.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.243.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.254.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.96.48.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.105.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.110.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.112.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.148.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.149.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.150.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.151.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607805003/collectinfo-2020-12-12T203004-ns_1%40172.23.97.74.zip

            deepkaran.salooja Deepkaran Salooja added a comment - - edited

            The Centos1 cluster hit the issue on 172.23.99.11:
             

            2020-12-12T12:06:41.850-08:00 [Warn] Indexer::MutationQueue Dequeue Aborted For Seqno 252528 Bucket default Vbucket 536. Last Dequeue 252493 Head Seqno 252597.
            2020-12-12T12:06:41.854-08:00 [Info] Timekeeper::processFlushAbort Flush Abort Received MAINT_STREAM default FlushTs bucket: default, vbuckets: 1024 Crc64: 5001023028077095860 snapType INMEM_SNAP -
            2020-12-12T12:06:41.857-08:00 [Info] Timekeeper::processFlushAbort MAINT_STREAM default 3 Generate InitPrepRecovery
            

            With the fix, the system was able to recover.

            mihir.kamdar Mihir Kamdar added a comment -

            Closing as per Deep's comment regarding the defensive fix, on 6.6.1-9213.

            mihir.kamdar Mihir Kamdar made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4156 contains indexing commit b90f5dd with commit message:
            MB-43008: Introduce logging for sequence number violation in projector


              People

              Assignee:
              mihir.kamdar Mihir Kamdar
              Reporter:
              mihir.kamdar Mihir Kamdar
              Votes:
              0
              Watchers:
              6
