Couchbase Server: MB-53115

Index rollbackAllToZero messages seen


Details


    Description

      Steps to reproduce:

      1. Create 4 buckets 
      2. Create indexes with replicas on each of the 4 buckets.
      3. Run pillowfight to continuously load data (the buckets have 1M, 1M, 1M and 3M items). The bucket resident ratio (RR) needs to be under 10%; keep loading until it is.
      4. Run a shell script that runs the request_plus scans continuously.
      5. Run stress-ng with these parameters: stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked -m 4 --aggressive --vm-populate (adjust the --vm-bytes param depending upon the VM resources).
      6. Once enough stress-ng processes are running, the OOM killer will kick in. This can be verified by checking dmesg ( dmesg -T | egrep -i 'killed process' ).
      7. There's a possibility that stress-ng itself gets spawned and killed, since OOM-kill victim selection is driven by oom_score_adj. To make sure that memcached gets killed, run the following (see the combined sketch after these steps):

      echo 1000 > /proc/<memcached PID>/oom_score_adj 

      8. Observe that the scans are timing out and that the index has rolled back to zero.
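
      A combined sketch of steps 5-7 (illustrative only; the PID lookup and the --vm-bytes value are assumptions to adjust for the VM):

      # Bias the OOM killer towards memcached rather than stress-ng itself.
      MEMCACHED_PID=$(pgrep -x memcached | head -n1)
      echo 1000 > /proc/${MEMCACHED_PID}/oom_score_adj

      # Apply memory pressure until the kernel OOM killer fires.
      stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked \
                -m 4 --aggressive --vm-populate &

      # Confirm that memcached (and not stress-ng) was the victim.
      dmesg -T | egrep -i 'killed process'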

       

      Logs and pcaps are attached. There were 2 instances of rollbacks observed. Please use these timestamps for analysis:

      Instance 1

       

      Index node1 ( 172.23.106.159) 
      2022-07-26T03:26:54.738-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8
       
      Index node2 ( 172.23.106.163) 
       
      2022-07-26T03:26:58.186-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

      Instance 2

       

       

      Index node1 ( 172.23.106.159) 
       
      2022-07-26T05:06:12.658-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8
       
       Index node2 ( 172.23.106.163) 
       
      2022-07-26T05:06:10.805-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8

       

       

       

      Log bundles:

      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.105.36.zip
      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.105.37.zip
      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.156.zip
      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.159.zip
      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.163.zip
      s3://cb-customers-secure/cbse12279oomkil2/2022-07-26/collectinfo-2022-07-26t125548-ns_1@172.23.106.204.zip

       

       

      I have the packet capture files too and can attach them if necessary. 

      Attachments


        Activity

          drigby Dave Rigby added a comment - - edited

          OOM-killer seen killing memcached on node 105.36 at:

          couchbase.log:[Tue Jul 26 05:02:55 2022] Killed process 18925 (memcached), UID 996, total-vm:16346528kB, anon-rss:7020668kB, file-rss:0kB, shmem-rss:0kB
          couchbase.log:[Tue Jul 26 05:02:55 2022] Killed process 22172 (mc:writer_1), UID 996, total-vm:16346528kB, anon-rss:7021208kB, file-rss:1040kB, shmem-rss:0kB
          

          There are 280 rollback responses sent to GSI from memcached on node 105.36 after this OOM-kill:

           rg 'Stream request requires rollback' memcached.log | rg 2022-07-26T05: | rg secidx | wc -l
               280
          

          I'm not sure how to map the rollbacks seen from KV to indexing's "rollback to zero" - certainly KV didn't respond with "rollback to zero" for any of the failed stream requests post-OOM-kill:

          rg 'Stream request requires rollback' memcached.log | rg 2022-07-26T05: | rg secidx | rg -c --include-zero seqno:0
          0
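
          (To get a rough distribution of the seqnos KV asked GSI to roll back to - a quick sketch reusing the same memcached.log - something like this could be run:)

           rg 'Stream request requires rollback' memcached.log | rg 2022-07-26T05: | rg secidx | rg -o 'rollback to seqno:[0-9]+' | sort | uniq -c | sort -rn | head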
          

          A few examples of the GSI rollbacks we responded with:

          2022-07-26T05:03:14.726864-07:00 WARNING 145: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-8649600389967487890/3 - (vb:3) Stream request requires rollback to seqno:121417 because consumer ahead of producer - producer upper at 121417. Client requested seqnos:{127824,18446744073709551615} snapshot:{127814,127824} uuid:265136827834023
          2022-07-26T05:03:16.222055-07:00 WARNING 150: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf-18233705252004069608/2 - (vb:2) Stream request requires rollback to seqno:125392 because consumer ahead of producer - producer upper at 125392. Client requested seqnos:{126981,18446744073709551615} snapshot:{126981,126981} uuid:35916178612177
          

          drigby Dave Rigby added a comment -

          Pavan PB Please could you upload the packet captures also.


          varun.velamuri Varun Velamuri added a comment -

          a. Consider the rollback:

          2022-07-26T05:06:10.805-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8
          

          b. Indexer did not have any snapshot to use. Hence, it rolled back to zero:

          2022-07-26T05:06:09.431-07:00 [Info] StorageMgr::handleRollback latestSnapInfo <nil> lastRollbackTs <nil>. Use latest snapshot.
          

          c. The rollbackTs for many vbuckets is "0"

          2022-07-26T05:06:09.430-07:00 [Info] StorageMgr::handleRollback rollbackTs is bucket: test8, vbuckets: 1024 Crc64: 0 snapType NO_SNAP -
              {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
              {    0     1a84ffdb1dbd          0          0          0}
              {    1     4fdea47cd9ba          0          0          0}
              {    2     20aa63115fd1     126103          0          0}
              {    3     f123fa709ea7     127215          0          0}
          

          d. Taking vb:0 as an example, the memcached OOM kill led memcached to create a new vbuuid due to the unclean shutdown:

          2022-07-26T05:02:58.678-07:00, ns_memcached:0:info:message(ns_1@172.23.105.36) - Control connection to memcached on 'ns_1@172.23.105.36' disconnected. Check logs for details.
          2022-07-26T05:03:00.548-07:00, ns_log:0:info:message(ns_1@172.23.105.36) - Service 'memcached' exited with status 137. Restarting. Messages:
          ...
           
          2022-07-26T05:03:08.066510-07:00 INFO (test8) Warmup::createVBuckets: vb:0 created new failover entry with uuid:29158530555325 and seqno:116360 due to unclean shutdown
          

          e. Indexer received mutations with this vbuuid from memcached on 105.36 after the restart, until the connection was closed:

          2022-07-26T05:03:16.199803-07:00 INFO 148: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf-18233705252004069608/0 - (vb:0) Creating stream with start seqno 98199 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
          ...
          2022-07-26T05:05:05.432874-07:00 WARNING 148: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf-18233705252004069608/0 - (vb:0) Stream closing, sent until seqno 109400 remaining items 0, reason: The stream closed early because the conn was disconnected
          

          f. The connection was closed at the time of the failover:

          2022-07-26T05:05:02.810-07:00, failover:0:info:message(ns_1@172.23.105.36) - Starting failing over ['ns_1@172.23.105.36']
          2022-07-26T05:05:02.810-07:00, ns_orchestrator:0:info:message(ns_1@172.23.105.36) - Starting failover of nodes ['ns_1@172.23.105.36']. Operation Id = 2139917b8f5387cec83fac633b7e7b5a
          

          g. Due to the failover, the vbuuid changed:

          2022-07-26T05:05:04.916622-07:00 INFO (test8) VBucket::setState: transitioning vb:0 with high seqno:126312 from:replica to:active meta:{"topology":[["ns_1@172.23.105.37",null]]}
          2022-07-26T05:05:04.916676-07:00 INFO (test8) KVBucket::setVBucketState: vb:0 created new failover entry with uuid:122286947444349 and seqno:125619
          

          g. When the KV node failover happened, indexer restarted the stream with 1a84ffdb1dbd as the vbuuid and 109400 as the last received seqno:

          2022-07-26T05:05:56.934-07:00 [Info] FEED[<=>MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf(127.0.0.1:8091)] ##98 start-timestamp bucket: test8, vbuckets: 171 -
              {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
              {    0     1a84ffdb1dbd     109400      54044     116360}
              {    1     4fdea47cd9ba      92432      53304     113709}
          

          g. When indexer made the request, the failover table did not contain the 29158530555325 (i.e. 1a84ffdb1dbd in hex) UUID. Hence, memcached returned this error:

          Failover table:

          test8
           
           vb_0:0:id:                            122286947444349
           vb_0:0:seq:                           125619
           vb_0:1:id:                            216042336623651
           vb_0:1:seq:                           0
          

          2022-07-26T05:05:56.981326-07:00 WARNING 61: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf-1290788532655393424/1 - (vb:0) Stream request requires rollback to seqno:0 because vBucket UUID not found in failover table, consumer and producer have no common history. Client requested seqnos:{109400,18446744073709551615} snapshot:{54044,116360} uuid:29158530555325
          

          This resulted in the rollback to zero. The UUID created by the failed-over node (105.36) was not persisted in the failover table; after the failover, memcached did not have the UUID indexer wanted, and hence there was a rollback to zero. I checked some other vbuckets as well. The issue appears to be different from the customer issue.
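
          (The failover table above came from the collected stats; on a live node it can also be dumped with cbstats - a sketch, where the install path, port and credentials are assumptions:)

          /opt/couchbase/bin/cbstats localhost:11210 -u Administrator -p password -b test8 failovers | grep 'vb_0:'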

          drigby Dave Rigby added a comment -

          Varun Velamuri Is this similar to one of the other MBs raised this week where, if Indexer had retried with the previous UUID (and adjusted snapshot range), it could have avoided rolling back all the way to zero?


          varun.velamuri Varun Velamuri added a comment -

          Dave Rigby - We could have avoided the rollback by using the previous vbuuid and snapshot range. The bigger problem I see is that indexer did not have a snapshot to retry with. This needs to be investigated further.

          2022-07-26T05:06:09.431-07:00 [Info] StorageMgr::handleRollback latestSnapInfo <nil> lastRollbackTs <nil>. Use latest snapshot.
          

           

          varun.velamuri Varun Velamuri added a comment - - edited

          h. From the logs, the bucket "test8" was flushed at 2022-07-26T03:26:20.792-07:00. So, all the indexed data (including disk snapshots) was lost at that time, and the next disk snapshot could only be generated at around 2022-07-26T03:36, i.e. 10 minutes after stream start. I do not see any disk snapshot getting generated after the flush:

          2022-07-26T03:26:20.792-07:00, ns_orchestrator:0:info:message(ns_1@172.23.105.36) - Flushing bucket "test8" from node 'ns_1@172.23.105.36'

          i. After the stream started, indexer started receiving larger timestamps. From the projector logs on node 105.36, the avg. DCP snapshot size grew from 22 to 392 - essentially indexer is receiving larger snapshots from DCP. This could be due to a slow KV (I am claiming slow KV because mutChLen at the projector is zero, which means the projector is waiting for data from KV):

          2022-07-26T03:27:28.147-07:00 [Info] KVDT[<-test8<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf] ##72 stats: {"hbCount":14,"eventCount":40059,"reqCount":342,"endCount":0,"snapStat.samples":3131,"snapStat.min":1,"snapStat.max":356,"snapStat.avg":22,"upsertCount":36586,"deleteCount":0,"exprCount":0,"ainstCount":1,"dinstCount":0,"tsCount":0,"mutChLen":0,"numDocsProcessed":36520,"numDocsPending":101094}
          ...
          2022-07-26T03:36:40.381-07:00 [Info] KVDT[<-test8<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_3952cfedf2ceec6b184b836c0f396aaf] ##72 stats: {"hbCount":287,"eventCount":3881623,"reqCount":342,"endCount":0,"snapStat.samples":12328,"snapStat.min":1,"snapStat.max":21854,"snapStat.avg":392,"upsertCount":3868953,"deleteCount":0,"exprCount":0,"ainstCount":1,"dinstCount":0,"tsCount":0,"mutChLen":0,"numDocsProcessed":3866756,"numDocsPending":7632366}
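
          (The growth of the average DCP snapshot size over time can be pulled straight out of these KVDT stat lines - a sketch, assuming the projector log in the cbcollect is named projector.log:)

          rg 'KVDT.*test8.*snapStat' projector.log | sed -E 's/^(2022-[0-9T:.+-]+).*"snapStat.avg":([0-9]+).*/\1  snapStat.avg=\2/'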
          

          j. Larger snapshots mean that indexer cannot align the received seqnos to the snapshot end. After a while, the snapshot size grew very large:

          2022-07-26T04:19:12.914-07:00 [Info] MutationStreamReader::updateSnapInFilter Huge Snapshot Received for test8 48 MAINT_STREAM 3115-54850
          ...
          2022-07-26T05:03:24.863-07:00 [Info] MutationStreamReader::updateSnapInFilter Huge Snapshot Received for test8 280 MAINT_STREAM 54800-118903
          

          k. Indexer could not align the timestamps to snapshots and hence no disk snapshot got persisted. The num_snapshots/num_commits stats are not changing on the indexer side while num_nonalign_ts has grown drastically. This suggests that indexer did not get even a single timestamp in which all vbucket seqnos are aligned to snapshot boundaries:

          2022-07-26T03:28:32.918-07:00 [Info] PeriodicStats = {"test8:idx8:num_snapshots":57067, "test8:idx8:num_commits":13,
          "test8:num_nonalign_ts":293053}
          ...
          2022-07-26T05:05:33.111-07:00 [Info] PeriodicStats = {"test8:idx8:num_snapshots":57067, "test8:idx8:num_commits":13, "test8:num_nonalign_ts":569809}
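
          (A sketch to chart that growth from the same PeriodicStats lines - the indexer.log file name is an assumption based on the cbcollect layout:)

          rg 'PeriodicStats.*num_nonalign_ts' indexer.log | sed -E 's/^([^ ]+) .*"test8:num_nonalign_ts":([0-9]+).*/\1  num_nonalign_ts=\2/'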
          

          l. As there was no disk snapshot, indexer could not roll back to a disk snapshot and hence it rolled back to zero.

          This is another scenario where a slow KV can send potentially large snapshots, leaving indexer unable to align the seqnos to the snapshots, and therefore it had to roll back to zero (or a very old disk snapshot). Even if a single vbucket is not snapshot-aligned, indexer will not generate a disk snapshot.

          CC: Deepkaran Salooja , Amit Kulkarni , Dave Finlay , Dave Rigby 

          dfinlay Dave Finlay added a comment -

          I think the case where the bucket got flushed and Indexer had to rollback to zero is completely fine.

          This is when the bucket flush starts:

          [ns_server:debug,2022-07-26T03:26:48.768-07:00,ns_1@172.23.105.36:ns_audit<0.28196.0>:ns_audit:handle_call:146]Audit flush_bucket: [{remote,{[{ip,<<"10.100.195.195">>},{port,49358}]}},
                               {sessionid,<<"634890a73e552079485242dc299216a91c34cdda">>},
                               {real_userid,{[{domain,builtin},
                                              {user,<<"<ud>Administrator</ud>">>}]}},
                               {timestamp,<<"2022-07-26T03:26:48.768-07:00">>},
                               {bucket_name,<<"test8">>}]
          

          This is when it finishes:

          [ns_server:debug,2022-07-26T03:26:48.768-07:00,ns_1@172.23.105.36:ns_audit<0.28196.0>:ns_audit:handle_call:146]Audit flush_bucket: [{remote,{[{ip,<<"10.100.195.195">>},{port,49358}]}},
                               {sessionid,<<"634890a73e552079485242dc299216a91c34cdda">>},
                               {real_userid,{[{domain,builtin},
                                              {user,<<"<ud>Administrator</ud>">>}]}},
                               {timestamp,<<"2022-07-26T03:26:48.768-07:00">>},
                               {bucket_name,<<"test8">>}]
          

          Flush empties the bucket by deleting the vbuckets, which results in the creation of new vbuuids. It's not surprising that Indexing rolls back to zero a few seconds after the bucket flush finishes:

          Index node1 ( 172.23.106.159) 
          2022-07-26T03:26:54.738-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8
          

          The second case at 5:06 is more interesting as it is the result of a failover induced by an OOM kill.

          OOM kill:

          2022-07-26T05:02:58.678-07:00, ns_memcached:0:info:message(ns_1@172.23.105.36) - Control connection to memcached on 'ns_1@172.23.105.36' disconnected. Check logs for details.
          2022-07-26T05:03:00.548-07:00, ns_log:0:info:message(ns_1@172.23.105.36) - Service 'memcached' exited with status 137. Restarting. Messages:
          

          Failover:

          2022-07-26T05:05:02.810-07:00, ns_orchestrator:0:info:message(ns_1@172.23.105.36) - Starting failover of nodes ['ns_1@172.23.105.36']. Operation Id = 2139917b8f5387cec83fac633b7e7b5a
          2022-07-26T05:05:05.290-07:00, ns_memcached:0:info:message(ns_1@172.23.105.36) - Shutting down bucket "test8" on 'ns_1@172.23.105.36' for deletion
          2022-07-26T05:05:07.683-07:00, ns_memcached:0:info:message(ns_1@172.23.105.36) - Bucket "test6" loaded on node 'ns_1@172.23.105.36' in 124 seconds.
          2022-07-26T05:05:08.608-07:00, failover:0:info:message(ns_1@172.23.105.36) - Failed over ['ns_1@172.23.105.36']: ok
          2022-07-26T05:05:08.651-07:00, failover:0:info:message(ns_1@172.23.105.36) - Deactivating failed over nodes ['ns_1@172.23.105.36']
          2022-07-26T05:05:08.690-07:00, auto_failover:0:info:message(ns_1@172.23.105.36) - Node ('ns_1@172.23.105.36') was automatically failed over. Reason: The data service did not respond for the duration of the auto-failover threshold. Either none of the buckets have warmed up or there is an issue with the data service.
          

          Rollback to zero:

          Index node1 ( 172.23.106.159) 
          2022-07-26T05:06:12.658-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test8
          

          CPU is high on all 3 of the memcached nodes for the last 10 minutes before the failover – however it doesn't seem to be that high prior to that.

          That said, I see quite a lot of slow scheduling for the DCP connection notifier task on the non-IO thread workers during this time.

          $ grep "T0[345].*test8.*Slow scheduling for NON_IO task .DCP connection" memcached.log | grep -o 2022-..-..T..:. | uniq -c
            50 2022-07-26T03:0
            13 2022-07-26T03:1
            14 2022-07-26T03:2
            87 2022-07-26T03:3
            19 2022-07-26T03:4
            65 2022-07-26T03:5
            79 2022-07-26T04:0
            41 2022-07-26T04:1
            13 2022-07-26T04:2
            22 2022-07-26T04:3
            45 2022-07-26T04:4
            27 2022-07-26T04:5
            33 2022-07-26T05:0
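
          (For a finer-grained view - reusing the command above with minute granularity to spot the worst windows:)

          $ grep "T0[345].*test8.*Slow scheduling for NON_IO task .DCP connection" memcached.log | grep -o 2022-..-..T..:.. | uniq -c | sort -rn | head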
          

          Some of these waits are pretty lengthy. Dave - is this evidence for KV being slow to get mutations to indexing? Is there a reason for the slowness?

          drigby Dave Rigby added a comment -

          From DaveF:

          Dave - is this evidence for KV being slow to get mutations to indexing? Is there a reason for the slowness?

          Looking at mortimer for the flush around 03:26, we see that test8 has a bucket flush at [1], followed by the bucket being repopulated with data. Nodes 105.37 and 106.156 hit the high watermark and successfully free memory, dropping back to the low watermark and continuing to oscillate as more data is added, but node 105.36 exceeds the high watermark and never comes back down below it:

          Looking at a breakdown of memory, we see large amounts in CheckpointManager, but not in the disk queue - i.e. the bulk of the memory is outstanding items for DCP stream(s):

          Looking at item counts, we see both replication and GSI have large number of items remaining:

          Essentially the workload is generating more mutations than KV-Engine can transmit via DCP to the replicas and indexing nodes; hence a backlog forms which consumes most of the bucket memory. To attempt to alleviate that, KV will switch from in-memory streaming to disk backfill, which results in larger snapshots being sent:

          2022-07-26T03:32:46.494248-07:00 INFO 64: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-3631624410218667233/0 - (vb:264) Handling slow stream; state_ : in-memory, lastReadSeqno : 5279, lastSentSeqno : 3452, vBucketHighSeqno : 24649, isBackfillTaskRunning : False
          2022-07-26T04:03:48.467911-07:00 INFO 64: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-3631624410218667233/0 - (vb:264) ActiveStream::transitionState: Transitioning from in-memory to backfilling
          2022-07-26T04:03:48.472522-07:00 INFO 64: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-3631624410218667233/0 - (vb:264) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-3631624410218667233/0" backfill:true, seqno:5873
          2022-07-26T04:03:48.474445-07:00 INFO 64: (test8) DCP (Producer) eq_dcpq:secidx:proj-test8-MAINT_STREAM_TOPIC_d44295ce804e3e6e97cc718dd4f42cba-3631624410218667233/0 - (vb:264) Scheduling backfill from 5280 to 5872, reschedule flag : True
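
          (How many GSI streams fell back to disk backfill can be counted from the transition messages quoted above - a sketch over the same memcached.log:)

          rg 'ActiveStream::transitionState: Transitioning from in-memory to backfilling' memcached.log | rg -c secidx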
          

          Note that we have improved this behaviour in later versions - most notably in 7.1 by limiting the amount of memory which can be used by CheckpointManager (for holding outstanding DCP & disk mutations). That wouldn't avoid the switch to backfilling, but it would keep more memory free in the bucket for normal operations, and apply backpressure to the client sooner, telling them we are temporarily out of memory.

          The later OOM-kill around 05:02 is essentially the same - the rate of ingress mutations is greater than what the environment can egress to replication and the indexer, resulting in large disk backfill snapshots being sent to indexing.

          Note this setup only has:

          • One AuxIO thread, which in 6.6.2 is responsible for all processing of in-memory Checkpoints into DCP messages - for replicas and indexing, across all buckets - plus all disk backfills.
          • Two NonIO threads, which are running ItemCompressorTasks (active compression), item paging, and notifying DCP connections that data is ready.

          Essentially the system is not well-sized for the workload being run on it, and things are running slow / queues are backing up etc.


          prachi.sikchi Prachi Sikchi (Inactive) added a comment -

          Varun Velamuri Pavan PB Since this is marked for 7.1.2, can you please also update the due date on this? Thanks!


          varun.velamuri Varun Velamuri added a comment -

          I don't think there is anything wrong with indexer in this case. Removing the 7.1.2 fix version for now.


          varun.velamuri Varun Velamuri added a comment -

          Pavan PB - Resolving this as not-a-bug, as it turned out to be a sizing issue (see the comment from Dave Rigby above).


          People

            varun.velamuri Varun Velamuri
            pavan.pb Pavan PB