Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-53202

Index rollbackAllToZero messages seen with build 6.6.5-10106

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      Steps to reproduce:

      1. Create 10 buckets 
      2. Create indexes with replicas on each of the 10 buckets.
      3. Run pillowfight to continuously load data.
      4. Run a shell script that runs the request_plus scans continuously.
      5. Make sure that some amount of memcached memory is in swap space. Can be checked by running 

        cat /proc/$(pidof memcached)/status | grep VmSwap 

      1. Simulate OOM kill by running the following commands (Pillowfight has to be running on one of the buckets at the very least)

        echo 1000 > /proc/$(pidof memcached)/oom_score_adj
        sudo chmod 777 /proc/sysrq-trigger 
        sudo echo f > /proc/sysrq-trigger    

      1. After auto-failover, perform a full recovery and rebalance in the node.

      RollbackToZero messages

      On 172.23.100.22 node 
       
      2022-07-31T23:42:45.961-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
      2022-08-01T00:01:11.705-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
      2022-08-01T00:03:39.669-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test1
      2022-08-01T00:04:20.787-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test2
      2022-08-01T00:04:21.307-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test3
       
       
      On 172.23.100.19 node
       
      2022-07-31T23:42:47.156-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
      2022-08-01T00:01:17.415-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
      2022-08-01T00:03:58.469-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test1
      2022-08-01T00:04:11.319-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test2
      2022-08-01T00:10:39.838-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test3
       

      Rollbacks on test10 can be ignored as a flush of test10 was performed. i.e rollbacks with these timestamps

      172.23.100.19 : index
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-31T23:42:47.156-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
       
      172.23.100.22 : index
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-31T23:42:45.961-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10 

      But test1,test2,test3, and test10 (with the following timestamps) need to be investigated.

      172.23.100.22
       
      2022-08-01T00:01:11.705-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10
       
      172.23.100.19
       
      2022-08-01T00:01:17.415-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test10 

       

      cbcollect logs - newer

               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.15.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.16.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.17.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.19.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.22.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.121.215.zip 

       cbcollect logs  - older

      cbcollect logs: 
       
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.100.15.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.100.16.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.100.17.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.100.19.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.100.22.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659336774/collectinfo-2022-08-01T065255-ns_1%40172.23.121.215.zip 

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            varun.velamuri Varun Velamuri added a comment - - edited

            Pavan PB  - Are these the right set of logs. Indexer.log (on both nodes) seems to terminate at around 2022-07-31T23:55 and the rollbacks of interest seems to have happened later

            varun.velamuri Varun Velamuri added a comment - - edited Pavan PB   - Are these the right set of logs. Indexer.log (on both nodes) seems to terminate at around 2022-07-31T23:55 and the rollbacks of interest seems to have happened later
            pavan.pb Pavan PB added a comment -

            I've updated the description with the correct log bundles. Here they are -

             

                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.15.zip
                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.16.zip
                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.17.zip
                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.19.zip
                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.22.zip
                     url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.121.215.zip 

            pavan.pb Pavan PB added a comment - I've updated the description with the correct log bundles. Here they are -            url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.15.zip          url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.16.zip          url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.17.zip          url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.19.zip          url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.100.22.zip          url : https: //cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659340931/collectinfo-2022-08-01T080212-ns_1%40172.23.121.215.zip

            a. Indexer was asked to rollback and indexer started using disk snapshots

            2022-08-01T00:01:16.780-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 latestSnapInfo SnapshotInfo: count:2059600 committed:false lastRollbackTs <nil>. Use latest snapshot.

            b. Indexer started bucket stream with timestamp from disk snapshot. For some vbuckets, the snapshot is not timestamp aligned (due to MB-53183)

             

            2022-08-01T00:01:21.165-07:00 [Info] Indexer::startBucketStream Stream: MAINT_STREAM Bucket: test3 SessionId 3 RestartTS bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT -
                {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
            ...
                {  826     76dff20bd266      86829      86801      86836}
            

             

            c. Indexer was asked to rollback - Taking vb:125 as example

            2022-08-01T00:02:32.773-07:00 [Info] TK StreamBegin MAINT_STREAM test3 125 269460286127742 80873 3
            2022-08-01T00:02:32.773-07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket test3 vbucket 125. Rollback (80873,     f5129ca2be7e).

            d. Indexer did not try with older snapshot due to MB-53221

             

            2022-08-01T00:04:21.087-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 Discarding Snapshot SnapshotInfo: count:2059600 committed:false. Need older than last used snapshot bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT -
            ...
            2022-08-01T00:04:21.088-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 Discarding Snapshot SnapshotInfo: count:2000000 committed:false. Need older than last used snapshot bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT -
            

            Since indexer's both snapshots got exhausted, indexer had to rollback to zero. This issue contains both MB-53183 & MB-53221. Closing this as duplicate.

             

            varun.velamuri Varun Velamuri added a comment - a. Indexer was asked to rollback and indexer started using disk snapshots 2022-08-01T00:01:16.780-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 latestSnapInfo SnapshotInfo: count:2059600 committed:false lastRollbackTs <nil>. Use latest snapshot. b. Indexer started bucket stream with timestamp from disk snapshot. For some vbuckets, the snapshot is not timestamp aligned (due to MB-53183 )   2022-08-01T00:01:21.165-07:00 [Info] Indexer::startBucketStream Stream: MAINT_STREAM Bucket: test3 SessionId 3 RestartTS bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT -     {vbno, vbuuid, seqno, snapshot-start, snapshot-end} ...     {  826     76dff20bd266      86829      86801      86836}   c. Indexer was asked to rollback - Taking vb:125 as example 2022-08-01T00:02:32.773-07:00 [Info] TK StreamBegin MAINT_STREAM test3 125 269460286127742 80873 3 2022-08-01T00:02:32.773-07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket test3 vbucket 125. Rollback (80873,     f5129ca2be7e). d. Indexer did not try with older snapshot due to MB-53221   2022-08-01T00:04:21.087-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 Discarding Snapshot SnapshotInfo: count:2059600 committed:false. Need older than last used snapshot bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT - ... 2022-08-01T00:04:21.088-07:00 [Info] StorageMgr::handleRollback 13075063315852097481 Discarding Snapshot SnapshotInfo: count:2000000 committed:false. Need older than last used snapshot bucket: test3, vbuckets: 1024 Crc64: 3345741357950139956 snapType FORCE_COMMIT - Since indexer's both snapshots got exhausted, indexer had to rollback to zero. This issue contains both MB-53183 & MB-53221 . Closing this as duplicate.  

            Duplicate of MB-53183 & MB-53221

            varun.velamuri Varun Velamuri added a comment - Duplicate of MB-53183 & MB-53221

            Closing the ticket as Duplicate

            chanabasappa.ghali Chanabasappa Ghali added a comment - Closing the ticket as Duplicate

            People

              varun.velamuri Varun Velamuri
              pavan.pb Pavan PB
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty