  1. Couchbase Server
  2. MB-53162

RollbackAll to zero seen in the indexer logs



    • Untriaged
    • 1
    • Unknown


      Steps to reproduce:

      1. Create 4 buckets 
      2. Create indexes with replicas on each of the 4 buckets.
      3. Run pillowfight to continuously load data (the buckets have 1M, 1M, 1M and 3M items). The bucket resident ratio (RR) needs to be under 10%, so keep loading until it is.
      4. Run a shell script that runs the request_plus scans continuously.
      5. Run stress-ng with the params:

        stress-ng --vm 4 --vm-bytes 1G --metrics-brief --vm-keep --vm-locked -m 4 --aggressive --vm-populate

        (Adjust the --vm-bytes param depending upon the VM resources)

      6. Once enough stress-ng processes are running, the OOM killer will kick in. This can be verified by checking dmesg ( dmesg -T | egrep -i 'killed process' ).
      7. stress-ng itself may be the process that gets spawned and killed, since the OOM killer's choice of victim is weighted by each process's oom_score_adj. To make sure that memcached gets killed, run this:

      echo 1000 > /proc/<memcached PID>/oom_score_adj  
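
Steps 6 and 7 can be sketched as two small shell helpers. This is only an illustration: the function names and the optional proc-root argument (useful for a dry run against a scratch directory) are hypothetical and not part of any Couchbase or kernel tooling. The first helper assumes the kernel's usual "Killed process <pid> (<name>)" message format.

```shell
#!/bin/sh
# Hypothetical helpers for steps 6 and 7; names and arguments are illustrative.

# Read `dmesg -T` output on stdin and print "<pid> <name>" for every
# process the OOM killer reports as killed.
oom_killed_processes() {
    sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Write an oom_score_adj value for one PID.
# $1 = PID, $2 = score (default 1000, the maximum), $3 = proc root (default /proc).
set_oom_score_adj() {
    printf '%s\n' "${2:-1000}" > "${3:-/proc}/$1/oom_score_adj"
}
```

Possible usage, run as root and assuming a single memcached process on the node: set_oom_score_adj "$(pgrep -x memcached)" 1000, then dmesg -T | oom_killed_processes to confirm that memcached was the process killed.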

      Rollbacks to zero seen in the indexer log:
      2022-07-28T06:01:49.498-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test4
      2022-07-28T06:01:52.344-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test3 
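
To locate such entries in a collected indexer log, a filter along the following lines may help. It is a minimal sketch that assumes every entry has the same shape as the two lines above (timestamp first, stream and bucket as the last two fields); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read indexer log lines on stdin and print
# "<timestamp> <stream> <bucket>" for each rollbackAllToZero entry.
# Assumes the stream and bucket are the last two whitespace-separated fields.
rollbacks_to_zero() {
    awk '/StorageMgr::rollbackAllToZero/ { print $1, $(NF-1), $NF }'
}
```

For example, rollbacks_to_zero < indexer.log, where indexer.log is the file extracted from one of the cbcollect archives.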

      cbcollect logs -> 


          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.15.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.16.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.17.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.19.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.100.22.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659015082/collectinfo-2022-07-28T133123-ns_1%40172.23.121.215.zip

      The cluster wasn't in a healthy state: one of the nodes had very high memory consumption and a rebalance did not work. Please look around the timestamp 2022-07-28T06:01:49.498-07:00 (this is the only occurrence of the rollback, so it should be easy to find).


      Not clear if it helps but some of the older logs are here ->


          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.15.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.16.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.17.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.19.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.100.22.zip
          url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1659012574/collectinfo-2022-07-28T124935-ns_1%40172.23.121.215.zip





              pavan.pb Pavan PB