  1. Couchbase Server
  2. MB-53084

Index rollback to zero on memcached OOM kill


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • 6.6.2
    • couchbase-bucket
    • None
    • Untriaged
    • 1
    • Unknown

    Description

      While trying to reproduce the CBSE ticket, we ran into the index rollback-to-zero issue.

       

      Steps to reproduce:

      1. Create 4 buckets.
      2. Create indexes with replicas on each of the 4 buckets.
      3. Run pillowfight to load data continuously (the buckets hold 1M, 1M, 1M and 3M items) until the bucket resident ratio (RR) is under 10%.
      4. Run a shell script that issues request_plus scans continuously.
      5. Simulate an OOM kill of memcached on the orchestrator node, which causes the KV node to be failed over.
      6. Observe that the scans fail or time out and that the index has rolled back to 0.
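
      The scan script from step 4 is not attached to this ticket. A minimal sketch of such a loop, assuming the query service REST endpoint on port 8093, is shown here. The host, credentials, bucket names other than test1, and the indexed field `id` are placeholders rather than values taken from this cluster.

```shell
# Sketch of a continuous request_plus scan loop (step 4).
# QUERY_HOST, CB_USER and CB_PASS are placeholders, not values from this ticket.
QUERY_HOST="${QUERY_HOST:-127.0.0.1}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"

# Build a per-bucket statement. Assumes an index exists on the given field.
scan_statement() {
  printf 'SELECT COUNT(*) FROM `%s` WHERE %s IS NOT MISSING' "$1" "$2"
}

# Run one statement with request_plus consistency through the query REST API.
run_scan() {
  curl -s -u "${CB_USER}:${CB_PASS}" \
    "http://${QUERY_HOST}:8093/query/service" \
    --data-urlencode "statement=$1" \
    --data-urlencode "scan_consistency=request_plus"
}

# Loop until interrupted, issuing one scan per bucket per pass and
# printing only the status field of each response.
scan_forever() {
  while true; do
    for bucket in "$@"; do
      run_scan "$(scan_statement "$bucket" id)" | grep -o '"status": *"[a-z]*"'
    done
    sleep 1
  done
}
```

      With hypothetical bucket names, the loop would be started as `scan_forever test1 test2 test3 test4` and left running while step 5 is carried out.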

       

      For step 5, the following commands were run -> 

      sudo chmod 777 /proc/sysrq-trigger 
      sudo echo f > /proc/sysrq-trigger
       
      Writing f to /proc/sysrq-trigger invokes the kernel OOM killer, which kills the most memory-intensive process. The command was repeated until dmesg showed that memcached had been killed. After memcached was killed, auto-failover was triggered and the index rollback occurred. This was done on the orchestrator node (172.23.105.36).
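
      The repeat-until-killed procedure can be sketched as a small helper. This is an illustration, not the script used for this ticket: it writes to the trigger through `sudo tee` instead of changing the file permissions, the function names are made up, and it assumes dmesg holds no earlier memcached OOM record (clear it first with `sudo dmesg -C` if it might).

```shell
# Succeeds if kernel log text on stdin records an OOM kill of memcached.
memcached_was_killed() {
  grep -Eq 'Killed process [0-9]+ \(memcached\)'
}

# Invoke the OOM killer repeatedly until memcached is the victim.
# $1 is the maximum number of attempts (default 20).
oom_kill_until_memcached() {
  max_tries="${1:-20}"
  i=1
  while [ "$i" -le "$max_tries" ]; do
    # 'f' asks the kernel to run the OOM killer once.
    echo f | sudo tee /proc/sysrq-trigger > /dev/null
    sleep 2
    if sudo dmesg | memcached_was_killed; then
      echo "memcached OOM-killed after ${i} trigger(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "memcached not killed after ${max_tries} trigger(s)" >&2
  return 1
}
```

      Each trigger may kill some other large process first, so this should only be run on a disposable test node.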

      Index rollback

      2022-07-22T05:18:38.879-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test1 

      Scans timing out

      2022-07-22T05:07:21.412-07:00 [Info] SCAN##17582 RESPONSE status:(error = Indexer rollback), requestId: c4b8406d-9534-4173-af34-e39bb45a4af3
      2022-07-22T05:17:00.692-07:00 [Info] SCAN##17652 RESPONSE status:(error = Indexer rollback), requestId: ca08cb49-821a-4309-8a02-c4fed2e47bca
      2022-07-22T05:18:39.065-07:00 [Info] SCAN##17657 RESPONSE status:(error = Indexer rollback), requestId: fcea066f-4378-461f-9b30-5f22b7b4dd10 
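
      To pull the same evidence out of a full indexer log, a pair of filters along these lines may help. They match only the message formats quoted above. The helper names are hypothetical, and the log file name in the usage note is an assumption about where the collected logs are unpacked.

```shell
# Count scan responses on stdin that failed with "Indexer rollback".
count_rollback_scans() {
  grep -c 'RESPONSE status:(error = Indexer rollback)'
}

# Print the timestamp and keyspace of each rollback-to-zero event on stdin.
list_rollbacks_to_zero() {
  grep 'StorageMgr::rollbackAllToZero' | awk '{print $1, $NF}'
}
```

      For example, `count_rollback_scans < indexer.log` and `list_rollbacks_to_zero < indexer.log`, run against the indexer log from each collectinfo archive.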

      Logs ->

      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.105.36.zip
      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.105.37.zip
      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.106.156.zip
      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.106.159.zip
      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.106.163.zip
      s3://cb-customers-secure/cbse122792/122792/2022-07-22/collectinfo-2022-07-22t124922-ns_1@172.23.106.204.zip 

      Cluster is still live -> http://172.23.105.36:8091/

       

            People

              pavan.pb Pavan PB
              pavan.pb Pavan PB