  1. Couchbase Server
  2. MB-53186

[6.6.5 build 10104] - Multiple primary Indexes rollback to zero after KV node auto failover

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 6.6.5, 7.1.2
    • 6.6.5
    • secondary-index
    • None
    • Enterprise Edition 6.6.5 build 10104
    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

      Steps to Reproduce
      1. Create a 6-node cluster with 3 KV, 2 index, and 1 N1QL node.
      2. Create buckets, load data, and create indexes. Push the buckets into DGM and ensure the indexes are in DGM as well. Start running queries in the background with the request_plus consistency level.
      3. Run the following script, originally written to validate MB-53057. In an infinite loop it kills memcached (on 172.23.100.34), waits for auto failover (AF) to kick in, does a full recovery, and then rebalances.

      #!/bin/bash
      # Loop forever: kill memcached, wait for auto failover, fully recover
      # 172.23.100.34, then rebalance the cluster.
      while :
      do
          echo "killing memcached..."
          kill -9 $(pidof memcached)
          echo "Waiting for auto failover to kick in..."
          sleep 180
          echo "Listing node status post Auto failover..."
          /opt/couchbase/bin/couchbase-cli server-list -c localhost:8091 --username Administrator --password password
          sleep 30
          echo "Starting full recovery..."
          /opt/couchbase/bin/couchbase-cli recovery -c localhost:8091 --username Administrator --password password --server-recovery 172.23.100.34:8091 --recovery-type full
          sleep 30
          echo "Starting Rebalance after recovering a failed over node..."
          /opt/couchbase/bin/couchbase-cli rebalance -c localhost:8091 --username Administrator --password password
          # Long pause before checking the rebalance status.
          sleep 4000
          echo "Listing rebalance status..."
          /opt/couchbase/bin/couchbase-cli rebalance-status -c localhost:8091 --username Administrator --password password
          sleep 30
          echo "Listing node status post rebalance..."
          /opt/couchbase/bin/couchbase-cli server-list -c localhost:8091 --username Administrator --password password
          sleep 300
      done
      

      This is exactly the same test as the one in MB-53180. However, in this case it seems that 2 primary indexes were rolled back to zero.

      172.23.106.159 : index

      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:07:37.832-07:00 [Info] StorageMgr::handleRollback Rollback Index: 10943164515644793993 PartitionId: 0 SliceId: 0 To Zero 
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:07:37.832-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test4
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:07:37.940-07:00 [Info] StorageMgr::handleRollback Rollback Index: 10943164515644793993 PartitionId: 0 SliceId: 0 To Zero 
      

      172.23.106.163 : index

      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:09:02.679-07:00 [Info] StorageMgr::handleRollback Rollback Index: 3721238277937800766 PartitionId: 0 SliceId: 0 To Zero 
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:09:02.679-07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM test2
      /opt/couchbase/var/lib/couchbase/logs/indexer.log:2022-07-29T05:09:02.784-07:00 [Info] StorageMgr::handleRollback Rollback Index: 3721238277937800766 PartitionId: 0 SliceId: 0 To Zero 
      

      cbcollect_info attached.
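
      To check an index node for the same symptom, the indexer.log can be scanned for the handleRollback entries quoted above. The following is a minimal sketch, assuming the log lines have the same format as in the excerpts. The function name rolled_back_to_zero is hypothetical and not part of any Couchbase tool.

```shell
#!/bin/bash
# Print the distinct index IDs that the given indexer.log reports as
# rolled back to zero. Assumes lines of the form shown in this report:
#   ... StorageMgr::handleRollback Rollback Index: <id> PartitionId: ... To Zero
rolled_back_to_zero() {
    local log_file="$1"
    grep 'StorageMgr::handleRollback.*To Zero' "$log_file" \
        | sed -n 's/.*Rollback Index: \([0-9][0-9]*\) .*/\1/p' \
        | sort -u
}
```

      Example invocation, using the log path from the excerpts: rolled_back_to_zero /opt/couchbase/var/lib/couchbase/logs/indexer.log. An index that logs the rollback more than once, as both do above, is listed only once.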

            People

              Balakumaran.Gopal Balakumaran Gopal
              Votes: 0
              Watchers: 5
