Couchbase Server / MB-58275

Indexer swap rebalance is failing with reason: Duplicate Index Instance

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.2.1
    • secondary-index
    • couchbase-cloud-server-7.2.1-5904-v1.0.19

    Description

      1. Create a 7-node (c5.2xlarge) cluster: 3-KV, 2-GSI, 2-N1QL colocated.
      2. Create a bucket with 10 collections and 100M items in each collection.
      3. Create GSI indexes on 2 collections.
      4. Start a KV read+expiry load at 10k ops/s (9k reads, 1k expiry). Start the N1QL query load in parallel.
      5. Scale up the cluster to 4-KV, 3-GSI & 3-N1QL. Finished in 9063.69799995 seconds.
      6. Scale up the cluster to 5-KV, 4-GSI & 4-N1QL. Finished in 7320.00900006 seconds.
      7. Scale down the cluster to 4-KV, 3-GSI & 3-N1QL. Took 9600 seconds.
      8. Scale down the cluster to 3-KV, 2-GSI & 2-N1QL. Took 14400 seconds.
      9. Scale up the EBS volume. Because this is done online, it finishes in a few seconds.
      10. Scale down the EBS volume. This triggers a swap rebalance of all the nodes, and when it reaches the indexer, the rebalance fails in a loop:

        Rebalance exited with reason {service_rebalance_failed,index,
        {worker_died,
        {'EXIT',<0.551.105>,
        {rebalance_failed,
        {service_error,
        <<"Duplicate Index Instance">>}}}}}.
        Rebalance Operation Id = 8250dd0807df3eecab283616d414a845
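
      For triage across repeated failures, the fields of the exit reason above can be pulled out mechanically. This is only a minimal sketch: the function name parse_rebalance_failure and the regular expressions are illustrative assumptions based on the exact text quoted in this report, not part of any Couchbase tool, and other failure shapes may not match.

```python
import re

# Failure text copied from the report above.
FAILURE = """Rebalance exited with reason {service_rebalance_failed,index,
{worker_died,
{'EXIT',<0.551.105>,
{rebalance_failed,
{service_error,
<<"Duplicate Index Instance">>}}}}}.
Rebalance Operation Id = 8250dd0807df3eecab283616d414a845"""


def parse_rebalance_failure(text):
    """Return (service, error, operation_id); a field is None if not found.

    Hypothetical helper: the patterns only cover the message shape
    shown in this report.
    """
    service = re.search(r"\{service_rebalance_failed,\s*(\w+)", text)
    error = re.search(r'\{service_error,\s*<<"([^"]*)">>', text)
    op_id = re.search(r"Rebalance Operation Id\s*=\s*([0-9a-f]+)", text)
    return tuple(m.group(1) if m else None for m in (service, error, op_id))


service, error, op_id = parse_rebalance_failure(FAILURE)
print(service, "|", error, "|", op_id)
```

      Grouping looped failures by (service, error) makes it easy to confirm that every retry fails for the same reason rather than for a different one each time.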
        

      QE Test

      sudo guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/couchbase_capella_volume_3_new.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.hostedHospital.Murphy.test_rebalance,num_items=100000000,num_buckets=1,bucket_names=GleamBook,bucket_type=membase,iterations=2,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,maxttl=10,pc=20,gsi_nodes=2,cbas_nodes=2,fts_nodes=2,kv_nodes=3,n1ql_nodes=2,kv_disk=1000,n1ql_disk=50,gsi_disk=500,fts_disk=1000,cbas_disk=1000,kv_compute=c5.2xlarge,gsi_compute=c5.2xlarge,n1ql_compute=c5.2xlarge,fts_compute=c5.2xlarge,cbas_compute=c5.2xlarge,mutation_perc=20,key_type=CircularKey,capella_run=true,services=data-index-query,rebl_services=data-index-query,max_rebl_nodes=27,provider=AWS,region=us-east-1,type=GP3,size=1000,collections=10,ops_rate=100000,skip_teardown_cleanup=true,wait_timeout=14400,index_timeout=28800,runtype=dedicated,skip_init=true,rebl_ops_rate=10000,collections=10,expiry=true,vh_scaling=true,horizontal_scale=1,clients_per_db=10 -m rest'
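
      The -t argument above is a long comma-separated list of key=value pairs, and at least one key (collections) appears in it more than once. A small sketch like the following can flag repeated keys before a rerun. The helper name repeated_params is hypothetical, and how the test runner resolves repeated keys is not stated in this report, so the sketch only reports them.

```python
from collections import defaultdict


def repeated_params(param_string):
    """Map each key that occurs more than once to its values, in order."""
    seen = defaultdict(list)
    for item in param_string.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that are not key=value, such as the test path
            seen[key.strip()].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Shortened excerpt of the -t argument above; most parameters are omitted.
excerpt = (
    "num_items=100000000,rerun=False,collections=10,ops_rate=100000,"
    "rebl_ops_rate=10000,collections=10,expiry=true"
)
print(repeated_params(excerpt))
```

      In this excerpt both occurrences of collections carry the same value, so the repetition is harmless here; the check is mainly useful when the values differ.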
      

            People

              yash.dodderi Yash Dodderi
              ritesh.agarwal Ritesh Agarwal