Couchbase Server / MB-58824

Indexer rebalance failing in a loop due to cleanup pending from previous failed/aborted rebalance/failover/move index

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.6.0
    • secondary-index
    • 7.6.0-1533

    Description

      1. Create a cluster with 3 KV nodes, 2 GSI nodes, and 2 N1QL nodes.
      2. Create a bucket, load data, and start queries asynchronously. Start a KV workload (80% reads, 20% expiry).
      3. Scale up the cluster by 1 KV, 1 GSI, and 1 N1QL node.
      4. The rebalance failed, but on the CP retry the node was simply added without any movement, which is expected. The retried rebalance passed.

        GSI nodes in Cluster

        Index Statistics
        +----------------------------------------------------------------------+-----------+----------+--------+--------+------------+------------+-----------+---------------+----------------+
        | Node                                                                 | mem_quota | mem_used | avg_rr | avg_dr | #data_size | #disk_size | #requests | #rows_scanned | #rows_returned |
        +----------------------------------------------------------------------+-----------+----------+--------+--------+------------+------------+-----------+---------------+----------------+
        | svc-qi-node-004.kto-ktkkyszxcv6.sandbox.nonprod-project-avengers.com | 27        | 22       | 13     | 773    | 128        | 37         | 701692    | 20965396253   | 327698108      |
        | svc-qi-node-005.kto-ktkkyszxcv6.sandbox.nonprod-project-avengers.com | 27        | 22       | 13     | 779    | 127        | 37         | 702944    | 20965790165   | 326020691      |
        | svc-qi-node-007.kto-ktkkyszxcv6.sandbox.nonprod-project-avengers.com | 27        | 0        | 0      | 0      | 0          | 0          | 0         | 0             | 0              |
        +----------------------------------------------------------------------+-----------+----------+--------+--------+------------+------------+-----------+---------------+----------------+
        

      5. Scale up the cluster again by 1 KV, 1 GSI, and 1 N1QL node.
      6. The rebalance failed:

        Rebalance exited with reason {service_rebalance_failed,index,
        {worker_died,
        {'EXIT',<0.18938.608>,
        {{badmatch,
        {error,
        {bad_nodes,index,prepare_rebalance,
        [{'ns_1@svc-qi-node-005.kto-ktkkyszxcv6.sandbox.nonprod-project-avengers.com',
        {error,
        {unknown_error,
        <<"indexer rebalance failure - cleanup pending from previous failed/aborted rebalance/failover/move index. please retry the request later.">>}}}]}}},
        [{service_manager,rebalance_op,5,
        [{file,"src/service_manager.erl"},
        {line,338}]},
        {service_manager,do_run_op,1,
        [{file,"src/service_manager.erl"},
        {line,257}]},
        {proc_lib,init_p,3,
        [{file,"proc_lib.erl"},{line,225}]}]}}}}.
        Rebalance Operation Id = 561a36a57b1c456f4bded1dc6bacb7f8
        
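      The error above asks the caller to "retry the request later". A minimal sketch of a bounded retry on the caller side could look like the following. It assumes a hypothetical `start_rebalance` callable that raises `RebalanceError` carrying the indexer's message; neither name is part of Couchbase Server or its SDKs. Treating the "cleanup pending" message as retryable is also an assumption. If the pending cleanup never completes, as the title of this issue suggests, the retries only bound the loop and surface the failure; they do not fix it.

```python
import time

# Substring copied from the error in this report. Treating it as a
# retryable condition is an assumption made for this sketch.
CLEANUP_PENDING = "cleanup pending from previous failed/aborted rebalance"


class RebalanceError(Exception):
    """Hypothetical error type raised by the caller's rebalance wrapper."""


def is_cleanup_pending(message: str) -> bool:
    """Return True if the message matches the indexer's cleanup-pending error."""
    return CLEANUP_PENDING in message


def rebalance_with_retry(start_rebalance, max_attempts=5, base_delay=1.0,
                         sleep=time.sleep):
    """Call start_rebalance, retrying only on the cleanup-pending error.

    start_rebalance is a hypothetical zero-argument callable. The delay
    doubles after each failed attempt (base_delay, 2*base_delay, ...).
    Any other error, or the last failed attempt, is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_rebalance()
        except RebalanceError as err:
            if not is_cleanup_pending(str(err)) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

      Bounding the attempts keeps an orchestrator from looping indefinitely on the same failure, which is the symptom reported here.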

      QE Test

      sudo guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/couchbase_capella_volume_3_new.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.hostedHospital.Murphy.test_rebalance,num_items=100000000,num_buckets=1,bucket_names=GleamBook,bucket_type=membase,iterations=4,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,maxttl=10,pc=20,gsi_nodes=2,cbas_nodes=2,fts_nodes=2,kv_nodes=3,n1ql_nodes=2,kv_disk=1000,n1ql_disk=50,gsi_disk=500,fts_disk=1000,cbas_disk=1000,kv_compute=n2-standard-16,gsi_compute=n2-standard-16,n1ql_compute=n2-standard-16,fts_compute=n2-standard-16,cbas_compute=n2-standard-16,mutation_perc=20,key_type=CircularKey,capella_run=true,services=data-index:query,rebl_services=data-index:query,max_rebl_nodes=27,provider=GCP,region=us-west1,type=PD-SSD,size=1000,ops_rate=100000,skip_teardown_cleanup=true,wait_timeout=14400,index_timeout=28800,runtype=dedicated,skip_init=false,rebl_ops_rate=10000,collections=10,expiry=true,vh_scaling=true,horizontal_scale=1,clients_per_db=10 -m rest'
      
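      The `-t` argument above packs the test name and its parameters into one comma-separated string, which is hard to read when checking the starting topology. A small sketch for splitting it into a name and a dictionary follows. `parse_test_spec` is a hypothetical helper written for this report, not part of the test framework, and it assumes that no parameter value contains a comma.

```python
def parse_test_spec(spec: str):
    """Split 'test.name,key=value,key=value' into (name, params).

    Values are kept as strings. If a key appears more than once,
    the last occurrence wins.
    """
    name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params


# Shortened excerpt of the -t value from the command above.
spec = ("aGoodDoctor.hostedHospital.Murphy.test_rebalance,"
        "kv_nodes=3,gsi_nodes=2,n1ql_nodes=2,horizontal_scale=1,"
        "services=data-index:query")
name, params = parse_test_spec(spec)
print(name)
print({k: params[k] for k in ("kv_nodes", "gsi_nodes", "n1ql_nodes")})
```

      The three node counts in the excerpt match the starting cluster in step 1 of the description.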

          People

            ritesh.agarwal Ritesh Agarwal