  1. Couchbase Server
  2. MB-60716

GSI rebalance hangs when shardAffinity is changed to False 1 minute after a swap rebalance starts.


Details

    Description

      1. Create a Capella cluster with 3 KV, 2 GSI, and 2 N1QL nodes (compute=c5.2xlarge).
      2. Create a Magma bucket with 2 collections and 100M items in each collection.
      3. Create 2 indexes per collection.

        Index:
        CREATE INDEX default0_idx_VolumeCollection0_0 ON VolumeCollection0(country, DISTINCT ARRAY `r`.`ratings`.`Check in / front desk` FOR r in `reviews` END,array_count((`public_likes`)),array_count((`reviews`)) DESC,`type`,`phone`,`price`,`email`,`address`,`name`,`url`) PARTITION BY HASH (country) USING GSI WITH { "defer_build": true};
        Query:
        select meta().id from VolumeCollection0 where country is not null and `type` is not null and (any r in reviews satisfies r.ratings.`Check in / front desk` is not null end) limit 100
         
        Index:
        CREATE INDEX default0_idx_VolumeCollection0_1 ON VolumeCollection0(`free_breakfast`,`type`,`free_parking`,array_count((`public_likes`)),`price`,`country`) PARTITION BY HASH (type) USING GSI WITH { "defer_build": true};
        Query:
        select price, country from VolumeCollection0 where free_breakfast=True AND free_parking=True and price is not null and array_count(public_likes)>=0 and `type`= $type limit 100
         
        Index:
        CREATE INDEX default0_idx_VolumeCollection1_0 ON VolumeCollection1(`free_breakfast`,`free_parking`,`country`,`city`)  PARTITION BY HASH (country) USING GSI WITH { "defer_build": true};
        Query:
        select city,country from VolumeCollection1 where free_breakfast=True and free_parking=True order by country,city limit 100
         
        Index:
        CREATE INDEX default0_idx_VolumeCollection1_1 ON VolumeCollection1(`country`, `city`,`price`,`name`)  PARTITION BY HASH (country, city) USING GSI WITH { "defer_build": true};
        Query:
        WITH city_avg AS (SELECT city, AVG(price) AS avgprice FROM VolumeCollection1 WHERE country = $country GROUP BY city limit 10) SELECT h.name, h.price FROM city_avg JOIN VolumeCollection1 h ON h.city = city_avg.city WHERE h.price < city_avg.avgprice AND h.country=$country limit 100
        

      4. Change the disk size on Capella, which triggers a swap rebalance of each GSI node.

        Starting rebalance, KeepNodes = ['ns_1@svc-d-node-001.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-d-node-002.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-d-node-003.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-i-node-030.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-i-node-032.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-q-node-004.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com',
        'ns_1@svc-q-node-005.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-i-node-031.8jt8h1nr6ekeixaf.sandbox.nonprod-project-avengers.com'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 4f19597ccf6e1ca1ac0fc06df2f1f94f
        

      5. One minute after the GSI swap rebalance starts, change shardAffinity to False.
      6. The rebalance has been hung for 3 hours.
      7. The exact same test passed on the same build without changing the shardAffinity to False.
        cc: Varun Velamuri, Deepkaran Salooja
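
Step 5 can be sketched as a delayed settings change. This is a minimal sketch, not the QE harness code. It assumes the toggle corresponds to the indexer setting `indexer.settings.enable_shard_affinity`, posted to the Index Service `/settings` endpoint on port 9102 of a self-managed node; this report does not state the setting name or the mechanism, and Capella may expose it differently. The host name and both helper functions are hypothetical, and authentication is omitted.

```python
import json
import time
import urllib.request

# Assumed setting name; not stated anywhere in this report.
SHARD_AFFINITY_KEY = "indexer.settings.enable_shard_affinity"


def build_shard_affinity_request(host, enabled, port=9102):
    """Build (but do not send) the POST that changes shard affinity.

    A real call would also need admin credentials, omitted here.
    """
    body = json.dumps({SHARD_AFFINITY_KEY: enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/settings",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def disable_after_delay(host, delay_s=60, sleep=time.sleep,
                        send=urllib.request.urlopen):
    """Wait delay_s seconds after the rebalance starts, then send the change."""
    sleep(delay_s)
    return send(build_shard_affinity_request(host, enabled=False))
```

Passing `sleep` and `send` as parameters keeps the timing and the network call replaceable, so the 1 minute delay can be exercised without a live cluster.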

      QE Test

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/couchbase_capella_volume_2_new.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.hostedHospital.Murphy.test_rebalance,num_items=100000000,num_buckets=1,bucket_names=GleamBook,bucket_type=membase,iterations=4,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,maxttl=10,pc=10,gsi_nodes=2,cbas_nodes=3,fts_nodes=3,kv_nodes=3,n1ql_nodes=2,kv_disk=1000,n1ql_disk=50,gsi_disk=500,fts_disk=1000,cbas_disk=1000,kv_compute=c5.2xlarge,gsi_compute=c5.4xlarge,n1ql_compute=c5.2xlarge,fts_compute=c5.2xlarge,cbas_compute=c5.2xlarge,mutation_perc=20,key_type=CircularKey,capella_run=true,services=data-query-index,rebl_services=index,max_rebl_nodes=27,provider=AWS,region=us-east-1,type=GP3,size=1000,ops_rate=100000,skip_teardown_cleanup=true,wait_timeout=14400,index_timeout=28800,runtype=dedicated,skip_init=true,rebl_ops_rate=10000,collections=2,expiry=true,h_scaling=false,v_scaling=true,horizontal_scale=1'
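
The `-t` value above packs the test name and its parameters into one comma-separated string, which is hard to read inline. Below is a minimal sketch for splitting such a string into a dictionary. `parse_test_params` is a hypothetical helper, not part of testrunner, and it assumes no parameter value contains a comma. The sample input is a shortened excerpt of the command above.

```python
def parse_test_params(spec):
    """Split 'test.name,k1=v1,k2=v2' into (test_name, params dict)."""
    test_name, _, rest = spec.partition(",")
    params = {}
    for item in filter(None, rest.split(",")):
        key, _, value = item.partition("=")
        params[key] = value  # a repeated key keeps its last value
    return test_name, params


name, params = parse_test_params(
    "aGoodDoctor.hostedHospital.Murphy.test_rebalance,"
    "gsi_nodes=2,kv_nodes=3,n1ql_nodes=2,rebl_services=index,v_scaling=true"
)
print(name)
print(params["rebl_services"])
```

All values stay as strings; converting `gsi_nodes` or `wait_timeout` to integers is left to the caller.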
      


          People

            ritesh.agarwal Ritesh Agarwal