  1. Couchbase Server
  2. MB-50469

[NexusKVStore] Investigate why master node stays in 'pending'/amber state during rebalance operation

Details

    Description

      Steps to repro:

      1. Create a 2-node cluster (172.23.122.245, 172.23.122.246) with bucket_ram_quota = 2056 MB/node and replicas=0.
      2. Create 100 non-default collections in the default scope (the collection count is 101, including the default collection).
      3. Start loading 500k docs into each of the 100 non-default collections.
      4. While doc loading is in progress, drop a few collections and recreate a few.
      5. Remove node 172.23.122.246 and trigger a rebalance.
      6. The rebalance was successful.
      7. Change replicas to 1 and add node 172.23.122.246 back.
      8. Trigger a full compaction.
      9. While the rebalance was in progress, node 172.23.122.245 turned amber in the UI (the UI showed 1 node in pending state).
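
For anyone reproducing this without the TAF harness, the cluster-side steps above (2, 5, 7 and 8) can be sketched as a list of REST calls against the cluster manager on port 8091. This is only a sketch under assumptions: the bucket name `default`, the `collection_<i>` names, the `ns_1@<ip>` node names and the credentials are placeholders that do not come from this report, and the rebalance after adding the node back is inferred from step 9. The code only builds the requests; it does not send them.

```python
from urllib.parse import urlencode

BUCKET = "default"   # assumption: the report does not name the bucket
SCOPE = "_default"   # the default scope
NODE_A = "172.23.122.245"
NODE_B = "172.23.122.246"


def otp(host):
    # Assumption: nodes are registered under the default ns_1@<ip> name.
    return "ns_1@" + host


def rebalance(ejected):
    """Build a rebalance request over both nodes, ejecting the given hosts."""
    body = {
        "knownNodes": ",".join([otp(NODE_A), otp(NODE_B)]),
        "ejectedNodes": ",".join(otp(h) for h in ejected),
    }
    return ("POST", "/controller/rebalance", urlencode(body))


def repro_requests(num_collections=100):
    """Return (method, path, form_body) tuples for steps 2, 5, 7 and 8."""
    bucket_path = f"/pools/default/buckets/{BUCKET}"
    reqs = []
    # Step 2: create the non-default collections (names are hypothetical).
    for i in range(num_collections):
        reqs.append(("POST", f"{bucket_path}/scopes/{SCOPE}/collections",
                     urlencode({"name": f"collection_{i}"})))
    # Step 5: remove NODE_B and rebalance.
    reqs.append(rebalance(ejected=[NODE_B]))
    # Step 7: set replicas to 1 and add NODE_B back.
    reqs.append(("POST", bucket_path,
                 urlencode({"replicaNumber": 1, "ramQuotaMB": 2056})))
    reqs.append(("POST", "/controller/addNode",
                 urlencode({"hostname": NODE_B,
                            "user": "Administrator",    # placeholder
                            "password": "password"})))  # placeholder
    # Assumption: step 9 implies a rebalance is started after the add-back.
    reqs.append(rebalance(ejected=[]))
    # Step 8: full compaction of the bucket.
    reqs.append(("POST", f"{bucket_path}/controller/compactBucket", ""))
    return reqs
```

Steps 3 and 4 (doc loading and collection drop/recreate during the load) are driven by the test client and are not covered here.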

      Note:
      I couldn't find anything unusual in the memcached logs.
      Ben Huddleston also looked into the cluster and didn't notice anything in the memcached logs either, so we would like the ns_server or UI team to take a look at the logs.

      This issue is also not consistently reproducible: I ran this test a few times but encountered it only twice.
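
Because the state is intermittent, it may help to record what the cluster manager reports for each node while the rebalance runs. A minimal sketch, assuming the input is the parsed JSON body of `GET /pools/default`, whose `nodes` entries carry `hostname`, `status` and `clusterMembership`; which `status` value the UI renders as amber/pending is not confirmed in this report, so the helper simply flags every node that is not `healthy`.

```python
def non_healthy_nodes(pools_default):
    """Return (hostname, status, clusterMembership) for each node not reporting 'healthy'."""
    flagged = []
    for node in pools_default.get("nodes", []):
        status = node.get("status")
        if status != "healthy":
            flagged.append((node.get("hostname"), status,
                            node.get("clusterMembership")))
    return flagged
```

Calling this on each poll during the rebalance and logging any non-empty result with a timestamp would give a window to correlate against the ns_server logs.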

      QE-Test:

      git fetch "https://review.couchbase.org/TAF" refs/changes/88/166488/1 && git checkout FETCH_HEAD
      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/qe_r.ini -p bucket_storage=couchstore,rerun=false,bucket_eviction_policy=fullEviction,init_loading=False -t storage.magma.magma_rebalance.MagmaRebalance.test_data_load_collections_with_rebalance_in,num_items=500000,doc_size=256,nodes_init=2,nodes_in=1,standard_buckets=1,magma_buckets=0,bucket_storage=couchstore,data_load_stage=before,sdk_timeout=60,vbuckets=1024,key_size=12,replicas=0,infra_log_level=debug,log_level=debug,skip_cleaup=True,randomize_value=True,bucket_eviction_policy=fullEviction,infra_log_level=debug,log_level=debug,init_loading=False,fragmentation=30,skip_cleanup=True,autoCompactionDefined=true,iterations=1,enable_dp=True,num_collections=100,num_scopes=1,bucket_ram_quota=2056,skip_cleanup=True,sdk_client_pool=False,ops_rate=12000,doc_ops=create,create_perc=100,delete_perc=0,update_perc=0,num_collections_to_drop=0,get-cbcollect-info=True -m rest'
      

      CC: Ben Huddleston Daniel Owen

      Attachments

        1. amber_1.png (362 kB)
        2. amber_2.png (388 kB)
        3. amber_3.png (226 kB)

          People

            ankush.sharma Ankush Sharma