Couchbase Server: MB-48419

Incorrect num_items/mem_used is shown on the dashboard during KV rebalance IN at 15% DGM

Details

    Description

      Sorry for leaving it to Engineering to figure out the problem; elaborating here:

      Steps:
      1. Create a cluster with 4 KV nodes and 2 index/N1QL nodes
      2. Create a magma bucket with 50 collections under the default scope
      3. Load 125M items and upsert them
      4. Load another 125M items and upsert them as well
      5. Create 50 indexes on the 50 collections and build them; start 50 QPS
      6. Rebalance in 1 node with doc_ops=create:update:delete:read running in parallel
      7. During the rebalance, various stats intermittently go empty. Observe the disk used and the item count in the image below:

      Bucket Stats:

      Expected = 250M, Actual = 0
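      The zero item count above can be detected without the dashboard by polling ns_server's bucket-stats REST endpoint during the rebalance. A minimal sketch of that check follows; the inline sample body is fabricated to mirror the dropout seen here, and in the real test the JSON would come from an authenticated GET against a node's port 8091 (bucket name GleamBook as in the test config):

      ```python
      import json

      def zero_item_samples(stats):
          """Return indices of samples where curr_items dropped to 0.

          `stats` is the JSON body returned by ns_server's
          GET /pools/default/buckets/<bucket>/stats endpoint, whose
          "op" -> "samples" map holds one list per stat name.
          """
          samples = stats["op"]["samples"]
          return [i for i, n in enumerate(samples["curr_items"]) if n == 0]

      # Fabricated sample mirroring the reported dropout: the middle sample
      # shows the bogus 0 while the cluster actually holds 250M items.
      sample_body = json.loads(
          '{"op": {"samples": {"curr_items": [250000000, 0, 250000000],'
          '"mem_used": [85000000000, 800000000, 85000000000]}}}'
      )
      print(zero_item_samples(sample_body))  # → [1]
      ```

      During a healthy rebalance this list should stay empty; any non-empty result reproduces the dashboard symptom without involving the UI.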

      The cluster stats here show mem_used = 800MB, while it should be much higher: with 250M items in the cluster and ~85GB of RAM available, how can mem_used be only 800MB?

      Cluster Stats:
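      A rough sanity check using only the test's own parameters (num_items, doc_size=1024, key_size=18) shows why 800MB is implausible; the ~85GB quota and the 800MB reading are taken from the screenshots described above, and the arithmetic deliberately ignores per-item metadata overhead, so the real figure should be even larger:

      ```python
      NUM_ITEMS = 250_000_000   # total items loaded in steps 3-4
      DOC_SIZE = 1024           # value bytes per item (test parameter doc_size=1024)
      KEY_SIZE = 18             # key bytes per item (test parameter key_size=18)
      MEM_USED = 800 * 1024**2  # the ~800MB shown on the dashboard
      RAM_QUOTA = 85 * 1024**3  # the ~85GB of bucket RAM available

      # Raw keys+values alone, before any metadata or index overhead.
      dataset_bytes = NUM_ITEMS * (DOC_SIZE + KEY_SIZE)
      resident_ratio = MEM_USED / dataset_bytes

      print(f"dataset ≈ {dataset_bytes / 1024**3:.0f} GiB")
      print(f"implied resident ratio ≈ {resident_ratio:.4%}")
      ```

      The dataset is roughly 240 GiB, so an 800MB mem_used implies a resident ratio well under 1%, whereas this test runs at ~15% DGM, which would put mem_used in the tens of GB.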

      Finally, the nodes turn amber at random, as shown below, which is unexpected. The attached video demonstrates this better.
      Servers:

      QE Test

      git fetch "http://review.couchbase.org/TAF" refs/changes/59/161059/9 && git checkout FETCH_HEAD
      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job1.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.test_rebalance,nodes_init=4,graceful=True,skip_cleanup=True,num_items=2500000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=2,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=50,pc=25,index_nodes=2,cbas_nodes=0,fts_nodes=0,ops_rate=80000,ramQuota=17000,doc_ops=create:update:delete:read,rebl_ops_rate=10000,key_type=RandomKey -m rest'
      

      Nodes are going down randomly. Check out the attached video.


            People

              ritesh.agarwal Ritesh Agarwal
              Votes: 0
              Watchers: 4
