Couchbase Server / MB-5176

Rebalancing out a node from a cluster causes very high ejection to disk.


Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major
    • 2.0
    • None
    • Component/s: couchbase-bucket
    • Security Level: Public
    • None
    • Environment: 4 node cluster on Ubuntu
    • 2

    Description

      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Setup
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Bucket size: 3GB, Nodes: 4, keys inserted: 64 - 256 bytes

      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Repro Steps
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      1. Load a high volume of data on the cluster (4 nodes); note: data inserted > low water mark threshold (a hedged cbstats check for this is sketched after this list).
      2. Remove a server
      3. Rebalance
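
      As an illustrative aside (not part of the original report), the low-water-mark check referenced in step 1 can be scripted against cbstats before kicking off the rebalance; the node address and the cbstats path below are taken from the sample output further down, and the helper itself is hypothetical.

      # Hypothetical helper: confirm mem_used has crossed ep_mem_low_wat on a node
      # before starting the rebalance (parses "cbstats ... raw memory" output).
      import subprocess

      def memory_stats(host_port):
          out = subprocess.check_output(
              ["/opt/couchbase/bin/cbstats", host_port, "raw", "memory"])
          stats = {}
          for line in out.decode().splitlines():
              key, _, value = line.partition(":")
              stats[key.strip()] = value.strip()
          return stats

      s = memory_stats("10.1.3.67:11210")
      print("mem_used=%s ep_mem_low_wat=%s above_low_wat=%s" % (
          s["mem_used"], s["ep_mem_low_wat"],
          int(s["mem_used"]) > int(s["ep_mem_low_wat"])))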

      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Error
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      cbstats reports an unexpectedly low resident memory ratio and an unexpectedly high ejection rate.

      sample :

      /opt/couchbase/bin/cbstats 10.1.3.67:11210 raw memory
      ep_kv_size: 1616332569
      ep_max_data_size: 3250585600
      ep_mem_high_wat: 2437939200
      ep_mem_low_wat: 1950351360
      ep_oom_errors: 0
      ep_overhead: 86679858
      ep_tmp_oom_errors: 0
      ep_value_size: 267558416
      mem_used: 2421153021
      tcmalloc_current_thread_cache_bytes: 7941064
      tcmalloc_max_thread_cache_bytes: 33554432
      tcmalloc_unmapped_bytes: 163840
      total_allocated_bytes: 1662925672
      total_fragmentation_bytes: 762578072
      total_free_bytes: 153034752
      total_heap_bytes: 2578538496
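
      Reading the figures above (an illustrative calculation, not part of the original report): mem_used is at roughly 99% of ep_mem_high_wat, the threshold at which the item pager starts ejecting resident items down toward the low water mark, and tcmalloc fragmentation accounts for roughly 30% of the heap.

      # Quick arithmetic on the stats pasted above (illustration only).
      mem_used   = 2421153021
      high_wat   = 2437939200
      heap_bytes = 2578538496
      frag_bytes = 762578072
      print("mem_used vs high water mark: %.1f%%" % (100.0 * mem_used / high_wat))      # ~99.3%
      print("fragmentation vs heap:       %.1f%%" % (100.0 * frag_bytes / heap_bytes))  # ~29.6%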

      The diags from all the nodes are attached below, and the cbstats output monitored during the rebalance is attached as high_load.log.


      May 6: Re-ran this test with the following setup; seeing a drop in resident ratio (<50%) on one node, which also shows higher fragmentation.
      With 1024 vbuckets, the resident ratio after rebalancing out one node stays fairly OK (~80-90%) on two nodes but drops sharply (<50%) on the master node (10.1.3.92).

      Build: 1.8.1-802rel

      Setup: 3GB bucket size, 4 nodes (10.1.3.73, 10.1.3.70, 10.1.3.71, 10.1.3.92), number of replicas = 2
      vbuckets = 1024, key-size = 512-1k
      mcsoda: pytests/performance/mcsoda.py membase://10.1.3.92:8091 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 min-value-size=512,1024 max-items=4000000 exit-after-creates=1 prefix=test_1_

      • Load stopped before rebalancing out
      • Total items inserted = 2.28M
      • Remove a node and rebalance it out.
      • Smaller drop in resident ratio.

      Diags attached as 10.1.3.70-8091-diag.txt.gz 10.1.3.72-8091-diag.txt.gz 10.1.3.92-8091-diag.txt.gz
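
      A minimal monitoring sketch (not part of the original report), assuming the nodes expose the curr_items and ep_num_non_resident counters via cbstats, so that the per-node resident ratio is (curr_items - ep_num_non_resident) / curr_items; the node addresses come from the setup above.

      # Hypothetical watcher: print the per-node resident ratio every 10 seconds
      # while the rebalance runs.
      import subprocess, time

      NODES = ["10.1.3.73:11210", "10.1.3.70:11210",
               "10.1.3.71:11210", "10.1.3.92:11210"]

      def all_stats(host_port):
          out = subprocess.check_output(
              ["/opt/couchbase/bin/cbstats", host_port, "all"])
          stats = {}
          for line in out.decode().splitlines():
              k, _, v = line.partition(":")
              stats[k.strip()] = v.strip()
          return stats

      while True:
          for node in NODES:
              s = all_stats(node)
              curr = int(s.get("curr_items", 0))
              non_resident = int(s.get("ep_num_non_resident", 0))
              if curr:
                  print("%s resident ratio: %.1f%%"
                        % (node, 100.0 * (curr - non_resident) / curr))
          time.sleep(10)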

      Attachments

        1. 10.1.3.67-8091-diag.txt.gz
          6.73 MB
        2. 10.1.3.69-8091-diag.txt.gz
          7.24 MB
        3. 10.1.3.70-8091-diag.txt.gz
          13.24 MB
        4. 10.1.3.72-8091-diag.txt.gz
          5.80 MB
        5. 10.1.3.72-8091-diag.txt.gz
          2.91 MB
        6. 10.1.3.73-8091-diag.txt.gz
          2.74 MB
        7. 10.1.3.92-8091-diag.txt.gz
          16.61 MB
        8. data.tar
          8.54 MB
        9. monitor.log
          1.93 MB
        10. residentMem_02.tar
          19.44 MB
        11. residentMem_03
          7.10 MB


          People

            mikew Mike Wiederhold [X] (Inactive)
            ketaki Ketaki Gangal (Inactive)
