Couchbase Server / MB-25981

Rebalance hung with 1.3 million docs


Details

    • Type: Bug
    • Resolution: User Error
    • Priority: Major
    • None
    • Affects Version: 4.6.3
    • Components: couchbase-bucket, ns_server
    • None

    Description

      Steps followed:

      • Created a three-node cluster on build 4.6.3-4136 using s61903cnt72.sc.couchbase.com, s61904cnt72.sc.couchbase.com and s61905cnt72.sc.couchbase.com. Each VM had 16GB RAM and 100GB of HDD space.
      • Created a 200GB virtual hard disk and loaded 96GB of customer backup data onto it.
      • Attached the virtual hard disk above to s61905cnt72.sc.couchbase.com.
      • Started cbrestore from s61905cnt72.sc.couchbase.com with s61903cnt72.sc.couchbase.com as the target.
      • As cbrestore progressed, s61905cnt72.sc.couchbase.com repeatedly went offline and came back online, and was failed over automatically.
      • After the cbrestore finished, the server UI prompted for a rebalance because s61905cnt72.sc.couchbase.com was reachable again.
      • I initiated a rebalance.
      • The rebalance got stuck at s61903cnt72.sc.couchbase.com - 50%, s61904cnt72.sc.couchbase.com - 83% and s61905cnt72.sc.couchbase.com - 50% for more than an hour.
      • The bucket had a little more than 1.3 million docs.
      • Below is the stack trace from the primary node.
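
The "stuck for more than an hour" observation above can be made mechanical. The following is a minimal sketch, not part of ns_server: it assumes per-node progress has already been sampled into a list of `(timestamp, {node: percent})` tuples. That sample format, the function name `is_stalled`, and the sample timestamps are all hypothetical; only the 50% / 83% / 50% figures come from this report.

```python
from datetime import datetime, timedelta


def is_stalled(samples, window=timedelta(hours=1)):
    """Return True if per-node progress has not changed for at least `window`.

    `samples` is a hypothetical list of (timestamp, {node: percent}) tuples,
    ordered oldest first.
    """
    if len(samples) < 2:
        return False
    latest_time, latest_progress = samples[-1]
    # A finished rebalance also stops changing, so do not report it as stalled.
    if all(pct >= 100 for pct in latest_progress.values()):
        return False
    # Walk backwards to find the earliest consecutive sample that already
    # showed the same per-node progress as the latest one.
    unchanged_since = latest_time
    for ts, progress in reversed(samples[:-1]):
        if progress != latest_progress:
            break
        unchanged_since = ts
    return latest_time - unchanged_since >= window


# Illustrative samples: the timestamps are made up, the final percentages
# are the ones reported in this issue.
stuck = {"s61903cnt72": 50, "s61904cnt72": 83, "s61905cnt72": 50}
samples = [
    (datetime(2017, 9, 5, 15, 30), {"s61903cnt72": 40, "s61904cnt72": 70, "s61905cnt72": 40}),
    (datetime(2017, 9, 5, 15, 40), dict(stuck)),
    (datetime(2017, 9, 5, 16, 45), dict(stuck)),
]
print(is_stalled(samples))
```

Comparing whole progress dictionaries keeps the check strict: a single node advancing by one percent resets the stall timer.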

      Stack trace:

      2017-09-05T15:09:58.983-07:00, auto_failover:0:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Reset auto-failover count
      2017-09-05T15:10:26.284-07:00, ns_orchestrator:4:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Starting rebalance, KeepNodes = ['ns_1@s61903cnt72.sc.couchbase.com',
                                       'ns_1@s61904cnt72.sc.couchbase.com',
                                       'ns_1@s61905cnt72.sc.couchbase.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
       
      2017-09-05T15:10:27.089-07:00, ns_storage_conf:0:info:message(ns_1@s61905cnt72.sc.couchbase.com) - Deleting old data files of bucket "data-bucket"
      2017-09-05T15:10:28.178-07:00, ns_rebalancer:0:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Started rebalancing bucket index-bucket
      2017-09-05T15:10:28.391-07:00, ns_memcached:0:info:message(ns_1@s61905cnt72.sc.couchbase.com) - Bucket "index-bucket" loaded on node 'ns_1@s61905cnt72.sc.couchbase.com' in 0 seconds.
      2017-09-05T15:10:29.531-07:00, ns_vbucket_mover:0:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Bucket "index-bucket" rebalance does not seem to be swap rebalance
      2017-09-05T15:10:56.413-07:00, ns_rebalancer:0:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Started rebalancing bucket data-bucket
      2017-09-05T15:10:56.452-07:00, ns_memcached:0:info:message(ns_1@s61905cnt72.sc.couchbase.com) - Bucket "data-bucket" loaded on node 'ns_1@s61905cnt72.sc.couchbase.com' in 0 seconds.
      2017-09-05T15:10:57.621-07:00, ns_vbucket_mover:0:info:message(ns_1@s61903cnt72.sc.couchbase.com) - Bucket "data-bucket" rebalance does not seem to be swap rebalance
      -------------------------------
       
       
      per_node_processes('ns_1@s61903cnt72.sc.couchbase.com') =
           {<0.1081.3>,
            [{registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,5}},
             {backtrace,
                 [<<"Program counter: 0x00007fa6112f1028 (ns_single_vbucket_mover:spawn_and_wait/1 + 72)">>,
                  <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
                  <<"0x00007fa678b18a08 Return addr 0x00007fa6781ea9d8 (misc:try_with_maybe_ignorant_after/2 + 80)">>,
                  <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     <0.1504.6>">>,
                  <<>>,
                  <<"0x00007fa678b18a28 Return addr 0x00007fa6112f0e78 (ns_single_vbucket_mover:mover/6 + 1008)">>,
                  <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     []">>,
                  <<"y(3)     []">>,
                  <<"y(4)     #Fun<ns_single_vbucket_mover.3.75768851>">>,
                  <<"y(5)     Catch 0x00007fa6781ea9f8 (misc:try_with_maybe_ignorant_after/2 + 112)">>,
                  <<>>,
                  <<"0x00007fa678b18a60 Return addr 0x00007fa67ce90b40 (proc_lib:init_p_do_apply/3 + 56)">>,
                  <<"y(0)     true">>,<<"y(1)     []">>,
                  <<"y(2)     ['ns_1@s61905cnt72.sc.couchbase.com','ns_1@s61903cnt72.sc.couchbase.com']">>,
                  <<"y(3)     ['ns_1@s61903cnt72.sc.couchbase.com',undefined]">>,
                  <<"y(4)     790">>,<<"y(5)     <0.32104.2>">>,<<>>,
                  <<"0x00007fa678b18a98 Return addr 0x0000000000892548 (<terminate process normally>)">>,
                  <<"y(0)     Catch 0x00007fa67ce90b60 (proc_lib:init_p_do_apply/3 + 88)">>,
                  <<>>]},
             {error_handler,error_handler},
             {garbage_collection,
                 [{min_bin_vheap_size,46422},
                  {min_heap_size,233},
                  {fullsweep_after,512},
                  {minor_gcs,9}]},
             {heap_size,1598},
             {total_heap_size,3196},
             {links,[<0.32104.2>,<0.1504.6>]},
             {monitors,[]},
             {monitored_by,[<0.453.0>]},
             {memory,26520},
             {messages,[]},
             {message_queue_len,0},
             {reductions,4067},
             {trap_exit,true},
             {current_location,
                 {ns_single_vbucket_mover,spawn_and_wait,1,
                     [{file,"src/ns_single_vbucket_mover.erl"},{line,106}]}},
             {dictionary,
                 [{cleanup_list,[<0.1504.6>]},
                  {'$ancestors',
                      [<0.32104.2>,<0.9695.2>,<0.1013.0>,ns_orchestrator_sup,
                       mb_master_sup,mb_master,<0.443.0>,ns_server_sup,
                       ns_server_nodes_sup,<0.166.0>,ns_server_cluster_sup,
                       <0.89.0>]},
                  {'$initial_call',{ns_single_vbucket_mover,mover,6}}]}]}
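
The log entries at the top of the trace above share one shape: `timestamp, module:code:level:message(node) - text`. As a rough aid for anyone lining up events across nodes, here is a minimal sketch that splits such a line into fields. The regular expression is inferred only from the lines quoted in this issue, and the names `LOG_LINE` and `parse_log_line` are hypothetical; other ns_server log formats may not match.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this issue (an assumption,
# not a documented format).
LOG_LINE = re.compile(
    r"^(?P<ts>\S+), (?P<module>\w+):(?P<code>\d+):(?P<level>\w+):(?P<kind>\w+)"
    r"\((?P<node>[^)]+)\) - (?P<text>.*)$"
)


def parse_log_line(line):
    """Parse one log line into a dict, or return None for continuation lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 2017-09-05T15:10:26.284-07:00; %z accepts the
    # colon in the UTC offset on Python 3.7 and later.
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return entry


line = (
    "2017-09-05T15:10:56.413-07:00, ns_rebalancer:0:info:message"
    "(ns_1@s61903cnt72.sc.couchbase.com) - Started rebalancing bucket data-bucket"
)
entry = parse_log_line(line)
print(entry["module"], entry["node"], entry["text"])
```

Wrapped continuation lines, such as the remaining `KeepNodes` entries, return `None` and can be appended to the previous entry's text by the caller.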
      

          People

            raghu.sarangapani Raghu Sarangapani (Inactive)
