Couchbase Server / MB-25434

Rebalance stuck in CentOS longevity for 5 hours

Details

    • Untriaged
    • No

    Description

      Rebalance got stuck in the CentOS longevity run (cb bucket + plasma mode) against build 5.0.0-3388. We introduced xattr and capi into our integration and started this build, but the hang does not seem to be related to them: the very first rebalance after bucket creation and data loading got stuck. From diag.log on the .103 node:

      2017-07-26T08:33:59.610-07:00, ns_orchestrator:4:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.106.188','ns_1@172.23.107.47',
                                       'ns_1@172.23.108.103','ns_1@172.23.108.107',
                                       'ns_1@172.23.97.237','ns_1@172.23.97.238',
                                       'ns_1@172.23.97.239','ns_1@172.23.97.242',
                                       'ns_1@172.23.98.135','ns_1@172.23.99.20',
                                       'ns_1@172.23.99.21','ns_1@172.23.99.22',
                                       'ns_1@172.23.99.25'], EjectNodes = ['ns_1@172.23.108.104'], Failed over and being ejected nodes = []; no delta recovery nodes
       
      2017-07-26T08:34:00.182-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket WAREHOUSE
      2017-07-26T08:34:00.480-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "WAREHOUSE" rebalance does not seem to be swap rebalance
      2017-07-26T08:34:37.309-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket STOCK
      2017-07-26T08:34:37.535-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "WAREHOUSE" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:34:37.620-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "STOCK" rebalance does not seem to be swap rebalance
      2017-07-26T08:35:14.746-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ORDER_LINE
      2017-07-26T08:35:14.924-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "STOCK" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:35:15.052-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ORDER_LINE" rebalance does not seem to be swap rebalance
      2017-07-26T08:35:54.225-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ORDERS
      2017-07-26T08:35:54.422-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "ORDER_LINE" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:35:54.557-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ORDERS" rebalance does not seem to be swap rebalance
      2017-07-26T08:36:31.646-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket NEW_ORDER
      2017-07-26T08:36:31.856-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "ORDERS" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:36:31.960-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "NEW_ORDER" rebalance does not seem to be swap rebalance
      2017-07-26T08:37:09.015-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ITEM
      2017-07-26T08:37:09.209-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "NEW_ORDER" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:37:09.331-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ITEM" rebalance does not seem to be swap rebalance
      2017-07-26T08:37:45.982-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket HISTORY
      2017-07-26T08:37:46.163-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "ITEM" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:37:46.280-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "HISTORY" rebalance does not seem to be swap rebalance
      2017-07-26T08:38:23.892-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket DISTRICT
      2017-07-26T08:38:24.100-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "HISTORY" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:38:24.193-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "DISTRICT" rebalance does not seem to be swap rebalance
      2017-07-26T08:39:00.623-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket CUSTOMER
      2017-07-26T08:39:00.829-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "DISTRICT" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:39:00.949-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "CUSTOMER" rebalance does not seem to be swap rebalance
      2017-07-26T08:39:37.847-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket default
      2017-07-26T08:39:38.019-07:00, ns_memcached:0:info:message(ns_1@172.23.108.104) - Shutting down bucket "CUSTOMER" on 'ns_1@172.23.108.104' for deletion
      2017-07-26T08:39:38.145-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "default" rebalance does not seem to be swap rebalance
      2017-07-26T10:03:51.684-07:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.108.103) - Client-side error-report for user undefined on node 'ns_1@172.23.108.103':
      User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
      Got unhandled javascript error:
      message: The transition errored;
       
       
      2017-07-26T11:30:02.774-07:00, mb_master:0:info:message(ns_1@172.23.98.135) - Haven't heard from a higher priority node or a master, so I'm taking over.
      2017-07-26T11:39:50.774-07:00, mb_master:0:info:message(ns_1@172.23.98.135) - Haven't heard from a higher priority node or a master, so I'm taking over.
      -------------------------------
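
The log excerpt above goes quiet right after the "default" bucket rebalance starts. A minimal Python sketch for locating that kind of silence in a diag.log excerpt follows. The helper names (`parse_events`, `largest_gap`, `last_bucket_started`) are hypothetical and not part of any Couchbase tooling, and the line pattern is assumed from the few lines quoted here, so it may need adjusting for other log entries.

```python
import re
from datetime import datetime

# A few lines copied verbatim from the diag.log excerpt in this report.
SAMPLE = """\
2017-07-26T08:39:00.623-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket CUSTOMER
2017-07-26T08:39:37.847-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket default
2017-07-26T08:39:38.145-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "default" rebalance does not seem to be swap rebalance
2017-07-26T11:30:02.774-07:00, mb_master:0:info:message(ns_1@172.23.98.135) - Haven't heard from a higher priority node or a master, so I'm taking over.
"""

# Assumed line shape: "<ISO timestamp>, <module>:<code>:<level>:..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}), (\S+?):"
)
BUCKET_RE = re.compile(r"Started rebalancing bucket (\S+)")


def parse_events(text):
    """Return (timestamp, module, line) for every line that starts with a timestamp."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((datetime.fromisoformat(m.group(1)), m.group(2), line))
    return events


def largest_gap(events):
    """Return (gap, event_before, event_after) for the longest silence in the log."""
    pairs = zip(events, events[1:])
    before, after = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return after[0] - before[0], before, after


def last_bucket_started(events):
    """Return the name of the last bucket whose rebalance was started, or None."""
    names = [m.group(1) for _, _, line in events if (m := BUCKET_RE.search(line))]
    return names[-1] if names else None


if __name__ == "__main__":
    evs = parse_events(SAMPLE)
    gap, before, after = largest_gap(evs)
    print("last bucket started:", last_bucket_started(evs))
    print("longest silence:", gap)
    print("  after :", before[2])
    print("  before:", after[2])
```

Run against the full excerpt, the same functions would point at the stretch following the last ns_vbucket_mover message for "default", which is where the rebalance appears to have stalled.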
       
       
      per_node_processes('ns_1@172.23.108.103') =
           {<0.21656.11>,
            [{registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,3}},
             {backtrace,
                 [<<"Program counter: 0x00007fc13b06c648 (gen:do_call/4 + 392)">>,
                  <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
                  <<"0x00007fc0d3d6b340 Return addr 0x00007fc13b147ef0 (gen_server:call/3 + 128)">>,
                  <<"y(0)     #Ref<0.0.6.205680>">>,<<"y(1)     infinity">>,
                  <<"(2)     {call,ns_couchdb_api,handle_rpc,[{set_vbucket_states,\"default\",[active,active,active,active,active,active,active">>,
                  <<"y(3)     '$gen_call'">>,
                  <<"y(4)     {rex,'couchdb_ns_1@127.0.0.1'}">>,<<"y(5)     []">>,
                  <<>>,
                  <<"0x00007fc0d3d6b378 Return addr 0x00007fc13b3ad8e8 (rpc:do_call/3 + 168)">>,
                  <<"y(0)     infinity">>,
                  <<"(1)     {call,ns_couchdb_api,handle_rpc,[{set_vbucket_states,\"default\",[active,active,active,active,active,active,active">>,
                  <<"y(2)     {rex,'couchdb_ns_1@127.0.0.1'}">>,
                  <<"y(3)     Catch 0x00007fc13b147ef0 (gen_server:call/3 + 128)">>,
                  <<>>,
                  <<"0x00007fc0d3d6b3a0 Return addr 0x00007fc135bf8480 (ns_couchdb_api:rpc_couchdb_node/4 + 144)">>,
                  <<"y(0)     Catch 0x00007fc13b3ad8e8 (rpc:do_call/3 + 168)">>,
                  <<>>,
                  <<"0x00007fc0d3d6b3b0 Return addr 0x00007fc0edcea160 (janitor_agent:pass_vbucket_states_to_set_view_manager/1 + 216)">>,
                  <<"y(0)     []">>,
                  <<"(1)     {set_vbucket_states,\"default\",[active,active,active,active,active,active,active,active,active,active,active,acti">>,
                  <<"y(2)     undefined">>,<<>>,
                  <<"0x00007fc0d3d6b3d0 Return addr 0x00007fc0edcec730 (janitor_agent:handle_apply_vbucket_state/2 + 840)">>,
                  <<"(0)     {state,\"default\",<0.21651.11>,#Ref<0.0.6.34535>,[],[active,active,active,active,active,active,active,active,acti">>,
                  <<>>,
                  <<"0x00007fc0d3d6b3e0 Return addr 0x00007fc0edcec300 (janitor_agent:apply_vbucket_states_worker_loop/0 + 168)">>,
                  <<>>,
                  <<"0x00007fc0d3d6b3e8 Return addr 0x00007fc13b070000 (proc_lib:init_p/3 + 688)">>,
                  <<"y(0)     {update_vbucket_state,161,pending,passive,'ns_1@172.23.108.104'}">>,
                  <<"y(1)     <0.9106.0>">>,<<>>,
                  <<"0x00007fc0d3d6b400 Return addr 0x0000000000891848 (<terminate process normally>)">>,
                  <<"y(0)     []">>,
                  <<"y(1)     Catch 0x00007fc13b070020 (proc_lib:init_p/3 + 720)">>,
                  <<"y(2)     []">>,<<>>]},
             {error_handler,error_handler},
             {garbage_collection,
                 [{min_bin_vheap_size,46422},
                  {min_heap_size,233},
                  {fullsweep_after,512},
                  {minor_gcs,30}]},
             {heap_size,17731},
             {total_heap_size,35462},
             {links,[<0.9106.0>,<0.9105.0>]},
             {monitors,[{process,{rex,'couchdb_ns_1@127.0.0.1'}}]},
             {monitored_by,[]},
             {memory,284680},
             {messages,
                 [<<"{<0.9106.0>,{update_vbucket_state,224,replica,undefined,undefined},{state,[100,101,102,97,117,108,116],<0.21651.11>,#Ref<0.0.6.34535>,[],[active,active,active,active,active,active,active,active,active,active,active,active,active,active,active|...],[undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined|...],0,in_process,true,{[{<0.26564.12>,#Ref<0.0.6.205647>}],[]},<0.21656.11>,<0.9105.0>}}">>]},
             {message_queue_len,1},
             {reductions,46641},
             {trap_exit,false},
             {current_location,{gen,do_call,4,[{file,"gen.erl"},{line,211}]}},
             {dictionary,
                 [{'$ancestors',
                      ['janitor_agent-default','janitor_agent_sup-default',
                       'single_bucket_kv_sup-default',ns_bucket_sup,
                       ns_bucket_worker_sup,ns_server_sup,ns_server_nodes_sup,
                       <0.170.0>,ns_server_cluster_sup,<0.89.0>]},
                  {'$initial_call',
                      {janitor_agent,'-set_rebalance_mref/2-fun-0-',0}}]}]}
           {<0.21654.11>,
            [{registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,5}},
             {backtrace,[<<"Program counter: 0x00007fc135dfb280 (ns_pubsub:do_subscribe_link/4 + 392)">>,
                         <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,
                         <<>>,
                         <<"0x00007fc0ed3c82f8 Return addr 0x00007fc13b070210 (proc_lib:init_p_do_apply/3 + 56)">>,
                         <<"y(0)     []">>,
                         <<"y(1)     {ns_pubsub,#Ref<0.0.6.34518>}">>,
                         <<"y(2)     <0.21651.11>">>,
                         <<"y(3)     ns_node_disco_events">>,<<>>,
                         <<"0x00007fc0ed3c8320 Return addr 0x0000000000891848 (<terminate process normally>)">>,
                         <<"y(0)     Catch 0x00007fc13b070230 (proc_lib:init_p_do_apply/3 + 88)">>,
                         <<>>]},
             {error_handler,error_handler},
             {garbage_collection,[{min_bin_vheap_size,46422},
                                  {min_heap_size,233},
                                  {fullsweep_after,512},
                                  {minor_gcs,0}]},
             {heap_size,233},
             {total_heap_size,233},
             {links,[<0.21651.11>,<0.3419.0>]},
             {monitors,[]},
             {monitored_by,[]},
             {memory,2744},
             {messages,[]},
             {message_queue_len,0},
             {reductions,21},
             {trap_exit,true},
             {current_location,{ns_pubsub,do_subscribe_link,4,
                                          [{file,"src/ns_pubsub.erl"},{line,125}]}},
             {dictionary,[{'$ancestors',[<0.21651.11>,<0.28576.0>,<0.4381.0>,
                                         ns_orchestrator_child_sup,
                                         ns_orchestrator_sup,mb_master_sup,
                                         mb_master,<0.3602.0>,ns_server_sup,
                                         ns_server_nodes_sup,<0.170.0>,
                                         ns_server_cluster_sup,<0.89.0>]},
                          {'$initial_call',{ns_pubsub,do_subscribe_link,4}}]}]}
           {<0.21651.11>,
            [{registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,5}},
             {backtrace,[<<"Program counter: 0x00007fc13b149920 (gen_server:loop/6 + 264)">>,
                         <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,
                         <<>>,
                         <<"0x00007fc0ecebd188 Return addr 0x00007fc13b070210 (proc_lib:init_p_do_apply/3 + 56)">>,
                         <<"y(0)     []">>,<<"y(1)     infinity">>,
                         <<"y(2)     ns_vbucket_mover">>,
                         <<"(3)     {state,\"default\",<0.21654.11>,{array,1024,0,undefined,{{{{['ns_1@172.23.108.103','ns_1@172.23.97.238'],['ns_1@17">>,
                         <<"y(4)     <0.21651.11>">>,<<"y(5)     <0.28576.0>">>,
                         <<>>,
                         <<"0x00007fc0ecebd1c0 Return addr 0x0000000000891848 (<terminate process normally>)">>,
                         <<"y(0)     Catch 0x00007fc13b070230 (proc_lib:init_p_do_apply/3 + 88)">>,
                         <<>>]},
             {error_handler,error_handler},
             {garbage_collection,[{min_bin_vheap_size,46422},
                                  {min_heap_size,233},
                                  {fullsweep_after,512},
                                  {minor_gcs,59}]},
             {heap_size,6772},
             {total_heap_size,53194},
             {links,[<0.21654.11>,<0.26416.12>,<0.26546.12>,<0.26402.12>,
                     <0.28576.0>,<0.234.0>]},
             {monitors,[]},
             {monitored_by,[<19992.8268.0>,<19994.8106.0>,<19991.8494.0>,
                            <19993.8224.0>,<19993.4935.0>,<0.9106.0>,<19990.8331.0>,
                            <19990.4595.0>,<19989.8463.0>,<19989.4541.0>,<0.3596.0>,
                            <0.28576.0>]},
             {memory,427888},
             {messages,[]},
             {message_queue_len,0},
             {reductions,9642128},
             {trap_exit,true},
             {current_location,{gen_server,loop,6,
                                           [{file,"gen_server.erl"},{line,358}]}},
      

      Attaching logs
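
      For readers of the dump above: the first process (<0.21656.11>) is waiting in gen:do_call/4 with an infinity timeout, and its frames include a call to {rex,'couchdb_ns_1@127.0.0.1'}. The call chain can be pulled out of the backtrace lines with a small script. This is only a sketch: the function name `call_chain` is hypothetical, and the frame pattern is assumed from the lines quoted in this report.

```python
import re

# Frame lines copied from the backtrace of <0.21656.11> in this report
# (binary wrappers and stack-slot lines other than one example omitted).
BACKTRACE = [
    "Program counter: 0x00007fc13b06c648 (gen:do_call/4 + 392)",
    "0x00007fc0d3d6b340 Return addr 0x00007fc13b147ef0 (gen_server:call/3 + 128)",
    "y(1)     infinity",
    "0x00007fc0d3d6b378 Return addr 0x00007fc13b3ad8e8 (rpc:do_call/3 + 168)",
    "0x00007fc0d3d6b3a0 Return addr 0x00007fc135bf8480 (ns_couchdb_api:rpc_couchdb_node/4 + 144)",
    "0x00007fc0d3d6b3b0 Return addr 0x00007fc0edcea160 (janitor_agent:pass_vbucket_states_to_set_view_manager/1 + 216)",
    "0x00007fc0d3d6b3d0 Return addr 0x00007fc0edcec730 (janitor_agent:handle_apply_vbucket_state/2 + 840)",
    "0x00007fc0d3d6b3e0 Return addr 0x00007fc0edcec300 (janitor_agent:apply_vbucket_states_worker_loop/0 + 168)",
    "0x00007fc0d3d6b3e8 Return addr 0x00007fc13b070000 (proc_lib:init_p/3 + 688)",
    "0x00007fc0d3d6b400 Return addr 0x0000000000891848 (<terminate process normally>)",
]

# Matches "(module:function/arity + offset)" after a program counter or return address.
FRAME_RE = re.compile(
    r"(?:Program counter:|Return addr) 0x[0-9a-f]+ \(([^\s()]+/\d+) \+ \d+\)"
)


def call_chain(lines):
    """Return module:function/arity entries, innermost frame first."""
    chain = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            chain.append(m.group(1))
    return chain


if __name__ == "__main__":
    # Print outermost caller first so the chain reads top-down.
    for depth, frame in enumerate(reversed(call_chain(BACKTRACE))):
        print("  " * depth + frame)
```

      Read this way, the janitor_agent worker for bucket "default" appears to be blocked in an RPC to the couchdb node while passing vbucket states to the set view manager; the stack-slot lines and the terminate-process frame are skipped because they carry no module:function/arity entry.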

            People

              arunkumar Arunkumar Senthilnathan (Inactive)