Details
-
Bug
-
Resolution: Fixed
-
Test Blocker
-
5.5.0
-
Untriaged
-
No
Description
CentOS longevity run, build 5.5.0-2907: the rebalance appears to have been stuck for more than 2 days. The following trace is from diag.log on node 172.23.108.103:
2018-06-16T22:29:14.189-07:00, ns_orchestrator:4:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.164','ns_1@172.23.104.61',
'ns_1@172.23.106.188','ns_1@172.23.108.103',
'ns_1@172.23.108.104','ns_1@172.23.96.145',
'ns_1@172.23.96.148','ns_1@172.23.96.168',
'ns_1@172.23.96.56','ns_1@172.23.97.238',
'ns_1@172.23.97.239','ns_1@172.23.97.242',
'ns_1@172.23.98.135','ns_1@172.23.99.11',
'ns_1@172.23.99.20','ns_1@172.23.99.21',
'ns_1@172.23.99.25'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
2018-06-16T22:29:16.227-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket WAREHOUSE
2018-06-16T22:29:16.706-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "WAREHOUSE" rebalance appears to be swap rebalance
2018-06-16T22:29:17.345-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket STOCK
2018-06-16T22:29:18.357-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "STOCK" rebalance appears to be swap rebalance
2018-06-16T22:29:18.827-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ORDER_LINE
2018-06-16T22:29:19.803-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ORDER_LINE" rebalance appears to be swap rebalance
2018-06-16T22:29:20.208-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ORDERS
2018-06-16T22:29:20.881-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ORDERS" rebalance appears to be swap rebalance
2018-06-16T22:29:21.510-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket NEW_ORDER
2018-06-16T22:29:22.426-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "NEW_ORDER" rebalance appears to be swap rebalance
2018-06-16T22:29:22.598-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket ITEM
2018-06-16T22:29:23.222-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "ITEM" rebalance appears to be swap rebalance
2018-06-16T22:29:23.698-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket HISTORY
2018-06-16T22:29:24.618-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "HISTORY" rebalance appears to be swap rebalance
2018-06-16T22:29:24.952-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket DISTRICT
2018-06-16T22:29:25.688-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "DISTRICT" rebalance appears to be swap rebalance
2018-06-16T22:29:25.951-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket CUSTOMER
2018-06-16T22:29:26.506-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "CUSTOMER" rebalance appears to be swap rebalance
2018-06-16T22:29:26.726-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket default
2018-06-16T22:29:27.364-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.108.103) - Bucket "default" rebalance appears to be swap rebalance
2018-06-16T22:38:28.407-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.104.61'). There was at least another node down.
2018-06-16T22:38:28.411-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.108.103'). There was at least another node down.
2018-06-16T22:38:28.465-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.108.104'). There was at least another node down.
2018-06-16T22:38:28.466-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.96.145'). There was at least another node down.
2018-06-16T22:38:28.467-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.96.168'). There was at least another node down.
2018-06-16T22:38:28.468-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.97.238'). There was at least another node down.
2018-06-16T22:38:28.525-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.97.239'). There was at least another node down.
2018-06-16T22:38:28.564-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.99.20'). There was at least another node down.
2018-06-16T22:38:28.616-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.99.21'). There was at least another node down.
2018-06-16T22:38:28.620-07:00, auto_failover:3:info:message(ns_1@172.23.108.103) - Could not auto-failover node ('ns_1@172.23.99.25'). There was at least another node down.
2018-06-17T23:12:27.315-07:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.108.103) - Client-side error-report for user "Administrator" on node 'ns_1@172.23.108.103':
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
Got unhandled javascript error:
message: The transition errored;

2018-06-17T23:13:02.974-07:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.108.103) - Client-side error-report for user "Administrator" on node 'ns_1@172.23.108.103':
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
Got unhandled javascript error:
message: The transition errored;
(repeated 1 times)

2018-06-18T01:39:40.575-07:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.108.103) - Client-side error-report for user "Administrator" on node 'ns_1@172.23.108.103':
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
Got unhandled javascript error:
message: The transition errored;

-------------------------------

per_node_processes('ns_1@172.23.108.103') =
{<0.32163.965>,
[{registered_name,[]},
{status,waiting},
{initial_call,{proc_lib,init_p,5}},
{backtrace,[<<"Program counter: 0x00007fa09d8c9920 (gen_server:loop/6 + 264)">>,
<<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,
<<>>,
<<"0x00007fa098141618 Return addr 0x00007fa09d7f0210 (proc_lib:init_p_do_apply/3 + 56)">>,
<<"y(0) []">>,<<"y(1) 10000">>,
<<"y(2) dcp_proxy">>,
<<"y(3) {state,#Port<0.736989>,{consumer,\"replication:ns_1@172.23.108.104->ns_1@172.23.108.103:CUSTOMER\",'ns_1@172.23.10">>,
<<"y(4) <0.32163.965>">>,<<"y(5) <0.5515.966>">>,
<<>>,
<<"0x00007fa098141650 Return addr 0x0000000000892548 (<terminate process normally>)">>,
<<"y(0) Catch 0x00007fa09d7f0230 (proc_lib:init_p_do_apply/3 + 88)">>,
<<>>]},
{error_handler,error_handler},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,21}]},
{heap_size,987},
{total_heap_size,1974},
{links,[<0.5515.966>,#Port<0.736989>]},
{monitors,[]},
{monitored_by,[<0.8120.0>]},
{memory,16744},
{messages,[]},
{message_queue_len,0},
{reductions,14978131},
{trap_exit,false},
{current_location,{gen_server,loop,6,
[{file,"gen_server.erl"},{line,358}]}},
{dictionary,[{'$ancestors',['dcp_replicator-CUSTOMER-ns_1@172.23.108.104',
'dcp_sup-CUSTOMER',
'single_bucket_kv_sup-CUSTOMER',
ns_bucket_sup,ns_bucket_worker_sup,
ns_server_sup,ns_server_nodes_sup,
<0.167.0>,ns_server_cluster_sup,<0.89.0>]},
{'$initial_call',{dcp_proxy,init,1}}]}]}
{<0.31048.483>,
[{registered_name,[]},
{status,waiting},
{initial_call,{erlang,apply,2}},
{backtrace,
[<<"Program counter: 0x00007fa047b8b810 (leader_lease_acquire_worker:loop/1 + 40)">>,
<<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
<<"0x00007fa062af19b0 Return addr 0x00007fa063eec930 (async:'-async_init/4-fun-2-'/3 + 272)">>,
<<"y(0) {state,<0.9200.0>,'ns_1@172.23.96.145',<<32 bytes>>,true,1529369273548,1529369283548,{backoff,500,15000,2,500},{">>,
<<>>,
<<"0x00007fa062af19c0 Return addr 0x0000000000892548 (<terminate process normally>)">>,
<<"y(0) []">>,<<"y(1) []">>,
<<"y(2) Catch 0x00007fa063eec988 (async:'-async_init/4-fun-2-'/3 + 360)">>,
<<"y(3) {<0.28987.483>,#Ref<0.0.316.55163>}">>,<<>>]},
{error_handler,error_handler},
{garbage_collection,
[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,38}]},
{heap_size,1598},
{total_heap_size,1974},
{links,[<0.28987.483>]},
{monitors,[]},
{monitored_by,[]},
{memory,16632},
{messages,[]},
{message_queue_len,0},
{reductions,22489112},
{trap_exit,false},
{current_location,
{leader_lease_acquire_worker,loop,1,
[{file,"src/leader_lease_acquire_worker.erl"},{line,61}]}},
{dictionary,
[{'$async_role',executor},{'$async_controller',<0.28987.483>}]}]}
{<0.30924.483>,
[{registered_name,[]},
{status,waiting},
{initial_call,{inet_tcp_dist,do_accept,6}},
{backtrace,[<<"Program counter: 0x00007fa063d6b6c0 (dist_util:con_loop/9 + 112)">>,
<<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,
<<>>,
<<"0x00007fa061807ce0 Return addr 0x0000000000892548 (<terminate process normally>)">>,
<<"y(0) []">>,
<<"y(1) #Fun<inet_tcp_dist.getstat.1>">>,
<<"y(2) #Fun<inet_tcp_dist.tick.1>">>,
<<"y(3) {tick,1268894,2159172,2,2}">>,
<<"y(4) normal">>,<<"y(5) 'ns_1@172.23.108.103'">>,
<<"y(6) {net_address,{{172,23,96,145},59255},\"172.23.96.145\",tcp,inet}">>,
<<"y(7) #Port<0.359168>">>,
<<"y(8) 'ns_1@172.23.96.145'">>,
<<"y(9) <0.9129.0>">>,<<>>]},
{error_handler,error_handler},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,188}]},
{heap_size,987},
{total_heap_size,1363},
{links,[<0.9129.0>,#Port<0.359168>]},
{monitors,[]},
{monitored_by,[]},
{memory,11680},
{messages,[]},
{message_queue_len,0},
{reductions,605395},
{trap_exit,false},
{current_location,{dist_util,con_loop,9,
[{file,"dist_util.erl"},{line,454}]}},
{dictionary,[]}]}
Supportal: https://supportal.couchbase.com/snapshot/d728fa2425d291c708831355596cf470::0
The cluster is live at 172.23.108.103:8091 for debugging.
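
For anyone triaging from the diag.log excerpt, a rough Python sketch like the one below can pull out the entry headers and measure how much time separates the last "Starting rebalance" message from the newest entry in an excerpt. The entry layout (timestamp, module:code:level:type(node) - message) is inferred from the lines quoted in this ticket and is an assumption, not a documented format; the names ENTRY, parse_entries and time_since_rebalance_start are made up for illustration. Continuation lines (the KeepNodes list, User-Agent lines) do not match the header pattern and are skipped.

```python
import re
from datetime import datetime, timedelta

# Assumed header layout, inferred from the excerpt in this ticket:
#   <timestamp>, <module>:<code>:<level>:<type>(<node>) - <message>
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}), "
    r"(?P<module>\w+):(?P<code>\d+):(?P<level>\w+):(?P<type>[^(]+)"
    r"\((?P<node>[^)]+)\) - (?P<msg>.*)$"
)


def parse_entries(lines):
    """Return one dict per line that looks like an entry header."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["ts"] = datetime.fromisoformat(entry["ts"])
            entries.append(entry)
    return entries


def time_since_rebalance_start(entries):
    """Time from the last 'Starting rebalance' entry to the newest entry."""
    starts = [e["ts"] for e in entries if e["msg"].startswith("Starting rebalance")]
    if not starts:
        return None
    return max(e["ts"] for e in entries) - starts[-1]


# Three header lines copied from the excerpt in this ticket.
SAMPLE = """\
2018-06-16T22:29:14.189-07:00, ns_orchestrator:4:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.164','ns_1@172.23.104.61',
2018-06-16T22:29:16.227-07:00, ns_rebalancer:0:info:message(ns_1@172.23.108.103) - Started rebalancing bucket WAREHOUSE
2018-06-18T01:39:40.575-07:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.108.103) - Client-side error-report for user "Administrator" on node 'ns_1@172.23.108.103':
""".splitlines()

entries = parse_entries(SAMPLE)
age = time_since_rebalance_start(entries)
if age is not None and age > timedelta(hours=12):
    print("no rebalance completion logged for", age)
```

The 12-hour threshold is arbitrary. The result only measures the span covered by the lines fed in, so it should be run against the full diag.log rather than an excerpt before drawing conclusions about how long the rebalance has been stuck.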
Issue Links
- relates to MB-30288 [Backport MB-30162] - rebalance in for index node stuck for 2 days - centos longevity (Closed)