Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Versions: 4.6.0, 4.6.1, 4.6.2, 4.6.3, 4.6.4, 5.0.0
Triage: Untriaged
Labels: Release Note
Is this a Regression?: Yes
Description
The issue occurred 5 days into a longevity test with ephemeral buckets configured with the no-eviction policy.
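For reference, a bucket with that configuration can be created through the cluster management REST API. The sketch below is illustrative only and is not taken from the test harness; the host, credentials, and RAM quota are hypothetical placeholders, and it assumes the standard POST /pools/default/buckets endpoint with bucketType=ephemeral and evictionPolicy=noEviction.

-module(create_ephemeral_bucket).
-export([run/0]).

%% Create an ephemeral bucket with eviction disabled via the REST API.
%% Host, credentials and quota are hypothetical placeholders.
run() ->
    {ok, _} = application:ensure_all_started(inets),
    Auth = "Basic " ++ base64:encode_to_string("Administrator:password"),
    Url = "http://127.0.0.1:8091/pools/default/buckets",
    Body = "name=default&ramQuotaMB=1024"
           "&bucketType=ephemeral&evictionPolicy=noEviction",
    {ok, {{_, Status, _}, _Headers, _RespBody}} =
        httpc:request(post,
                      {Url, [{"authorization", Auth}],
                       "application/x-www-form-urlencoded", Body},
                      [], []),
    Status.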
The logs show the rebalance starting, followed by metadata overhead warnings and then an ns_server backtrace:
2017-09-13T07:34:51.604-07:00, ns_orchestrator:4:info:message(ns_1@172.23.106.14) - Starting rebalance, KeepNodes = ['ns_1@172.23.105.60','ns_1@172.23.105.61','ns_1@172.23.105.62','ns_1@172.23.105.63','ns_1@172.23.106.14','ns_1@172.23.106.213','ns_1@172.23.106.96','ns_1@172.23.99.168','ns_1@172.23.99.253'], EjectNodes = ['ns_1@172.23.105.83'], Failed over and being ejected nodes = []; no delta recovery nodes
2017-09-13T07:40:32.197-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.106.14) - Bucket "default" rebalance appears to be swap rebalance
2017-09-13T08:02:01.695-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.99.253) - Metadata overhead warning. Over 50% of RAM allocated to bucket "default" on node "172.23.99.253" is taken up by keys and metadata.
2017-09-13T08:02:22.551-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.99.253) - Metadata overhead warning. Over 50% of RAM allocated to bucket "default" on node "172.23.99.253" is taken up by keys and metadata. (repeated 6 times)
per_node_processes('ns_1@172.23.106.14') =
    {<0.32569.4081>,
     [{registered_name,[]},
      {status,waiting},
      {initial_call,{proc_lib,init_p,5}},
      {backtrace,
       [<<"Program counter: 0x00007f460af7b288 (ns_single_vbucket_mover:spawn_and_wait/1 + 72)">>,
        <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
        <<"0x00007f4609bdd678 Return addr 0x00007f46533eee90 (misc:try_with_maybe_ignorant_after/2 + 80)">>,
        <<"y(0) []">>,<<"y(1) []">>,<<"y(2) <0.20357.4080>">>,
        <<>>,
        <<"0x00007f4609bdd698 Return addr 0x00007f460af7b0d8 (ns_single_vbucket_mover:mover/5 + 896)">>,
        <<"y(0) []">>,<<"y(1) []">>,<<"y(2) []">>,<<"y(3) []">>,
        <<"y(4) #Fun<ns_single_vbucket_mover.3.48828051>">>,
        <<"y(5) Catch 0x00007f46533eeeb0 (misc:try_with_maybe_ignorant_after/2 + 112)">>,
        <<>>,
        <<"0x00007f4609bdd6d0 Return addr 0x00007f465befc198 (proc_lib:init_p_do_apply/3 + 56)">>,
        <<"y(0) []">>,<<"y(1) true">>,
        <<"y(2) ['ns_1@172.23.105.62','ns_1@172.23.106.213']">>,
        <<"y(3) ['ns_1@172.23.105.62','ns_1@172.23.105.83']">>,
        <<"y(4) 27">>,<<"y(5) <0.25037.4080>">>,<<>>,
        <<"0x00007f4609bdd708 Return addr 0x0000000000893588 (<terminate process normally>)">>,
        <<"y(0) Catch 0x00007f465befc1b8 (proc_lib:init_p_do_apply/3 + 88)">>,
        <<>>]},
The result is that the rebalance hangs in the cluster.
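The backtrace shows the vbucket mover parked in ns_single_vbucket_mover:spawn_and_wait/1, i.e. the mover is blocked waiting on a worker process that never finishes, which in turn stalls the whole rebalance. Below is a minimal, self-contained sketch of that spawn-and-wait pattern; it is an illustration only, not the actual ns_server code, and the module and function names are made up.

-module(spawn_and_wait_demo).
-export([run/0]).

%% Block until the spawned worker exits; crash if it exits abnormally.
spawn_and_wait(Body) ->
    {Pid, MRef} = erlang:spawn_monitor(Body),
    receive
        {'DOWN', MRef, process, Pid, normal} -> ok;
        {'DOWN', MRef, process, Pid, Reason} -> exit({child_died, Reason})
    end.

run() ->
    %% The worker below never returns, so spawn_and_wait/1 sits in its
    %% receive forever -- the same waiting state the dumped process is in.
    spawn_and_wait(fun() -> receive never_sent -> ok end end).

If the real mover's worker stalls for any reason, the parent stays in that receive indefinitely and the rebalance never completes.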