Details
Bug
Resolution: Fixed
Critical
3.0
Security Level: Public
None
3.0.0-848
Untriaged
Centos 64-bit
Unknown
Description
System test setup before the rebalance:
4 buckets:
AbRegNums: 500 MB RAM quota, ~20% resident ratio
RevAB: 4500 MB RAM quota, ~70% resident ratio
MsgsCalls: 300 MB RAM quota, ~70% resident ratio
UserInfo: 300 MB RAM quota, ~100% resident ratio
3 nodes in the cluster:
172.23.105.22, 172.23.105.157, 172.23.105.158
Unidirectional XDCR replication with another cluster: 172.23.105.159
Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
'ns_1@172.23.105.158','ns_1@172.23.105.156',
'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
The rebalance was stuck for a long time at ~1% progress, then failed with wait_seqno_persisted_failed.
I don't see any crashes on the VMs.
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.13953.427>,
{wait_seqno_persisted_failed,"RevAB",849,
17733,
[{'ns_1@172.23.105.157',
{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[{'janitor_agent-RevAB', 'ns_1@172.23.105.157'},
{if_rebalance,<0.9603.440>,
{wait_seqno_persisted,849,17733}},
infinity]}}}}]}}}
ns_orchestrator002 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
<0.9744.443> exited with {unexpected_exit,
{'EXIT',<0.13953.427>,
{wait_seqno_persisted_failed,"RevAB",849,17733,
[{'ns_1@172.23.105.157',
{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[{'janitor_agent-RevAB', 'ns_1@172.23.105.157'},
{if_rebalance,<0.9603.440>,
{wait_seqno_persisted,849,17733}},
infinity]}}}}]}}} ns_vbucket_mover000 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 27 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:58 - Sat Jun 21, 2014
Bucket "MsgsCalls" loaded on node 'ns_1@172.23.105.157' in 3 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:35 - Sat Jun 21, 2014
Bucket "UserInfo" loaded on node 'ns_1@172.23.105.157' in 28 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:31 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
stats_recv,
4,
[{file, "src/mc_client_binary.erl"}, {line, 163}]},
{mc_client_binary,
stats,
4,
[{file, "src/mc_client_binary.erl"}, {line, 411}]},
{ns_memcached,
handle_info,
2,
[{file, "src/ns_memcached.erl"}, {line, 725}]},
{gen_server,
handle_msg,
5,
[{file, "gen_server.erl"}, {line, 604}]},
{ns_memcached,
init,
1,
[{file, "src/ns_memcached.erl"}, {line, 170}]},
{gen_server,
init_it,
6,
[{file, "gen_server.erl"}, {line, 304}]},
{proc_lib,
init_p_do_apply,
3,
[{file, "proc_lib.erl"}, {line, 239}]}]} (repeated 2 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file, "src/mc_client_binary.erl"}, {line, 149}]},
{mc_client_binary,
select_bucket,
2,
[{file, "src/mc_client_binary.erl"}, {line, 344}]},
{ns_memcached,
ensure_bucket,
2,
[{file, "src/ns_memcached.erl"}, {line, 1280}]},
{ns_memcached,
handle_info,
2,
[{file, "src/ns_memcached.erl"}, {line, 750}]},
{gen_server,
handle_msg,
5,
[{file, "gen_server.erl"}, {line, 604}]},
{ns_memcached,
init,
1,
[{file, "src/ns_memcached.erl"},{line, 170}]},
{gen_server,
init_it,
6,
[{file, "gen_server.erl"}, {line, 304}]},
{proc_lib,
init_p_do_apply,
3,
[{file, "proc_lib.erl"}, {line, 239}]}]} (repeated 3 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {badmatch,
{error,
closed}} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
stats_recv,
4,
[{file, "src/mc_client_binary.erl"}, {line, 163}]},
{mc_client_binary,
stats,
4,
[{file, "src/mc_client_binary.erl"}, {line, 411}]},
{ns_memcached,
handle_info,
2,
[{file, "src/ns_memcached.erl"}, {line, 725}]},
{gen_server,
handle_msg,
5,
[{file, "gen_server.erl"}, {line, 604}]},
{ns_memcached,
init,
1,
[{file, "src/ns_memcached.erl"}, {line, 170}]},
{gen_server,
init_it,
6,
[{file, "gen_server.erl"}, {line, 304}]},
{proc_lib,
init_p_do_apply,
3,
[{file, "proc_lib.erl"}, {line, 239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:05:50.696179 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) Stream closing, 0 items sent from disk, 0 items sent from memory, 894 was last seqno sent
Sat Jun 21 12:05:50.696196 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) stream created with start seqno 894 and end seqno 894
Sat Jun 21 12:05:50.698921 PDT 3: (AbRegNums) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:AbRegNums - (vb 363) stream created with start seqno 894 and end seqno 0
Sat Jun 21 12:05:50.699524 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) Stream closing, 0 items sent from disk, 0 items sent from memory, 920 was last seqno sent
Sat Jun 21 12:05:50.699544 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) stream created with start seqno 920 and end seqno 920 ns_log000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 36 seconds. ns_memcached000 ns_1@172.23.105.157 12:05:46 - Sat Jun 21, 2014
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:04:53.361372 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 162) stream created with start seqno 17580 and end seqno 0
Sat Jun 21 12:04:53.371897 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 381) stream created with start seqno 17966 and end seqno 0
Sat Jun 21 12:04:53.401541 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 98) stream created with start seqno 17760 and end seqno 0
Sat Jun 21 12:04:53.454580 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 166) stream created with start seqno 17704 and end seqno 0
Sat Jun 21 12:04:53.743529 PDT 3: (RevAB) Notified the timeout on checkpoint persistence for vbucket 921, cookie 0x663d500 ns_log000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file, "src/mc_client_binary.erl"}, {line, 149}]},
{mc_client_binary,
select_bucket,
2,
[{file, "src/mc_client_binary.erl"}, {line, 344}]},
{ns_memcached,
ensure_bucket,
2,
[{file, "src/ns_memcached.erl"}, {line, 1280}]},
{ns_memcached,
handle_info,
2,
[{file, "src/ns_memcached.erl"}, {line, 750}]},
{gen_server,
handle_msg,
5,
[{file, "gen_server.erl"}, {line, 604}]},
{ns_memcached,
init,
1,
[{file, "src/ns_memcached.erl"}, {line, 170}]},
{gen_server,
init_it,
6,
[{file, "gen_server.erl"}, {line, 304}]},
{proc_lib,
init_p_do_apply,
3,
[{file, "proc_lib.erl"}, {line, 239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
Bucket "RevAB" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.158 11:38:27 - Sat Jun 21, 2014
Started rebalancing bucket RevAB ns_rebalancer000 ns_1@172.23.105.158 11:38:24 - Sat Jun 21, 2014
Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
'ns_1@172.23.105.158','ns_1@172.23.105.156',
'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator004 ns_1@172.23.105.158 11:38:23 - Sat Jun 21, 2014
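Note on the "exited with status 139" messages in the log above: if the reported status follows the usual shell convention of 128 plus the signal number (an assumption, the log does not say how the babysitter derives it), then 139 corresponds to signal 11. The sketch below only illustrates that decoding; describe_exit_status is a hypothetical helper and not part of ns_server.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumes the 128 + N convention: values above 128 mean the process
    was terminated by signal N rather than exiting on its own.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


# Status taken from the "Port server memcached ... exited with status 139" lines.
print(describe_exit_status(139))
```

Under that assumption, the two memcached restarts at 12:04:53 and 12:05:50 would be segmentation faults on 172.23.105.157, which would also explain the closed control connections that follow each one.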