Description
steps:
1. 3 nodes in cluster, 4 buckets. Run the data loader for more than a day.
2. Set up replication from the SRC cluster to the DEST cluster for all buckets.
3. Rebalance in at the SRC cluster; rebalance in at the DEST cluster.
4. Graceful failover (rebalance) for a node in the SRC cluster, then add it back (Delta Recovery).
5. Click failover: hard failover for a node in SRC cluster A, then add it back (Full Recovery) and rebalance.
6. Remove a node in the SRC cluster, then stop the rebalance. Cancel removing the node and rebalance.
7. Rebalance out 1 node on the SRC cluster.
8. Rebalance out 1 node on the DEST cluster.
9. Rebalance in 2 nodes on the SRC cluster.
result:
cat info.log | grep -A 30 -B 30 "exited with reason"
{stats_collector,handle_info,2,
[
{line,125}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,604}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
[stats:error,2015-05-14T9:56:17.628,ns_1@172.23.105.156:<0.4775.0>:stats_collector:handle_info:133]Exception in stats collector: {exit,
{{error,closed},
{gen_server,call,
['ns_memcached-RevAB',
{stats,<<>>},
180000]}},
[{gen_server,call,3,
[{file,"gen_server.erl"},{line,188}]},
{ns_memcached,do_call,3,
[{file,"src/ns_memcached.erl"},
{line,1425}]},
{stats_collector,grab_all_stats,1,
[{file,"src/stats_collector.erl"}
,
]},
{stats_collector,handle_info,2,
[
,
]},
{gen_server,handle_msg,5,
[
,
{line,604}]},
{proc_lib,init_p_do_apply,3,
[
,
{line,239}]}]}
[user:info,2015-05-14T9:56:17.628,ns_1@172.23.105.156:<0.1466.0>:ns_orchestrator:handle_info:482]Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.16693.67>,
{wait_seqno_persisted_failed,"RevAB",943,
25440,
[{'ns_1@172.23.105.156',
{'EXIT',
badmatch,{error,closed,
{gen_server,call,
[
,
{if_rebalance,<0.9143.66>,
{wait_seqno_persisted,943,25440}},
infinity]}}}}]}}}
[ns_server:warn,2015-05-14T9:56:17.630,ns_1@172.23.105.156:<0.16930.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,
{error,econnrefused}}}, retrying.
[ns_server:info,2015-05-14T9:56:17.632,ns_1@172.23.105.156:<0.17531.67>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket RevAB with config:
[{database_fragmentation_threshold,{30,undefined}},
{view_fragmentation_threshold,{30,undefined}}]
[ns_server:warn,2015-05-14T9:56:17.641,ns_1@172.23.105.156:<0.17088.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}
}}, retrying.
[ns_server:warn,2015-05-14T9:56:17.678,ns_1@172.23.105.156:<0.17529.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,
[ns_server:warn,2015-05-14T9:56:17.678,ns_1@172.23.105.156:<0.17514.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}
}}, retrying.
[ns_server:info,2015-05-14T9:56:17.732,ns_1@172.23.105.156:<0.17554.67>:diag_handler:log_all_tap_and_checkpoint_stats:130]logging tap & checkpoint stats
[user:info,2015-05-14T9:56:17.790,ns_1@172.23.105.156:<0.1500.0>:ns_log:crash_consumption_loop:70]Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 137. Restarting. Messages: 2015-05-14T09:55:06.527524-07:00 WARNING (AbRegNums) Backfill task (1 to 1154) cancelled for vb 769
2015-05-14T09:55:06.527555-07:00 WARNING (AbRegNums) Backfill task (983 to 1104) cancelled for vb 451
2015-05-14T09:55:06.527585-07:00 WARNING (AbRegNums) Backfill task (992 to 1092) cancelled for vb 468
2015-05-14T09:55:06.527615-07:00 WARNING (AbRegNums) Backfill task (1059 to 1158) cancelled for vb 482
2015-05-14T09:55:06.527643-07:00 WARNING (AbRegNums) Backfill task (1 to 1109) cancelled for vb 831
[stats:warn,2015-05-14T9:56:18.704,ns_1@172.23.105.156:<0.4855.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
[stats:warn,2015-05-14T9:56:18.705,ns_1@172.23.105.156:<0.4811.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
[stats:warn,2015-05-14T9:56:18.705,ns_1@172.23.105.156:<0.4781.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
[stats:warn,2015-05-14T9:56:18.706,ns_1@172.23.105.156:<0.5298.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
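For triage, the "exited with status 137" in the memcached port message above can be decoded with a short sketch. It assumes the status follows the common Unix convention of 128 plus the signal number for a process terminated by a signal; whether the babysitter reports the status exactly this way is an assumption, and `decode_exit_status` is a hypothetical helper, not part of ns_server.

```python
import signal


def decode_exit_status(status):
    """Return the signal name if status looks like 128 + signal number, else None.

    Assumption: the reported status uses the shell-style 128 + N encoding
    for a process terminated by signal N.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # 128 + N where N is not a known signal on this platform
            return None
    return None


# Status taken from the log line "exited with status 137"
print(decode_exit_status(137))
```

If the status does map to a kill signal, memcached was probably terminated from outside (for example by the kernel OOM killer or an operator) rather than crashing on its own, so the system logs on the node may be worth checking.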
Please note that MB-14983 was created for the same run.
I will provide the collect info soon.
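For anyone who wants to pull the same excerpt from a log without grep, the `grep -A 30 -B 30 "exited with reason"` filter under result: can be approximated in Python. This is a minimal sketch; `grep_context` is a hypothetical helper, and unlike grep it does not print `--` separators between non-adjacent groups.

```python
from collections import deque


def grep_context(lines, needle, before=30, after=30):
    """Yield lines containing needle, plus up to `before` preceding and
    `after` following lines, never emitting the same line twice."""
    history = deque(maxlen=before)  # recent lines not yet emitted
    remaining_after = 0             # trailing-context lines still owed
    for line in lines:
        if needle in line:
            yield from history      # leading context
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line
            remaining_after -= 1
        else:
            history.append(line)


# Example usage (assumes info.log is in the current directory):
# with open("info.log", errors="replace") as f:
#     for line in grep_context(f, "exited with reason"):
#         print(line, end="")
```

Because the function consumes any iterable of lines, it streams through a large info.log without loading it into memory.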