Description
2.0.0-1700-rel
http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/31/consoleFull
/testrunner -i /tmp/rebalance_regression.ini get-logs=True -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=2,num-buckets=1,num-swap=2,swap-orchestrator=Fasle
Alk, I do not see any Erlang crash/core dumps; there are only these logs:
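For reference, a quick way to double-check for dumps in the collected logs — the unpack path below is an assumption, adjust it to wherever the cbcollect output was extracted:

```shell
# Hypothetical location of the unpacked collected logs -- adjust as needed.
LOGS="${LOGS:-/tmp/collected-logs}"

# An emulator crash would leave an erl_crash.dump, a memcached crash a core
# file; neither search printed anything for any of the nodes in this run.
find "$LOGS" \( -name 'erl_crash.dump*' -o -name 'core.*' \) 2>/dev/null || true

# The failures show up only as gen_server CRASH REPORTs inside ns_server's logs:
grep -c 'CRASH REPORT' "$LOGS"/ns_server.*.log 2>/dev/null || echo "logs not found at $LOGS"
```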
[rebalance:info,2012-09-08T10:37:20.506,ns_1@10.3.121.92:<0.22478.1>:janitor_agent:set_vbucket_state:407]Doing vbucket 164 state change:
{'ns_1@10.3.121.93',replica,undefined,'ns_1@10.3.121.95'}
[rebalance:info,2012-09-08T10:37:20.506,ns_1@10.3.121.92:<0.22479.1>:janitor_agent:set_vbucket_state:407]Doing vbucket 164 state change:
{'ns_1@10.3.121.96',replica,undefined,'ns_1@10.3.121.93'}
[error_logger:error,2012-09-08T10:37:20.513,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <19987.20007.0> terminating
** Last message in was {tcp_closed,#Port<19987.25135>}
** When Server state == {state,#Port<19987.25134>,#Port<19987.25132>,
#Port<19987.25135>,#Port<19987.25133>,
<19987.20008.0>,<<>>,<<>>,
{set,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],
[561],
[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
115,false,false,0,
{1347,125829,535727},
not_started,undefined,
<<"replication_building_561_'ns_1@10.3.121.96'">>,
<19987.20007.0>,
{had_backfill,true,undefined,[]}}
** Reason for termination ==
** downstream_closed
[error_logger:error,2012-09-08T10:37:20.515,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <19987.19993.0> terminating
** Last message in was {tcp_closed,#Port<19987.25127>}
** When Server state == {state,#Port<19987.25126>,#Port<19987.25117>,
#Port<19987.25127>,#Port<19987.25118>,
<19987.20006.0>,<<>>,<<>>,
{set,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],
[857],
[],[],[]}}},
203,false,false,0,
{1347,125829,399439},
not_started,undefined,
<<"replication_building_857_'ns_1@10.3.121.96'">>,
<19987.19993.0>,
{had_backfill,true,undefined,[]}}
** Reason for termination ==
** downstream_closed
[ns_server:info,2012-09-08T10:37:20.517,ns_1@10.3.121.92:<0.22444.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.94': [<<"replication_building_857_'ns_1@10.3.121.96'">>,
<<"replication_building_857_'ns_1@10.3.121.93'">>,
<<"replication_building_857_'ns_1@10.3.121.95'">>]
[ns_server:info,2012-09-08T10:37:20.518,ns_1@10.3.121.92:<0.22460.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.93': [<<"replication_building_561_'ns_1@10.3.121.96'">>,
<<"replication_building_561_'ns_1@10.3.121.95'">>]
[error_logger:error,2012-09-08T10:37:20.518,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <19987.20007.0>
registered_name: []
exception exit: downstream_closed
in function gen_server:terminate/6
ancestors: [<0.22460.1>,<0.22455.1>,<0.11756.1>,<0.11717.1>]
messages: []
links: [#Port<19987.25133>,<0.22460.1>,<19987.20008.0>,
#Port<19987.25132>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 128921
neighbours:
[ns_server:info,2012-09-08T10:37:20.518,ns_1@10.3.121.92:<0.22423.1>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.11756.1>). Exiting...
{'EXIT',<0.22444.1>,{missing_checkpoint_stat,'ns_1@10.3.121.96',857}}
[ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22423.1>:misc:sync_shutdown_many_i_am_trapping_exits:1373]Shutdown of the following failed: [{<0.22444.1>,
{missing_checkpoint_stat,
'ns_1@10.3.121.96',857}}]
[ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22460.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
{error,{badmatch,[]},
[{misc,try_with_maybe_ignorant_after,2},
{misc,try_with_maybe_ignorant_after,2},
{ns_replicas_builder,build_replicas_main,6},
{proc_lib,init_p_do_apply,3}]}
[ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22423.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
{error,{badmatch,[{<0.22444.1>,
{missing_checkpoint_stat,'ns_1@10.3.121.96',857}}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{ns_single_vbucket_mover,mover,6},
{proc_lib,init_p_do_apply,3}]}
[error_logger:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <19987.19993.0>
registered_name: []
exception exit: downstream_closed
in function gen_server:terminate/6
ancestors: [<0.22444.1>,<0.22423.1>,<0.11756.1>,<0.11717.1>]
messages: []
links: [#Port<19987.25118>,<0.22444.1>,<19987.20006.0>,
#Port<19987.25117>]
dictionary: []
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 66423
neighbours:
[ns_server:info,2012-09-08T10:37:20.520,ns_1@10.3.121.92:<0.22455.1>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.11756.1>). Exiting...
{'EXIT',<0.22460.1>,
{replicator_died,{'EXIT',<19987.20007.0>,downstream_closed}}}
[ns_server:error,2012-09-08T10:37:20.520,ns_1@10.3.121.92:<0.22455.1>:misc:sync_shutdown_many_i_am_trapping_exits:1373]Shutdown of the following failed: [{<0.22460.1>,
{replicator_died,
{'EXIT',<19987.20007.0>, downstream_closed}}}]
[error_logger:error,2012-09-08T10:37:20.521,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.22444.1>
registered_name: []
exception exit: {missing_checkpoint_stat,'ns_1@10.3.121.96',857}
in function ns_replicas_builder:'
in call from ns_replicas_builder:wait_checkpoint_opened/5
in call from ns_replicas_builder:'
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_replicas_builder:build_replicas_main/6
ancestors: [<0.22423.1>,<0.11756.1>,<0.11717.1>]
messages: [{'EXIT',<19987.19993.0>,downstream_closed}]
links: [<0.22423.1>]
dictionary: []
trap_exit: true
status: running
heap_size: 514229
stack_size: 24
reductions: 57007
neighbours:
[ns_server:error,2012-09-08T10:37:20.521,ns_1@10.3.121.92:<0.22455.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
{error,
{badmatch,
[{<0.22460.1>,
{replicator_died,{'EXIT',<19987.20007.0>,downstream_closed}}}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{ns_single_vbucket_mover,mover,6},
{proc_lib,init_p_do_apply,3}]}
[error_logger:error,2012-09-08T10:37:20.523,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.22460.1>
registered_name: []
exception exit: {replicator_died,
{'EXIT',<19987.20007.0>,downstream_closed}}
in function ns_replicas_builder:'
in call from ns_replicas_builder:observe_wait_all_done_old_style_loop/5
in call from ns_replicas_builder:observe_wait_all_done/5
in call from ns_replicas_builder:'
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_replicas_builder:build_replicas_main/6
ancestors: [<0.22455.1>,<0.11756.1>,<0.11717.1>]
messages: []
links: [<0.22455.1>]
dictionary: []
trap_exit: true
status: running
heap_size: 196418
stack_size: 24
reductions: 47356
neighbours:
[error_logger:error,2012-09-08T10:37:20.524,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.22423.1>
registered_name: []
exception exit: {exited,
{'EXIT',<0.22444.1>,
{missing_checkpoint_stat,'ns_1@10.3.121.96',857}}}
in function ns_single_vbucket_mover:mover_inner_old_style/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.11756.1>,<0.11717.1>]
messages: []
links: [<0.11756.1>]
dictionary: [{cleanup_list,[<0.22444.1>]}]
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 5292
neighbours:
[error_logger:error,2012-09-08T10:37:20.525,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.22455.1>
registered_name: []
exception exit: {exited,
{'EXIT',<0.22460.1>,
{replicator_died,
{'EXIT',<19987.20007.0>,downstream_closed}}}}
in function ns_single_vbucket_mover:mover_inner_old_style/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.11756.1>,<0.11717.1>]
messages: []
links: [<0.11756.1>]
dictionary: [{cleanup_list,[<0.22460.1>]}]
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 5342
neighbours:
[ns_server:info,2012-09-08T10:37:20.690,ns_1@10.3.121.92:ns_port_memcached:ns_port_server:log:169]memcached<0.403.0>: Sat Sep 8 17:37:20.486667 3: Schedule cleanup of "eq_tapq:rebalance_164"
memcached<0.403.0>: Sat Sep 8 17:37:20.486798 3: TAP (Producer) eq_tapq:rebalance_164 - Clear the tap queues by force
memcached<0.403.0>: Sat Sep 8 17:37:20.501995 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.95' - Connection is closed by force.
memcached<0.403.0>: Sat Sep 8 17:37:20.502220 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.93' - Connection is closed by force.
memcached<0.403.0>: Sat Sep 8 17:37:20.502404 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.96' - Connection is closed by force.
[ns_server:info,2012-09-08T10:37:21.688,ns_1@10.3.121.92:ns_port_memcached:ns_port_server:log:169]memcached<0.403.0>: Sat Sep 8 17:37:21.487627 3: Schedule cleanup of "eq_tapq:anon_2546"
memcached<0.403.0>: Sat Sep 8 17:37:21.487709 3: Schedule cleanup of "eq_tapq:anon_2545"
memcached<0.403.0>: Sat Sep 8 17:37:21.487730 3: Schedule cleanup of "eq_tapq:anon_2547"
memcached<0.403.0>: Sat Sep 8 17:37:21.487863 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.93' - Clear the tap queues by force
memcached<0.403.0>: Sat Sep 8 17:37:21.487914 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.95' - Clear the tap queues by force
memcached<0.403.0>: Sat Sep 8 17:37:21.487964 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.96' - Clear the tap queues by force
[ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.96',membership} ->
active
[ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.95',membership} ->
active
[ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.93',membership} ->
active
[ns_server:debug,2012-09-08T10:37:24.732,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:24.733,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:24.733,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:24.734,ns_1@10.3.121.92:ns_config_rep:ns_config_rep:do_push_keys:317]Replicating some config keys ([{node,'ns_1@10.3.121.93',membership},
{node,'ns_1@10.3.121.95',membership},
{node,'ns_1@10.3.121.96',membership}]..)
[user:info,2012-09-08T10:37:24.736,ns_1@10.3.121.92:<0.3725.0>:ns_orchestrator:rebalancing:513]Not rebalancing because rebalance is already in progress.
[rebalance:debug,2012-09-08T10:37:25.514,ns_1@10.3.121.92:<0.11756.1>:janitor_agent:bulk_set_vbucket_state:386]bulk vbucket state change failed for:
[{'ns_1@10.3.121.96',
{'EXIT',
{{couldnt_connect_to_memcached,
{gen_server,call,
['ns_memcached-default',{set_vbucket,164,replica},60000]}},
{gen_server,call,
[{'janitor_agent-default','ns_1@10.3.121.96'},
{if_rebalance,<0.11756.1>,
{update_vbucket_state,164,replica,undefined,
'ns_1@10.3.121.93'}},
infinity]}}}}]
[ns_server:info,2012-09-08T10:37:25.517,ns_1@10.3.121.92:'janitor_agent-default':janitor_agent:handle_info:646]Undoing temporary vbucket states caused by rebalance
[user:info,2012-09-08T10:37:25.517,ns_1@10.3.121.92:<0.3725.0>:ns_orchestrator:handle_info:295]Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.121.96',
{'EXIT',
{{couldnt_connect_to_memcached,
{gen_server,call,
['ns_memcached-default',
{set_vbucket,164,replica},
60000]}},
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.3.121.96'},
{if_rebalance,<0.11756.1>,
{update_vbucket_state,164,replica,
undefined,'ns_1@10.3.121.93'}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover, update_replication_post_move,3},
{ns_vbucket_mover,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
[ns_server:debug,2012-09-08T10:37:25.517,ns_1@10.3.121.92:<0.11763.1>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
{ns_node_disco_events,<0.11756.1>} exited with reason {{bulk_set_vbucket_state_failed,[{'ns_1@10.3.121.96',
{'EXIT',
{{couldnt_connect_to_memcached,
{gen_server,
call,
['ns_memcached-default',
{set_vbucket, 164, replica},
60000]}},
{gen_server,
call,
[{'janitor_agent-default', 'ns_1@10.3.121.96'},
{if_rebalance,
<0.11756.1>,
{update_vbucket_state,
164,
replica,
undefined,
'ns_1@10.3.121.93'}},
infinity]}}}}]},
[{janitor_agent, bulk_set_vbucket_state, 4},
{ns_vbucket_mover, update_replication_post_move, 3},
{ns_vbucket_mover, handle_info, 2},
{gen_server, handle_msg, 5},
{proc_lib, init_p_do_apply, 3}]}
[ns_server:debug,2012-09-08T10:37:25.521,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:25.521,ns_1@10.3.121.92:<0.11763.1>:ns_pubsub:do_subscribe_link:149]Deleting {ns_node_disco_events,<0.11756.1>}
event handler: ok
[ns_server:info,2012-09-08T10:37:25.526,ns_1@10.3.121.92:<0.22853.1>:diag_handler:log_all_tap_and_checkpoint_stats:126]logging tap & checkpoint stats
[ns_server:error,2012-09-08T10:37:25.530,ns_1@10.3.121.92:ns_doctor:ns_doctor:update_status:204]The following buckets became not ready on node 'ns_1@10.3.121.96': ["default"], those of them are active ["default"]
[ns_server:debug,2012-09-08T10:37:25.538,ns_1@10.3.121.92:ns_config_rep:ns_config_rep:do_push_keys:317]Replicating some config keys ([counters,rebalance_status,rebalancer_pid]..)
[ns_server:debug,2012-09-08T10:37:25.553,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
counters ->
[{rebalance_start,3},
{rebalance_success,1}]
[ns_server:debug,2012-09-08T10:37:25.553,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:25.554,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:25.568,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
rebalancer_pid ->
undefined
[ns_server:debug,2012-09-08T10:37:25.569,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
[ns_server:debug,2012-09-08T10:37:25.604,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
rebalance_status ->