Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: 2.0.1
- Security Level: Public
- Labels: None
Description
build 2.0.1-120-rel
http://qa.hq.northscale.net/job/centos-32-2.0-swaprebalance-tests/250/consoleFull
/testrunner -i /tmp/swaprebalance-cent-32.ini get-logs=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=1,num-buckets=4,num-swap=3,swap-orchestrator=True,GROUP=P0
Steps:
1. Rebalance params: password=password&ejectedNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.152%2Cns_1%4010.3.2.145&user=Administrator&knownNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.147%2Cns_1%4010.3.2.149%2Cns_1%4010.3.2.148%2Cns_1%4010.3.2.145%2Cns_1%4010.3.2.152
2. Kill memcached when the rebalance has passed 40%.
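For readability, the URL-encoded request body in step 1 can be decoded to see which nodes are being ejected versus which are known to the cluster. A minimal Python sketch (the body is copied verbatim from step 1):

```python
# Decode the rebalance request body from step 1 to list ejected vs. known nodes.
from urllib.parse import parse_qs

body = ("password=password"
        "&ejectedNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.152%2Cns_1%4010.3.2.145"
        "&user=Administrator"
        "&knownNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.147%2Cns_1%4010.3.2.149"
        "%2Cns_1%4010.3.2.148%2Cns_1%4010.3.2.145%2Cns_1%4010.3.2.152")

params = parse_qs(body)
ejected = params["ejectedNodes"][0].split(",")  # nodes leaving the cluster
known = params["knownNodes"][0].split(",")      # all nodes in the rebalance

print(ejected)     # ['ns_1@10.3.2.146', 'ns_1@10.3.2.152', 'ns_1@10.3.2.145']
print(len(known))  # 6
```

Three of the six known nodes are being swapped out, matching the `num-swap=3` test parameter.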
[2013-01-08 13:47:49,441] - [rest_client:881] INFO - rebalance percentage : 40.0627668879 %
[2013-01-08 13:47:51,444] - [rest_client:73] INFO - rebalance reached >40.0627668879% in 663.750247002 seconds
[2013-01-08 13:47:51,971] - [remote_util:116] INFO - connecting to 10.3.2.149 with username : root password : couchbase ssh_key:
[2013-01-08 13:47:52,196] - [remote_util:148] INFO - Connected
[2013-01-08 13:47:52,557] - [remote_util:1128] INFO - running command.raw sudo cat /proc/cpuinfo
[2013-01-08 13:47:52,657] - [remote_util:1157] INFO - command executed successfully
[2013-01-08 13:47:52,658] - [remote_util:1128] INFO - running command.raw sudo df -Th
[2013-01-08 13:47:52,774] - [remote_util:1157] INFO - command executed successfully
[2013-01-08 13:47:52,775] - [remote_util:1128] INFO - running command.raw sudo cat /proc/meminfo
[2013-01-08 13:47:52,882] - [remote_util:1157] INFO - command executed successfully
[2013-01-08 13:47:52,883] - [remote_util:1128] INFO - running command.raw hostname
[2013-01-08 13:47:52,991] - [remote_util:1157] INFO - command executed successfully
[2013-01-08 13:47:52,993] - [remote_util:1128] INFO - running command.raw ps -eo comm,pid | awk '$1 == "memcached"'
[2013-01-08 13:47:53,107] - [remote_util:1157] INFO - command executed successfully
[2013-01-08 13:47:53,108] - [swaprebalance:401] INFO - os:cmd("kill -9 3663 ")
[2013-01-08 13:47:53,130] - [rest_client:803] INFO - /diag/eval status: True content: [] command: os:cmd("kill -9 3663 ")
[2013-01-08 13:47:53,130] - [swaprebalance:403] INFO - killed 10.3.2.149:8091?? (True, '[]')
[2013-01-08 13:47:53,131] - [swaprebalance:404] INFO - sleep for 10 sec after kill memcached
...
[2013-01-08 13:49:04,037] - [swaprebalance:412] INFO - rebalance progress: 42.5532203758
[2013-01-08 13:49:05,058] - [swaprebalance:415] INFO - rebalance progress status:running
[2013-01-08 13:49:05,068] - [swaprebalance:417] INFO - rebalance is still running even after restarting memcached
[2013-01-08 13:49:05,069] - [swaprebalance:379] INFO - FAIL SWAP REBALANCE PHASE @ 60
[2013-01-08 13:49:05,079] - [rest_client:881] INFO - rebalance percentage : 42.5775868475 %
...
[2013-01-08 13:52:34,570] - [rest_client:881] INFO - rebalance percentage : 50.0 %
[2013-01-08 13:52:36,597] - [rest_client:866] ERROR - rebalance failed
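The kill step in the log above filters `ps -eo comm,pid` output for the process name `memcached` and feeds the resulting PID to `kill -9` through `/diag/eval` (swaprebalance:401). A minimal Python sketch of the PID-extraction logic, with illustrative `ps` output (only the memcached PID 3663 is taken from the log; the other rows are made up):

```python
def memcached_pid(ps_output):
    """Return the PID of the first 'memcached' entry in `ps -eo comm,pid`
    output, mirroring the awk filter $1 == "memcached" used by the test."""
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "memcached":
            return int(parts[1])
    return None  # memcached not running on this node

# Illustrative ps output; only the memcached line matches the log above.
sample = "COMMAND           PID\nbeam.smp         3601\nmemcached        3663\n"
print(memcached_pid(sample))  # 3663
```

The returned PID is what the test interpolates into `os:cmd("kill -9 3663 ")`.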
[ns_server:info,2013-01-08T12:32:23.377,ns_1@10.3.2.145:<0.32321.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.2.145': [<<"replication_building_138_'ns_1@10.3.2.147'">>,
<<"replication_building_138_'ns_1@10.3.2.149'">>]
[error_logger:error,2013-01-08T12:32:23.377,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.32321.1> terminating
** Last message in was {'EXIT',<20175.7759.0>,{badmatch,{error,closed}}}
** When Server state == {state,"default",138,'ns_1@10.3.2.145',
[{'ns_1@10.3.2.147',<20177.4979.0>},
{'ns_1@10.3.2.149',<20175.7759.0>}]}
** Reason for termination ==
** {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
{<20175.7759.0>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
[ns_server:error,2013-01-08T12:32:23.378,ns_1@10.3.2.145:<0.32285.1>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<0.32321.1>,
{{badmatch,
[{<20177.4979.0>,
{badmatch,{error,closed}}},
{<20175.7759.0>,noproc}]},
                                           [{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}}]
[ns_server:error,2013-01-08T12:32:23.379,ns_1@10.3.2.145:<0.32285.1>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
{error,{badmatch,[{<0.32321.1>,
                   {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                               {<20175.7759.0>,noproc}]},
                    [{misc,sync_shutdown_many_i_am_trapping_exits,1},
                     {misc,try_with_maybe_ignorant_after,2},
                     {gen_server,terminate,6},
                     {proc_lib,init_p_do_apply,3}]}}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{ns_single_vbucket_mover,mover,6},
{proc_lib,init_p_do_apply,3}]}
[error_logger:error,2013-01-08T12:32:23.379,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: new_ns_replicas_builder:init/1
pid: <0.32321.1>
registered_name: []
    exception exit: {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                                {<20175.7759.0>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:terminate/6
ancestors: [<0.32285.1>,<0.13837.1>,<0.13752.1>]
messages: [{'EXIT',<0.32285.1>,shutdown}]
links: [<0.32285.1>]
dictionary: []
trap_exit: true
status: running
heap_size: 196418
stack_size: 24
reductions: 24429
neighbours:
[views:debug,2013-01-08T12:32:23.379,ns_1@10.3.2.145:mc_couch_events<0.425.0>:capi_set_view_manager:handle_mc_couch_event:529]Got set_vbucket event for default/438. Updated state: dead (1)
[ns_server:info,2013-01-08T12:32:23.383,ns_1@10.3.2.145:janitor_agent-default<0.643.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
[user:info,2013-01-08T12:32:23.383,ns_1@10.3.2.145:<0.2801.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.2.146',
{'EXIT',
{{{{unexpected_reason,
                                  {{badmatch,{error,closed}},
[{mc_binary,quick_stats_recv,3},
{mc_binary,quick_stats_loop,5},
{mc_binary,quick_stats,5},
{mc_client_binary, get_zero_open_checkpoint_vbuckets,3},
{ebucketmigrator_srv,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}},
                                 [{misc,executing_on_new_process,1},
{tap_replication_manager, change_vbucket_filter,4},
{tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
{tap_replication_manager, do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-default',
{change_vbucket_replication,139, undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.3.2.146'},
{if_rebalance,<0.13837.1>,
{update_vbucket_state,139,replica,
undefined,undefined}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover, update_replication_post_move,3},
{ns_vbucket_mover,on_move_done,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
[ns_server:debug,2013-01-08T12:32:23.384,ns_1@10.3.2.145:compaction_daemon<0.467.0>:compaction_daemon:handle_info:367]Looks like vbucket mover inhibiting view compaction for bucket "default" is dead. Canceling inhibition
[ns_server:debug,2013-01-08T12:32:23.383,ns_1@10.3.2.145:<0.13845.1>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_node_disco_events,<0.13837.1>} exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.2.146',
{'EXIT',
{{{{unexpected_reason,
{{badmatch,
{error,
closed}},
[{mc_binary, quick_stats_recv, 3},
{mc_binary, quick_stats_loop, 5},
{mc_binary, quick_stats, 5},
{mc_client_binary, get_zero_open_checkpoint_vbuckets, 3},
{ebucketmigrator_srv, handle_call, 3},
{gen_server, handle_msg, 5},
{proc_lib, init_p_do_apply, 3}]}},
[{misc, executing_on_new_process, 1},
{tap_replication_manager, change_vbucket_filter, 4},
{tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
{tap_replication_manager, do_set_incoming_replication_map, 3},
{tap_replication_manager, handle_call, 3},
{gen_server, handle_msg, 5},
{proc_lib, init_p_do_apply, 3}]},
{gen_server,
call,
['tap_replication_manager-default',
{change_vbucket_replication, 139, undefined},
infinity]}},
{gen_server,
call,
[{'janitor_agent-default', 'ns_1@10.3.2.146'},
{if_rebalance,
<0.13837.1>,
{update_vbucket_state,
139,
replica,
undefined,
undefined}},
infinity]}}}}]},
[{janitor_agent, bulk_set_vbucket_state, 4},
{ns_vbucket_mover, update_replication_post_move, 3},
{ns_vbucket_mover, on_move_done, 2},
{gen_server, handle_msg, 5},
{proc_lib, init_p_do_apply, 3}]}
[ns_server:debug,2013-01-08T12:32:23.384,ns_1@10.3.2.145:<0.32356.1>:mc_connection:do_notify_vbucket_update:112]Signaled mc_couch_event: {set_vbucket,"default",438,dead,1}
[ns_server:debug,2013-01-08T12:32:23.387,ns_1@10.3.2.145:capi_set_view_manager-default<0.617.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[error_logger:error,2013-01-08T12:32:23.387,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.32285.1>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.13837.1>,
{{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.2.146',
{'EXIT',
{{{{unexpected_reason,
                       {{badmatch,{error,closed}},
[{mc_binary,quick_stats_recv,3},
{mc_binary,quick_stats_loop,5},
{mc_binary,quick_stats,5},
{mc_client_binary, get_zero_open_checkpoint_vbuckets,3},
{ebucketmigrator_srv,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}},
[{misc,executing_on_new_process,1},
                      {tap_replication_manager, change_vbucket_filter,4},
                      {tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
                      {tap_replication_manager, do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-default',
{change_vbucket_replication,139,undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-default','ns_1@10.3.2.146'},
{if_rebalance,<0.13837.1>,
{update_vbucket_state,139,replica,undefined,
undefined}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover,update_replication_post_move,3},
{ns_vbucket_mover,on_move_done,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}}
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.13837.1>,<0.13752.1>]
messages: [{'EXIT',<0.13837.1>,
{{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.2.146',
{'EXIT',
{{{{unexpected_reason,
                         {{badmatch,{error,closed}},
[{mc_binary,quick_stats_recv,3},
{mc_binary,quick_stats_loop,5},
{mc_binary,quick_stats,5},
{mc_client_binary, get_zero_open_checkpoint_vbuckets,3},
{ebucketmigrator_srv,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}},
[{misc,executing_on_new_process,1},
{tap_replication_manager,change_vbucket_filter,4},
{tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
{tap_replication_manager, do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-default',
{change_vbucket_replication,139,undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-default','ns_1@10.3.2.146'},
{if_rebalance,<0.13837.1>,
{update_vbucket_state,139,replica,undefined,
undefined}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover,update_replication_post_move,3},
{ns_vbucket_mover,on_move_done,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}
                       ]}},
                   {cleanup_list,[<0.32321.1>,<0.32322.1>]}]
links: [<0.13837.1>,<0.32352.1>]
dictionary: []
trap_exit: true
status: running
heap_size: 6765
stack_size: 24
reductions: 9680
neighbours: