Couchbase Server / MB-7508

Rebalance exited with reason {{bulk_set_vbucket_state_failed, ...}} after crasher in new_ns_replicas_builder (rebalance continued after killing memcached)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket, ns_server
    • Security Level: Public
    • Labels: None

      Description

      build 2.0.1-120-rel
      http://qa.hq.northscale.net/job/centos-32-2.0-swaprebalance-tests/250/consoleFull
      /testrunner -i /tmp/swaprebalance-cent-32.ini get-logs=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=1,num-buckets=4,num-swap=3,swap-orchestrator=True,GROUP=P0

      steps:
      1. rebalance params: password=password&ejectedNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.152%2Cns_1%4010.3.2.145&user=Administrator&knownNodes=ns_1%4010.3.2.146%2Cns_1%4010.3.2.147%2Cns_1%4010.3.2.149%2Cns_1%4010.3.2.148%2Cns_1%4010.3.2.145%2Cns_1%4010.3.2.152
      2. kill memcached when rebalance reached >40%
      [2013-01-08 13:47:49,441] - [rest_client:881] INFO - rebalance percentage : 40.0627668879 %
      [2013-01-08 13:47:51,444] - [rest_client:73] INFO - rebalance reached >40.0627668879% in 663.750247002 seconds
      [2013-01-08 13:47:51,971] - [remote_util:116] INFO - connecting to 10.3.2.149 with username : root password : couchbase ssh_key:
      [2013-01-08 13:47:52,196] - [remote_util:148] INFO - Connected
      [2013-01-08 13:47:52,557] - [remote_util:1128] INFO - running command.raw sudo cat /proc/cpuinfo
      [2013-01-08 13:47:52,657] - [remote_util:1157] INFO - command executed successfully
      [2013-01-08 13:47:52,658] - [remote_util:1128] INFO - running command.raw sudo df -Th
      [2013-01-08 13:47:52,774] - [remote_util:1157] INFO - command executed successfully
      [2013-01-08 13:47:52,775] - [remote_util:1128] INFO - running command.raw sudo cat /proc/meminfo
      [2013-01-08 13:47:52,882] - [remote_util:1157] INFO - command executed successfully
      [2013-01-08 13:47:52,883] - [remote_util:1128] INFO - running command.raw hostname
      [2013-01-08 13:47:52,991] - [remote_util:1157] INFO - command executed successfully
      [2013-01-08 13:47:52,993] - [remote_util:1128] INFO - running command.raw ps -eo comm,pid | awk '$1 == "memcached" { print $2 }'
      [2013-01-08 13:47:53,107] - [remote_util:1157] INFO - command executed successfully
      [2013-01-08 13:47:53,108] - [swaprebalance:401] INFO - os:cmd("kill -9 3663 ")
      [2013-01-08 13:47:53,130] - [rest_client:803] INFO - /diag/eval status: True content: [] command: os:cmd("kill -9 3663 ")
      [2013-01-08 13:47:53,130] - [swaprebalance:403] INFO - killed 10.3.2.149:8091?? (True, '[]')
      [2013-01-08 13:47:53,131] - [swaprebalance:404] INFO - sleep for 10 sec after kill memcached
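      The kill step above has two halves: the harness first parses `ps -eo comm,pid` output for the memcached PID (the awk filter in the log), then runs `kill -9 <pid>` on the node through the REST `/diag/eval` endpoint, which is why the kill shows up under rest_client. A minimal sketch of the PID-extraction half (hypothetical helper name, not the actual swaprebalance/remote_util code):

      ```python
      def memcached_pids(ps_output):
          """Return PIDs whose command name is exactly 'memcached',
          mirroring: ps -eo comm,pid | awk '$1 == "memcached" { print $2 }'."""
          pids = []
          for line in ps_output.splitlines():
              fields = line.split()
              # awk compares only the first field, so 'memcachedtest' etc. is skipped
              if len(fields) >= 2 and fields[0] == "memcached" and fields[1].isdigit():
                  pids.append(int(fields[1]))
          return pids
      ```

      The exact-match comparison on the first field matters: a substring match would also kill unrelated processes whose names merely contain "memcached".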
      ...
      [2013-01-08 13:49:04,037] - [swaprebalance:412] INFO - rebalance progress: 42.5532203758
      [2013-01-08 13:49:05,058] - [swaprebalance:415] INFO - rebalance progress status:running
      [2013-01-08 13:49:05,068] - [swaprebalance:417] INFO - rebalance is still running even after restarting memcached
      [2013-01-08 13:49:05,069] - [swaprebalance:379] INFO - FAIL SWAP REBALANCE PHASE @ 60
      [2013-01-08 13:49:05,079] - [rest_client:881] INFO - rebalance percentage : 42.5775868475 %
      ...
      [2013-01-08 13:52:34,570] - [rest_client:881] INFO - rebalance percentage : 50.0 %
      [2013-01-08 13:52:36,597] - [rest_client:866] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed

      [ns_server:info,2013-01-08T12:32:23.377,ns_1@10.3.2.145:<0.32321.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.2.145': [<<"replication_building_138_'ns_1@10.3.2.147'">>,
      <<"replication_building_138_'ns_1@10.3.2.149'">>]
      [error_logger:error,2013-01-08T12:32:23.377,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.32321.1> terminating

        ** Last message in was {'EXIT',<20175.7759.0>,{badmatch,{error,closed}}}
        ** When Server state == {state,"default",138,'ns_1@10.3.2.145',
                                 [{'ns_1@10.3.2.147',<20177.4979.0>},
                                  {'ns_1@10.3.2.149',<20175.7759.0>}]}
        ** Reason for termination ==
        ** {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                       {<20175.7759.0>,noproc}]},
            [{misc,sync_shutdown_many_i_am_trapping_exits,1},
             {misc,try_with_maybe_ignorant_after,2},
             {gen_server,terminate,6},
             {proc_lib,init_p_do_apply,3}]}

          [ns_server:error,2013-01-08T12:32:23.378,ns_1@10.3.2.145:<0.32285.1>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed:
          [{<0.32321.1>,
            {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                        {<20175.7759.0>,noproc}]},
             [{misc,sync_shutdown_many_i_am_trapping_exits,1},
              {misc,try_with_maybe_ignorant_after,2},
              {gen_server,terminate,6},
              {proc_lib,init_p_do_apply,3}]}}]
          [ns_server:error,2013-01-08T12:32:23.379,ns_1@10.3.2.145:<0.32285.1>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
          {error,{badmatch,[{<0.32321.1>,
                             {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                                         {<20175.7759.0>,noproc}]},
                              [{misc,sync_shutdown_many_i_am_trapping_exits,1},
                               {misc,try_with_maybe_ignorant_after,2},
                               {gen_server,terminate,6},
                               {proc_lib,init_p_do_apply,3}]}}]},
           [{misc,sync_shutdown_many_i_am_trapping_exits,1},
            {misc,try_with_maybe_ignorant_after,2},
            {ns_single_vbucket_mover,mover,6},
            {proc_lib,init_p_do_apply,3}]}
          [error_logger:error,2013-01-08T12:32:23.379,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
          =========================CRASH REPORT=========================
          crasher:
          initial call: new_ns_replicas_builder:init/1
          pid: <0.32321.1>
          registered_name: []
          exception exit: {{badmatch,[{<20177.4979.0>,{badmatch,{error,closed}}},
                                      {<20175.7759.0>,noproc}]},
                           [{misc,sync_shutdown_many_i_am_trapping_exits,1},
                            {misc,try_with_maybe_ignorant_after,2},
                            {gen_server,terminate,6},
                            {proc_lib,init_p_do_apply,3}]}
          in function gen_server:terminate/6
          ancestors: [<0.32285.1>,<0.13837.1>,<0.13752.1>]
          messages: [{'EXIT',<0.32285.1>,shutdown}]
          links: [<0.32285.1>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 196418
          stack_size: 24
          reductions: 24429
          neighbours:

          [views:debug,2013-01-08T12:32:23.379,ns_1@10.3.2.145:mc_couch_events<0.425.0>:capi_set_view_manager:handle_mc_couch_event:529]Got set_vbucket event for default/438. Updated state: dead (1)
          [ns_server:info,2013-01-08T12:32:23.383,ns_1@10.3.2.145:janitor_agent-default<0.643.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
          [user:info,2013-01-08T12:32:23.383,ns_1@10.3.2.145:<0.2801.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {{bulk_set_vbucket_state_failed,
            [{'ns_1@10.3.2.146',
              {'EXIT',
               {{{{unexpected_reason,
                   {{badmatch,{error,closed}},
                    [{mc_binary,quick_stats_recv,3},
                     {mc_binary,quick_stats_loop,5},
                     {mc_binary,quick_stats,5},
                     {mc_client_binary,get_zero_open_checkpoint_vbuckets,3},
                     {ebucketmigrator_srv,handle_call,3},
                     {gen_server,handle_msg,5},
                     {proc_lib,init_p_do_apply,3}]}},
                  [{misc,executing_on_new_process,1},
                   {tap_replication_manager,change_vbucket_filter,4},
                   {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^5/1-5-',2},
                   {tap_replication_manager,do_set_incoming_replication_map,3},
                   {tap_replication_manager,handle_call,3},
                   {gen_server,handle_msg,5},
                   {proc_lib,init_p_do_apply,3}]},
                 {gen_server,call,
                  ['tap_replication_manager-default',
                   {change_vbucket_replication,139,undefined},
                   infinity]}},
                {gen_server,call,
                 [{'janitor_agent-default','ns_1@10.3.2.146'},
                  {if_rebalance,<0.13837.1>,
                   {update_vbucket_state,139,replica,undefined,undefined}},
                  infinity]}}}}]},
           [{janitor_agent,bulk_set_vbucket_state,4},
            {ns_vbucket_mover,update_replication_post_move,3},
            {ns_vbucket_mover,on_move_done,2},
            {gen_server,handle_msg,5},
            {proc_lib,init_p_do_apply,3}]}

          [ns_server:debug,2013-01-08T12:32:23.384,ns_1@10.3.2.145:compaction_daemon<0.467.0>:compaction_daemon:handle_info:367]Looks like vbucket mover inhibiting view compaction for for bucket "default" is dead. Canceling inhibition
          [ns_server:debug,2013-01-08T12:32:23.383,ns_1@10.3.2.145:<0.13845.1>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_node_disco_events,<0.13837.1>} exited with reason {{bulk_set_vbucket_state_failed,
          [{'ns_1@10.3.2.146',
          {'EXIT',
          {{{{unexpected_reason,
          {{badmatch,
          {error,
          closed}},
          [{mc_binary, quick_stats_recv, 3},
          {mc_binary, quick_stats_loop, 5},
          {mc_binary, quick_stats, 5},
          {mc_client_binary, get_zero_open_checkpoint_vbuckets, 3},
          {ebucketmigrator_srv, handle_call, 3},
          {gen_server, handle_msg, 5},
          {proc_lib, init_p_do_apply, 3}]}},
          [{misc, executing_on_new_process, 1},
          {tap_replication_manager, change_vbucket_filter, 4},
          {tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
          {tap_replication_manager, do_set_incoming_replication_map, 3},
          {tap_replication_manager, handle_call, 3},
          {gen_server, handle_msg, 5},
          {proc_lib, init_p_do_apply, 3}]},
          {gen_server,
          call,
          ['tap_replication_manager-default',
          {change_vbucket_replication, 139, undefined},
          infinity]}},
          {gen_server,
          call,
          [{'janitor_agent-default', 'ns_1@10.3.2.146'},
          {if_rebalance,
          <0.13837.1>,
          {update_vbucket_state,
          139,
          replica,
          undefined,
          undefined}},
          infinity]}}}}]},
          [{janitor_agent, bulk_set_vbucket_state, 4},
          {ns_vbucket_mover, update_replication_post_move, 3},
          {ns_vbucket_mover, on_move_done, 2},
          {gen_server, handle_msg, 5},
          {proc_lib, init_p_do_apply, 3}]}
          [ns_server:debug,2013-01-08T12:32:23.384,ns_1@10.3.2.145:<0.32356.1>:mc_connection:do_notify_vbucket_update:112]Signaled mc_couch_event: {set_vbucket,"default",438,dead,1}
          [ns_server:debug,2013-01-08T12:32:23.387,ns_1@10.3.2.145:capi_set_view_manager-default<0.617.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
          [error_logger:error,2013-01-08T12:32:23.387,ns_1@10.3.2.145:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
          =========================CRASH REPORT=========================
          crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.32285.1>
          registered_name: []
          exception exit: {unexpected_exit,
          {'EXIT',<0.13837.1>,
          {{bulk_set_vbucket_state_failed,
          [{'ns_1@10.3.2.146',
          {'EXIT',
          {{{{unexpected_reason,
              {{badmatch,{error,closed}},
               [{mc_binary,quick_stats_recv,3},
                {mc_binary,quick_stats_loop,5},
                {mc_binary,quick_stats,5},
                {mc_client_binary,get_zero_open_checkpoint_vbuckets,3},
                {ebucketmigrator_srv,handle_call,3},
                {gen_server,handle_msg,5},
                {proc_lib,init_p_do_apply,3}]}},
             [{misc,executing_on_new_process,1},
              {tap_replication_manager,change_vbucket_filter,4},
              {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^5/1-5-',2},
              {tap_replication_manager,do_set_incoming_replication_map,3},
              {tap_replication_manager,handle_call,3},
              {gen_server,handle_msg,5},
              {proc_lib,init_p_do_apply,3}]},
          {gen_server,call,
          ['tap_replication_manager-default',
          {change_vbucket_replication,139,undefined},
          infinity]}},
          {gen_server,call,
          [{'janitor_agent-default','ns_1@10.3.2.146'},
          {if_rebalance,<0.13837.1>,
          {update_vbucket_state,139,replica,undefined,
          undefined}},
          infinity]}}}}]},
          [{janitor_agent,bulk_set_vbucket_state,4},
          {ns_vbucket_mover,update_replication_post_move,3},
          {ns_vbucket_mover,on_move_done,2},
          {gen_server,handle_msg,5},
          {proc_lib,init_p_do_apply,3}]}}}
          in function ns_single_vbucket_mover:spawn_and_wait/1
          in call from ns_single_vbucket_mover:mover_inner/6
          in call from misc:try_with_maybe_ignorant_after/2
          in call from ns_single_vbucket_mover:mover/6
          ancestors: [<0.13837.1>,<0.13752.1>]
          messages: [{'EXIT',<0.13837.1>,
          {{bulk_set_vbucket_state_failed,
          [{'ns_1@10.3.2.146',
          {'EXIT',
          {{{{unexpected_reason,
              {{badmatch,{error,closed}},
               [{mc_binary,quick_stats_recv,3},
                {mc_binary,quick_stats_loop,5},
                {mc_binary,quick_stats,5},
                {mc_client_binary,get_zero_open_checkpoint_vbuckets,3},
                {ebucketmigrator_srv,handle_call,3},
                {gen_server,handle_msg,5},
                {proc_lib,init_p_do_apply,3}]}},
             [{misc,executing_on_new_process,1},
              {tap_replication_manager,change_vbucket_filter,4},
              {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^5/1-5-',2},
              {tap_replication_manager,do_set_incoming_replication_map,3},
              {tap_replication_manager,handle_call,3},
              {gen_server,handle_msg,5},
              {proc_lib,init_p_do_apply,3}]},
          {gen_server,call,
          ['tap_replication_manager-default',
          {change_vbucket_replication,139,undefined},
          infinity]}},
          {gen_server,call,
          [{'janitor_agent-default','ns_1@10.3.2.146'},
          {if_rebalance,<0.13837.1>,
          {update_vbucket_state,139,replica,undefined,
          undefined}},
          infinity]}}}}]},
          [{janitor_agent,bulk_set_vbucket_state,4},
          {ns_vbucket_mover,update_replication_post_move,3},
          {ns_vbucket_mover,on_move_done,2},
          {gen_server,handle_msg,5},
          {proc_lib,init_p_do_apply,3}]}}]
          links: [<0.13837.1>,<0.32352.1>]
          dictionary: [{cleanup_list,[<0.32321.1>,<0.32322.1>]}]
          trap_exit: true
          status: running
          heap_size: 6765
          stack_size: 24
          reductions: 9680
          neighbours:


        Activity

        farshid Farshid Ghods (Inactive) added a comment -

        >>> 3) 'If the rebalance status is not changed even after an hour' - do you mean it is stuck?

        Yes. I think we already have code in monitorRebalance that detects when rebalanceProgress is not changing, declares the rebalance stuck, and fails it.
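        The stuck-rebalance check described in this comment can be sketched as follows. This is a minimal illustration, not the actual monitorRebalance code: `get_progress` is an assumed callable that polls the REST rebalance-progress API and returns a percentage.

        ```python
        import time

        def wait_for_rebalance(get_progress, timeout=3600, poll_interval=10):
            """Poll rebalance progress; fail if it stops changing for `timeout` seconds.

            get_progress: assumed callable returning the current progress as a float
            percentage (0.0-100.0). Raises RuntimeError when progress is stuck.
            """
            last_progress = None
            last_change = time.time()
            while True:
                progress = get_progress()
                if progress >= 100.0:
                    return True
                if progress != last_progress:
                    # progress moved: reset the stuck timer
                    last_progress = progress
                    last_change = time.time()
                elif time.time() - last_change > timeout:
                    raise RuntimeError("rebalance stuck at %.2f%%" % progress)
                time.sleep(poll_interval)
        ```

        The key design point is that the timer resets on any change in the reported percentage, so a slow-but-moving rebalance is never declared stuck; only a flat progress value for the full window fails the test.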
        andreibaranouski Andrei Baranouski added a comment -

        test was changed http://review.couchbase.org/#/c/23987
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        please close the ticket if test passes after changes are merged

        Show
        farshid Farshid Ghods (Inactive) added a comment - please close the ticket if test passes after changes are merged
        andreibaranouski Andrei Baranouski added a comment -

        all tests passed http://qa.hq.northscale.net/view/2.0.1/job/centos-32-2.0-swaprebalance-tests/263/

          People

          • Assignee:
            andreibaranouski Andrei Baranouski
            Reporter:
            andreibaranouski Andrei Baranouski
          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes