Details
-
Technical task
-
Resolution: Fixed
-
Major
-
2.0
-
Security Level: Public
-
Windows Server 2008 R2 64-bit with build 2.0.0-1952
Description
Environment:
8 Windows Server 2008 R2 64-bit servers in EC2. Each server has 7.5 GB RAM and a 45 GB EBS disk to store data.
Create a 6-node Windows cluster in EC2.
Loading phase:
- Create one default bucket and load 24 million items into it. Each item is 128 to 512 bytes in size.
Access phase:
- Mutate items with a ratio of creates/updates/gets/deletes/expirations = 10/60/20/5/5.
- Run rebalance operations: add node, remove node, failover, failover and add back. The last rebalance failed. Steps: remove one node and fail over another node at the same time, then rebalance.
- Rebalance failed with error "sync_shutdown_many_i_am_trapping_exits".
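The access-phase mix above can be sketched as a weighted random choice, which may help anyone trying to reproduce the workload. This is only an illustration: the names `OP_WEIGHTS` and `pick_ops` are hypothetical, and the actual test harness used for this ticket is not shown in the report.

```python
import random

# Access-phase ratio from the report:
# creates/updates/gets/deletes/expirations = 10/60/20/5/5.
# The operation names are illustrative labels, not harness identifiers.
OP_WEIGHTS = {
    "create": 10,
    "update": 60,
    "get": 20,
    "delete": 5,
    "expiration": 5,
}


def pick_ops(count, seed=None):
    """Return `count` operation names drawn according to OP_WEIGHTS."""
    rng = random.Random(seed)
    names = list(OP_WEIGHTS)
    weights = [OP_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    from collections import Counter

    # Show how many of each operation a 10,000-op sample contains.
    print(Counter(pick_ops(10000, seed=1)))
```

A fixed `seed` makes the generated sequence repeatable between runs, which is useful when comparing a failing run against a passing one.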
[rebalance:debug,2012-11-20T5:02:22.275,ns_1@10.110.206.19:<0.22405.6>:janitor_agent:do_wait_checkpoint_persisted:828]Got etmpfail waiting for checkpoint persistence. Will try again
[ns_server:info,2012-11-20T5:02:22.478,ns_1@10.110.206.19:ns_port_memcached<0.2135.0>:ns_port_server:log:171]memcached<0.2135.0>: Tue Nov 20 05:02:22.284363 GMT Standard Time 3: Notified the timeout on checkpoint persistence for vbucket 881, cookie 0000000005A16B00
[ns_server:info,2012-11-20T5:02:23.585,ns_1@10.110.206.19:<0.22552.6>:compaction_daemon:spawn_vbucket_compactor:639]Compacting <<"default/345">>
[ns_server:info,2012-11-20T5:02:31.370,ns_1@10.110.206.19:<0.10210.2>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: {rebalancing_state,<0.12789.2>,
{dict,7,16,16,8,80,48,
,
{{[],[],[],[],[],
[['ns_1@10.111.66.215'|1.0],
['ns_1@10.214.194.95'|
0.9067357512953368]],
[['ns_1@10.96.107.176'|
0.6649484536082475],
['ns_1@10.110.222.56'|
0.37058823529411766]],
[],
[['ns_1@10.46.181.58'|
0.34705882352941175]],
[['ns_1@10.2.63.249'|
0.9642857142857143],
['ns_1@10.110.206.19'|1.0]],
[],[],[],[],[],[]}}},
['ns_1@10.2.63.249',
'ns_1@10.96.107.176',
'ns_1@10.110.206.19',
'ns_1@10.111.66.215',
'ns_1@10.214.194.95',
'ns_1@10.110.222.56'],
['ns_1@10.46.181.58'],
[]}
[ns_server:debug,2012-11-20T5:02:34.037,ns_1@10.110.206.19:janitor_agent-default<0.2184.0>:janitor_agent:handle_info:682]Got done message from subprocess: <0.22405.6> (ok)
[ns_server:info,2012-11-20T5:02:34.256,ns_1@10.110.206.19:ns_port_memcached<0.2135.0>:ns_port_server:log:171]memcached<0.2135.0>: Tue Nov 20 05:02:34.015714 GMT Standard Time 3: Notified the completion of checkpoint persistence for vbucket 881, cookie 0000000005A16B00
[ns_server:error,2012-11-20T5:02:35.847,ns_1@10.110.206.19:ns_doctor<0.2076.0>:ns_doctor:update_status:205]The following buckets became not ready on node 'ns_1@10.111.66.215': ["default"], those of them are active []
[ns_server:debug,2012-11-20T5:02:36.003,ns_1@10.110.206.19:capi_set_view_manager-default<0.2157.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:debug,2012-11-20T5:02:36.034,ns_1@10.110.206.19:xdc_rdoc_replication_srv<0.2167.0>:xdc_rdoc_replication_srv:handle_info:132]doing replicate_newnodes_docs
[ns_server:debug,2012-11-20T5:02:36.034,ns_1@10.110.206.19:xdc_rdoc_replication_srv<0.2167.0>:xdc_rdoc_replication_srv:replicate_change_to_node:160]Sending _design/_replicator_info to ns_1@10.111.66.215
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.20550.6>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [
[ns_server:info,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.20550.6>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.214.194.95': [<<"replication_building_916_'ns_1@10.96.107.176'">>,
<<"replication_building_916_'ns_1@10.111.66.215'">>]
[error_logger:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.20550.6> terminating
** Last message in was {'EXIT',<0.20543.6>,shutdown}
** When Server state == {state,"default",916,'ns_1@10.214.194.95',
[{'ns_1@10.96.107.176',<15680.9900.6>},
{'ns_1@10.111.66.215',<15693.31507.5>}]}
** Reason for termination ==
** {{badmatch,[{<15693.31507.5>,killed}]},
[
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.20543.6>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<0.20550.6>,
{{badmatch,[{<15693.31507.5>,killed}]},
[{misc, sync_shutdown_many_i_am_trapping_exits, 1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}},
{<0.22253.6>,
{{wait_checkpoint_persisted_failed,
"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,
249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}]
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.20543.6>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
{error,
{badmatch,
[{<0.20550.6>,
{{badmatch,[{<15693.31507.5>,killed}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}},
{<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2}
,
,
[rebalance:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.12877.2>:ns_vbucket_mover:handle_info:252]<0.20543.6> exited with {unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,
249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22383.6>:ns_single_vbucket_mover:spawn_and_wait:81]Got unexpected exit signal {'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",
916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}}
[ns_server:debug,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22388.6>:ebucketmigrator_srv:confirm_sent_messages:710]Going to wait for reception of opaque message ack
[ns_server:debug,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22664.6>:ebucketmigrator_srv:confirm_sent_messages:705]Sending opaque message to confirm downstream reception
[ns_server:debug,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22664.6>:ebucketmigrator_srv:confirm_sent_messages:707]Sent fine
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22351.6>:ns_single_vbucket_mover:spawn_and_wait:81]Got unexpected exit signal {'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",
916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}}
[ns_server:error,2012-11-20T5:02:38.749,ns_1@10.110.206.19:<0.22475.6>:ns_single_vbucket_mover:spawn_and_wait:81]Got unexpected exit signal {'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",
916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}}
[rebalance:info,2012-11-20T5:02:38.764,ns_1@10.110.206.19:<0.22388.6>:ebucketmigrator_srv:do_confirm_sent_messages:684]Got close ack!
[error_logger:error,2012-11-20T5:02:38.764,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: new_ns_replicas_builder:init/1
pid: <0.20550.6>
registered_name: []
exception exit: {{badmatch,[{<15693.31507.5>,killed}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:terminate/6
ancestors: [<0.20543.6>,<0.12877.2>,<0.12789.2>]
messages: []
links: [<0.20543.6>]
dictionary: []
trap_exit: true
status: running
heap_size: 317811
stack_size: 24
reductions: 32264
neighbours:
[error_logger:error,2012-11-20T5:02:38.764,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.20543.6>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.12877.2>,<0.12789.2>]
messages: []
links: [<0.12877.2>]
dictionary: [{cleanup_list,[<0.20550.6>,<0.22253.6>]}]
trap_exit: true
status: running
heap_size: 4181
stack_size: 24
reductions: 12536
neighbours:
[ns_server:info,2012-11-20T5:02:38.780,ns_1@10.110.206.19:<0.22387.6>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.2.63.249': [<<"replication_building_881_'ns_1@10.96.107.176'">>,
<<"replication_building_881_'ns_1@10.110.206.19'">>]
[error_logger:error,2012-11-20T5:02:38.780,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.22383.6>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default','ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.12877.2>,<0.12789.2>]
messages: [{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default','ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}}]
links: [<0.12877.2>]
dictionary: [{cleanup_list,[<0.22387.6>,<0.22401.6>]}]
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 5554
neighbours:
[ns_server:info,2012-11-20T5:02:38.795,ns_1@10.110.206.19:<0.22478.6>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.96.107.176': [<<"replication_building_990_'ns_1@10.110.222.56'">>]
[error_logger:error,2012-11-20T5:02:38.795,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.22475.6>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default','ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}}}
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.12877.2>,<0.12789.2>]
messages: [{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}}]
links: [<0.12877.2>,<0.22663.6>]
dictionary: [{cleanup_list,[<0.22478.6>,<0.22493.6>]}]
trap_exit: true
status: running
heap_size: 4181
stack_size: 24
reductions: 4822
neighbours:
[ns_server:info,2012-11-20T5:02:38.827,ns_1@10.110.206.19:<0.22355.6>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.46.181.58': [<<"replication_building_679_'ns_1@10.214.194.95'">>,
<<"replication_building_679_'ns_1@10.96.107.176'">>]
[error_logger:error,2012-11-20T5:02:38.827,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.22351.6>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default','ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[
]}}}}}
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.12877.2>,<0.12789.2>]
messages: [{'EXIT',<0.12877.2>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[
,
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[
]}}}}]
links: [<0.12877.2>,<0.22671.6>]
dictionary: [
]
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 5554
neighbours:
[ns_server:debug,2012-11-20T5:02:38.827,ns_1@10.110.206.19:<0.12884.2>:ns_pubsub:do_subscribe_link:132]Parent process of subscription
{ns_node_disco_events,<0.12877.2>} exited with reason {unexpected_exit,
{'EXIT',
<0.22253.6>,
{{wait_checkpoint_persisted_failed,
"default",
916,
249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,
call,
[
,
{if_rebalance,
<0.12877.2>,
{wait_checkpoint_persisted,
916,
249}},
infinity]}}}}]},
[
]}}}
[user:info,2012-11-20T5:02:38.842,ns_1@10.110.206.19:<0.10210.2>:ns_orchestrator:handle_info:319]Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",
916,249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[
,
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[
]}}}
[ns_server:info,2012-11-20T5:02:38.842,ns_1@10.110.206.19:<0.23178.6>:diag_handler:log_all_tap_and_checkpoint_stats:127]logging tap & checkpoint stats
[ns_server:info,2012-11-20T5:02:38.858,ns_1@10.110.206.19:janitor_agent-default<0.2184.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
[ns_server:debug,2012-11-20T5:02:38.858,ns_1@10.110.206.19:capi_set_view_manager-default<0.2157.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:info,2012-11-20T5:02:38.998,ns_1@10.110.206.19:ns_port_memcached<0.2135.0>:ns_port_server:log:171]memcached<0.2135.0>: Tue Nov 20 05:02:38.773775 GMT Standard Time 3: TAP (Consumer) eq_tapq:anon_402 - disconnected
[error_logger:error,2012-11-20T5:02:38.827,ns_1@10.110.206.19:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.12877.2> terminating
** Last message in was {'EXIT',<0.20543.6>,
{unexpected_exit,
{'EXIT',<0.22253.6>,
{{wait_checkpoint_persisted_failed,"default",916,
249,
[{'ns_1@10.111.66.215',
{'EXIT',
{noproc,
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.111.66.215'},
{if_rebalance,<0.12877.2>,
{wait_checkpoint_persisted,916,249}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}}
Link to the manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1952-rel.setup.exe.manifest.xml
-
- Ran rebalance again; rebalance passed.
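
For triaging a log like the one pasted above, a small script can pull out the header fields of each entry and count which call sites logged errors. This is a rough sketch: the header layout (`[logger:level,timestamp,node:process:module:function:line]message`) is inferred only from the lines in this report, and `HEADER`, `parse_header`, and `errors_by_site` are hypothetical names, not part of ns_server.

```python
import re
from collections import Counter

# Header layout inferred from the log lines in this report (an assumption):
# [logger:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):(?P<process>[^:]*):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def parse_header(line):
    """Return the header fields as a dict, or None for continuation lines."""
    match = HEADER.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def errors_by_site(lines):
    """Count error-level entries per 'module:function' call site."""
    counts = Counter()
    for line in lines:
        fields = parse_header(line)
        if fields and fields["level"] == "error":
            counts[fields["module"] + ":" + fields["function"]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python triage.py < ns_server.debug.log
    for site, count in errors_by_site(sys.stdin).most_common():
        print(count, site)
```

Continuation lines of multi-line Erlang terms do not match the header pattern and are skipped, so each entry is counted once.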