Details
Type: Technical task
Resolution: Duplicate
Priority: Blocker
Version: 2.0
Security Level: Public
Environment: centos 6.2 64bit build 2.0.0-1931
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD disk.
- 24.8 GB RAM allocated to Couchbase Server on each node
- SSD disk formatted as ext4, mounted on /data
- Each server has its own SSD drive; no disk is shared with another server.
- Create a cluster of 6 nodes with Couchbase Server 2.0.0-1931 installed
- Link to manifest file http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1931-rel.rpm.manifest.xml
- The cluster has 2 buckets, default and saslbucket (12 GB each, with 1 replica), set up with 64 vbuckets.
- Each bucket has one design doc with 2 views (d1 for default and d11 for saslbucket)
10.6.2.37
10.6.2.38
10.6.2.44
10.6.2.45
10.6.2.42
10.6.2.43
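For reference, the sizing implied by the cluster information above can be checked with a short calculation. This is only a sketch: it assumes the 12 GB bucket quota is a per-node quota and that vbucket copies are spread evenly over the 6 initial nodes, neither of which is stated in this report.

```python
# Rough sizing check for the cluster described above.
# Assumption: the 12 GB bucket quota applies per node.
NODE_RAM_QUOTA_GB = 24.8   # RAM given to Couchbase Server on each node
BUCKET_QUOTA_GB = 12       # quota of each bucket
NUM_BUCKETS = 2            # default and saslbucket
NUM_VBUCKETS = 64          # vbuckets per bucket
NUM_REPLICAS = 1
NUM_NODES = 6              # initial cluster size

bucket_ram_per_node = BUCKET_QUOTA_GB * NUM_BUCKETS
headroom_gb = NODE_RAM_QUOTA_GB - bucket_ram_per_node

# Each vbucket exists once as active plus once per replica.
vbucket_copies = NUM_VBUCKETS * (1 + NUM_REPLICAS)
# Assumption: copies are distributed evenly across nodes.
copies_per_node = vbucket_copies / NUM_NODES

print(f"bucket RAM per node: {bucket_ram_per_node} GB (headroom {headroom_gb:.1f} GB)")
print(f"vbucket copies per bucket: {vbucket_copies}, about {copies_per_node:.1f} per node")
```

Under these assumptions the two buckets take 24 GB of the 24.8 GB quota on each node, and each bucket has 128 vbucket copies, roughly 21 per node.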
- Load 20 million items into each bucket. Each key has a size of 1024 bytes.
- After loading completes, wait for initial indexing to finish.
- After initial indexing is done, mutate all items, changing their size from 1024 to 1512 bytes.
- Query all 4 views from the 2 design docs.
- Add node 44 and rebalance. Passed.
- Add node 45 and rebalance. Passed.
- Verify that auto-failover is enabled on the cluster.
- Turn on the firewall on node 40:
iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
- Node 40 went down as expected.
- Auto-failover kicked in after one minute.
- Disable the firewall on node 40. The cluster saw node 40 come back up.
- Add node 40 back to the cluster and rebalance. Within a few seconds, the rebalance failed with error: "Failed to wait deletion of some buckets on some nodes." Filed bug MB-7110.
- Waited about an hour and a half, then rebalanced again. The rebalance failed with error: "wait_checkpoint_persisted_failed"
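To show why the two iptables rules in the steps above isolate node 40, the sketch below checks which ports fall inside the rejected range 1000:60000. The Couchbase port numbers are the commonly documented 2.0 defaults and are an assumption here; they are not taken from this report.

```python
# Port range rejected by the two iptables rules (--dport / --sport 1000:60000).
BLOCKED_RANGE = range(1000, 60001)  # iptables port ranges are inclusive

# Assumption: commonly documented Couchbase Server 2.0 default ports,
# plus SSH for contrast. None of these numbers come from the report itself.
PORTS = {
    "ssh": 22,
    "Erlang port mapper (epmd)": 4369,
    "REST / web console": 8091,
    "views (CAPI)": 8092,
    "memcached data": 11210,
    "moxi proxy": 11211,
    "Erlang distribution (first port)": 21100,
}

def is_blocked(port: int) -> bool:
    """Return True if TCP traffic on this port is rejected by the rules."""
    return port in BLOCKED_RANGE

for name, port in PORTS.items():
    state = "blocked" if is_blocked(port) else "open"
    print(f"{port:>5}  {name}: {state}")
```

With these assumed ports, everything the cluster uses is blocked while SSH on port 22 stays reachable, which matches the node being marked down while remaining manageable.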
[ns_server:info,2012-11-06T5:42:13.901,ns_1@10.6.2.37:janitor_agent-default<0.30140.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
[error_logger:error,2012-11-06T5:42:13.901,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.11943.2727>
registered_name: []
exception exit: {unexpected_exit,
{'EXIT',<0.12020.2727>,
{{wait_checkpoint_persisted_failed,"default",50,3131,
[{'ns_1@10.6.2.40',
{'EXIT',
{{badmatch,{error,timeout,
[{mc_client_binary,select_bucket,2},
{ns_memcached,ensure_bucket,2},
{ns_memcached,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['ns_memcached-default',
{wait_for_checkpoint_persistence,37,2959},
infinity]}},
{gen_server,call,
[{'janitor_agent-default','ns_1@10.6.2.40'},
{if_rebalance,<0.32081.2694>,
{wait_checkpoint_persisted,50,3131}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
in function ns_single_vbucket_mover:spawn_and_wait/1
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.32081.2694>,<0.18896.2646>]
messages: [{'EXIT',<0.32081.2694>,
{unexpected_exit,
{'EXIT',<0.20985.2736>,
{{wait_checkpoint_persisted_failed,"default",37,2959,
[{'ns_1@10.6.2.40',
{'EXIT',
{{badmatch,{error,timeout,
[{mc_client_binary,cmd_binary_vocal_recv,5},
{ns_memcached,ensure_bucket,2},
{ns_memcached,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['ns_memcached-default',
{wait_for_checkpoint_persistence,37,2959},
infinity]}},
{gen_server,call,
[{'janitor_agent-default','ns_1@10.6.2.40'},
{if_rebalance,<0.32081.2694>,
{wait_checkpoint_persisted,37,2959}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}}]
links: [<0.32081.2694>,<0.17284.2744>]
dictionary: [{cleanup_list,[<0.11946.2727>,<0.12020.2727>]}]
trap_exit: true
status: running
heap_size: 6765
stack_size: 24
reductions: 12015
neighbours:
[user:info,2012-11-06T5:42:13.903,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.20985.2736>,
{{wait_checkpoint_persisted_failed,"default",
37,2959,
[{'ns_1@10.6.2.40',
{'EXIT',
{{badmatch,{error,timeout,
[{mc_client_binary, cmd_binary_vocal_recv,5},
{mc_client_binary,select_bucket,2},
{ns_memcached,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['ns_memcached-default',
{wait_for_checkpoint_persistence,37, 2959},
infinity]}},
{gen_server,call,
[{'janitor_agent-default', 'ns_1@10.6.2.40'},
{if_rebalance,<0.32081.2694>,
{wait_checkpoint_persisted,37,2959}},
infinity]}}}}]},
[{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}
[error_logger:error,2012-11-06T5:42:13.902,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.32081.2694> terminating
** Last message in was {'EXIT',<0.20927.2736>,
{unexpected_exit,
{'EXIT',<0.20985.2736>,
{{wait_checkpoint_persisted_failed,"default",37,
2959,
[{'ns_1@10.6.2.40',
{'EXIT',
{{badmatch,{error,timeout,
[{mc_client_binary,cmd_binary_vocal_recv,5},
{mc_client_binary,select_bucket,2},
{ns_memcached,ensure_bucket,2}
,
,
,
]},
{gen_server,call,
['ns_memcached-default',
,
infinity]}},
{gen_server,call,
[
,
{if_rebalance,<0.32081.2694>,
{wait_checkpoint_persisted,37,2959}},
infinity]}}}}]},
[
]}}}}
** When Server state == {state,"default",<0.32082.2694>,
{dict,8,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[], [],[],[]},
{{[['ns_1@10.6.2.40'|8]],
[],
[['ns_1@10.6.2.42'|3]],
[['ns_1@10.6.2.43'|3]],
** When Server state == {state,"default",<0.32082.2694>,
I will upload the collect info later.
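When going through the logs, the failing vbuckets can be pulled out of the crash reports with a small script. The sketch below assumes the three fields after wait_checkpoint_persisted_failed are the bucket name, the vbucket id, and the checkpoint id; the function name find_failures is only illustrative.

```python
import re

# Matches wait_checkpoint_persisted_failed,"<bucket>",<vbucket>,<checkpoint>
# and tolerates the line breaks the logger inserts between fields.
FAILURE_RE = re.compile(
    r'wait_checkpoint_persisted_failed,\s*"([^"]+)",\s*(\d+),\s*(\d+)'
)

def find_failures(log_text: str) -> set[tuple[str, int, int]]:
    """Return the distinct (bucket, vbucket, checkpoint) triples in the log.

    Assumption: the field meanings are inferred from the log, not confirmed.
    """
    return {
        (bucket, int(vbucket), int(checkpoint))
        for bucket, vbucket, checkpoint in FAILURE_RE.findall(log_text)
    }

# Two excerpts copied from the log above, one of them wrapped across lines.
sample = '''
{{wait_checkpoint_persisted_failed,"default",50,3131,
{{wait_checkpoint_persisted_failed,"default",
37,2959,
'''

print(sorted(find_failures(sample)))
```

On the excerpts above this reports vbuckets 37 and 50 of the default bucket, both waiting on node 10.6.2.40 according to the surrounding log.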