Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 2.0
- Security Level: Public
- Environment: CentOS 6.2 64-bit, build 2.0.0-1862
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD disk.
- SSD disk formatted ext4, mounted on /data
- Each server has its own SSD drive; no disk is shared with another server.
- Created a cluster of 6 nodes running Couchbase Server 2.0.0-1862
- Cluster has 2 buckets, default (12 GB) and saslbucket (12 GB).
- Each bucket has one design doc with 2 views per doc (default: d1, saslbucket: d11)
- Consistent views enabled on the cluster (the default)
- Changed the Erlang flags in couchbase-server from +A 16 to +S 128:128
Cluster nodes:
10.6.2.37
10.6.2.38
10.6.2.39
10.6.2.40
10.6.2.42
10.6.2.43
- Loaded 15 million items into each bucket; each key has a size from 512 bytes to 1024 bytes
- Queried all 4 views from the 2 design docs
- Mutated 15 million items with key sizes from 1500 to 1024 bytes
- Performed a swap rebalance: added nodes 44 and 45, removed nodes 39 and 40
- Rebalance moved some items, then hung for hours. Filed bug MB-6953.
- Tried to stop the rebalance but failed. Will re-open bug MB-6707.
- Stopped couchbase server on node 37. Node 37 went down, but the rebalance did not stop.
- Went to node 38 and clicked stop rebalance. The rebalance stopped. Then restarted couchbase server on node 37.
- A while after node 37 came back up, rebalanced the cluster again. The rebalance failed within a few minutes with this error:
Rebalance exited with reason {{{{badmatch,
      {error,
       {error,<<"Partition 854 not in active nor passive set">>}}},
     [{capi_set_view_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {gen_server,init_it,6},
      {proc_lib,init_p_do_apply,3}]},
    {gen_server,call,
     ['capi_set_view_manager-saslbucket',
      {wait_index_updated,854},
      infinity]}},
   {gen_server,call,
    [{'janitor_agent-saslbucket','ns_1@10.6.2.37'},
     {if_rebalance,<0.8171.289>,
      {wait_index_updated,513}},
     infinity]}}
ns_orchestrator002
ns_1@10.6.2.38
22:52:21 - Wed Oct 17, 2012
Server error during processing: ["web request failed", {path, "/pools/default/buckets/default/statsDirectory"}, {type,exit},
{what,
{noproc,
{gen_server,call,
['capi_set_view_manager-default', {foreach_doc, #Fun<capi_ddoc_replication_srv.1.36030090>},
infinity]}}},
{trace,
 [{gen_server,call,3},
  {capi_ddoc_replication_srv,foreach_live_ddoc_id,2},
  {capi_ddoc_replication_srv,fetch_ddoc_ids,1},
  {menelaus_stats,couchbase_view_stats_descriptions,1},
  {menelaus_stats,membase_stats_description,1},
  {menelaus_stats,serve_stats_directory,3},
  {menelaus_web_buckets,checking_bucket_access,4},
  {menelaus_web,loop,3}]}]
menelaus_web019
ns_1@10.6.2.45
22:52:19 - Wed Oct 17, 2012
<0.8771.289> exited with {{{{badmatch,
      {error,
       {error,<<"Partition 854 not in active nor passive set">>}}},
     [{capi_set_view_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {gen_server,init_it,6},
      {proc_lib,init_p_do_apply,3}]},
    {gen_server,call,
     ['capi_set_view_manager-saslbucket',
      {wait_index_updated,854},
      infinity]}},
   {gen_server,call,
    [{'janitor_agent-saslbucket','ns_1@10.6.2.37'},
     {if_rebalance,<0.8171.289>,
      {wait_index_updated,513}},
     infinity]}}
ns_vbucket_mover000
ns_1@10.6.2.38
22:52:10 - Wed Oct 17, 2012
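For context on the wait_index_updated calls in the traces above: with consistent views enabled, rebalance waits for view indexes to catch up before activating vbuckets, which is the same wait path a stale=false view query exercises. A minimal sketch of building such a query URL against the views described in this report (the view name "v1" is a hypothetical placeholder; 8092 is the view/CAPI port in Couchbase 2.0):

```python
# Sketch: build a consistent (stale=false) view query URL for the
# saslbucket design doc d11 from this report. The view name "v1" is
# an assumption; the report only names the design docs d1 and d11.
from urllib.parse import urlencode

def view_query_url(host, bucket, ddoc, view, **params):
    """Return the REST URL for a Couchbase 2.0 view query."""
    qs = urlencode(params)
    return f"http://{host}:8092/{bucket}/_design/{ddoc}/_view/{view}?{qs}"

# stale=false forces the index to be updated before results are returned,
# analogous to the index wait the rebalance performs above.
url = view_query_url("10.6.2.38", "saslbucket", "d11", "v1", stale="false")
print(url)
```

This is only an illustration of the query shape, not a reproduction of the failure.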
- Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1862-rel.rpm.manifest.xml
- Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/8nodes-col-info-1862-reb-failed-Partition-not-in-active-nor-passive-set-20121017-233606.tgz
- This bug is similar to MB-6490, but that bug is marked as fixed.
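Returning to the setup step that changed the Erlang flags from +A 16 to +S 128:128: this amounts to editing the flags passed to erl in the couchbase-server start script. A minimal sketch of that substitution, assuming a hypothetical script excerpt (the real file lives under the Couchbase install directory and its exact contents may differ):

```python
import re

# Hypothetical excerpt of the couchbase-server start script; this is NOT
# the real file contents, only an illustration of the flag change.
script_line = 'exec erl +A 16 -smp enable -setcookie nocookie'

# Replace the async-thread-pool flag (+A 16) with an explicit scheduler
# count (+S 128:128), matching the change described in the report.
patched = re.sub(r'\+A 16', '+S 128:128', script_line)
print(patched)
```

In practice one would back up the original script before editing it in place.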