Details
- Bug
- Resolution: Duplicate
- Major
- 2.0
- Security Level: Public
- centos 6.2 64bit build 2.0.0-1746
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD disk.
- 24.8 GB RAM allocated to Couchbase Server on each node
- SSD disk formatted as ext4 and mounted on /data
- Each server has its own drive; no disk is shared with another server.
- The cluster has 2 buckets, default (12 GB) and saslbucket (12 GB), and is set up with consistent enabled.
- Each bucket has one design doc with 2 views (d1 in default, d11 in saslbucket).
- Create a cluster of 6 nodes with Couchbase Server 2.0.0-1746 installed:
10.6.2.37
10.6.2.38
10.6.2.39
10.6.2.40
10.6.2.42
10.6.2.43
- Load 28 million items into both buckets. Each key has a size from 512 bytes to 1500 bytes.
- Add 2 nodes (10.6.2.44, 10.6.2.45) and remove 2 nodes (10.6.2.40, 10.6.2.43).
- Rebalance. The rebalance seemed very slow; after 10 hours of running, I stopped it.
- Restart the rebalance. It failed with this error:
Rebalance exited with reason {{{{badmatch,
{error,
}},
[
,
{gen_server,handle_msg,5},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},{gen_server,call,
['capi_set_view_manager-saslbucket', {wait_index_updated,67},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@10.6.2.42'},
{if_rebalance,<0.16335.137>,
{get_replication_persistence_checkpoint_id,
684}},
infinity]}}
Server error during processing: ["web request failed", {path,"/pools/default/buckets/default/ddocs"}, {type,exit},
{what,
{noproc,
{gen_server,call,
['capi_set_view_manager-default', {foreach_doc, #Fun<capi_ddoc_replication_srv.2.62853835>},
infinity]}}},
{trace,
[{gen_server,call,3}, {capi_ddoc_replication_srv, full_live_ddocs,1}, {capi_ddoc_replication_srv, sorted_full_live_ddocs,1}, {menelaus_web_buckets,handle_ddocs_list,3}, {menelaus_web_buckets, checking_bucket_access,4}, {menelaus_web,loop,3}, {mochiweb_http,headers,5},{proc_lib,init_p_do_apply,3}
]}]
Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201209/8nodes-col-info-1746-reb-failed-err-Partition_67_not_in_active_nor_passive_set-20120926-112422.tgz
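
For anyone trying to reproduce the add/remove step above through the REST API instead of the UI, here is a minimal sketch of the form body that POST /controller/rebalance expects (knownNodes and ejectedNodes as comma-separated otpNode names). The report does not say whether the UI or REST was used. The helper names are hypothetical, and the "ns_1@" prefix is assumed from the node name that appears in the error above. The sketch only builds and prints the request body; it does not contact a cluster.

```python
from urllib.parse import urlencode

# Node IPs taken from the report.
INITIAL_NODES = ["10.6.2.37", "10.6.2.38", "10.6.2.39",
                 "10.6.2.40", "10.6.2.42", "10.6.2.43"]
ADDED_NODES = ["10.6.2.44", "10.6.2.45"]
REMOVED_NODES = ["10.6.2.40", "10.6.2.43"]


def otp_name(ip):
    """Return the otpNode name for an IP (assumes the 'ns_1@' prefix)."""
    return "ns_1@" + ip


def rebalance_form(initial, added, removed):
    """Build the form fields for POST /controller/rebalance.

    knownNodes lists every node currently known to the cluster, including
    the ones about to be ejected; ejectedNodes lists the nodes to remove.
    """
    known = [otp_name(ip) for ip in initial + added]
    ejected = [otp_name(ip) for ip in removed]
    return {"knownNodes": ",".join(known), "ejectedNodes": ",".join(ejected)}


form = rebalance_form(INITIAL_NODES, ADDED_NODES, REMOVED_NODES)
body = urlencode(form)  # URL-encoded body for the POST request

# Nodes expected to stay in the cluster once the rebalance completes.
remaining = [ip for ip in INITIAL_NODES + ADDED_NODES
             if ip not in REMOVED_NODES]

print(body)
print(len(remaining), "nodes remain after rebalance")
```

The cluster goes from 6 nodes to 8 known nodes during the rebalance and back to 6 once the two ejected nodes are gone, so every vbucket on 10.6.2.40 and 10.6.2.43 has to move.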