Details
- Bug
- Resolution: Cannot Reproduce
- Major
- 2.0
- Security Level: Public
- 5:5 uni & bidirectional XDCR
- ec2 nodes with 15G RAM
- 12.04 Ubuntu LTS
- 400G disk space on each node
- http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1967-rel.deb.manifest.xml
Description
At the time of the rebalance failure:
+ 5 nodes being rebalanced in on each cluster
Cluster setup: c1:c2::10:10
biXDCR_bucket: c1 <---> c2
uniXDCR_src: c1 ---> c2 :uniXDCR_dest
Front end loads on c1 and c2 for biXDCR_bucket, and on c1 for uniXDCR_src.
c1: http://ec2-177-71-230-72.sa-east-1.compute.amazonaws.com:8091/
c2: http://ec2-175-41-186-167.ap-southeast-1.compute.amazonaws.com:8091/
On c1, the rebalance operation failed with this reason in the UI logs:
Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@ec2-177-71-170-44.sa-east-1.compute.amazonaws.com',
{'EXIT',
{{timeout,
{gen_server,call,
['ns_memcached-biXDCR_bucket',
,
180000]}},
{gen_server,call,
[
,
{if_rebalance,<0.10136.88>,
{update_vbucket_state,544,replica,
undefined,undefined}},
infinity]}}}}]},
[
,
{ns_vbucket_mover, update_replication_post_move,3},
{ns_vbucket_mover,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
The second time, rebalance failed with the following UI log message:
Rebalance exited with reason {{timeout,
{gen_server,call,
['ns_memcached-biXDCR_bucket',
,
180000]}},
{gen_server,call,
[
,
{if_rebalance,<0.21090.114>,
{update_vbucket_state,849,active,paused,
undefined}},
infinity]}}
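To compare the two failures quickly, the fields of interest can be pulled out of the pasted reason text. The following is only a rough sketch for triage, not part of ns_server: the helper name `summarize_failure`, the regular expressions, and the output key names are all assumptions made for illustration. It reads the bucket name and timeout from the inner gen_server call and the vbucket id and first state field from the update_vbucket_state tuple.

```python
import re

# Inner gen_server call: ['ns_memcached-<bucket>', <request>, <timeout ms>].
# The request element is missing from the pasted log, so the pattern accepts
# anything (including nothing) in that position.
CALL_RE = re.compile(r"\['ns_memcached-([^']+)',.*?,\s*(\d+)\]", re.S)

# {update_vbucket_state, <vbucket id>, <state>, ...}; only the first two
# fields are captured. The key names below are illustrative labels.
STATE_RE = re.compile(r"\{update_vbucket_state,\s*(\d+),\s*(\w+)")


def summarize_failure(reason: str) -> dict:
    """Extract bucket, timeout, vbucket id and state from a rebalance reason."""
    call = CALL_RE.search(reason)
    state = STATE_RE.search(reason)
    return {
        "bucket": call.group(1) if call else None,
        "timeout_ms": int(call.group(2)) if call else None,
        "vbucket": int(state.group(1)) if state else None,
        "state": state.group(2) if state else None,
    }


# Second failure message, copied as pasted in this report.
SECOND_FAILURE = """Rebalance exited with reason {{timeout,
{gen_server,call,
['ns_memcached-biXDCR_bucket',
,
180000]}},
{gen_server,call,
[
,
{if_rebalance,<0.21090.114>,
{update_vbucket_state,849,active,paused,
undefined}},
infinity]}}"""

print(summarize_failure(SECOND_FAILURE))
```

Run against the first message, the same helper would report vbucket 544 with state replica, which shows that both failures hit the same 180000 ms timeout on the ns_memcached-biXDCR_bucket call.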
After some time, a third rebalance attempt completed successfully.
Will attach the diags grabbed from one of the c1 nodes shortly.
Attachments
Issue Links
- relates to MB-9636 Rebalance fails with reason : bulk_set_vbucket_state_failed (Closed)