Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: 3.0
Affects Version/s: 2.0
Component/s: couchbase-bucket, ns_server
Security Level: Public
Labels:
- 2.0-release-notes
Environment:

- 5:5 uni & bidirectional XDCR
- ec2 nodes with 15G RAM
- 12.04 Ubuntu LTS
- 400G disk space on each node
- http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1967-rel.deb.manifest.xml


Description

At the time of the rebalance failure:

+ 5 nodes rebalance in on each cluster
Cluster setup: c1:c2::10:10
biXDCR_bucket: c1 <---> c2
uniXDCR_src: c1 ---> c2 :uniXDCR_dest
Front end loads on c1 and c2 for biXDCR_bucket, and on c1 for uniXDCR_src.
c1: http://ec2-177-71-230-72.sa-east-1.compute.amazonaws.com:8091/
c2: http://ec2-175-41-186-167.ap-southeast-1.compute.amazonaws.com:8091/

On c1, the rebalance operation failed with this reason in the UI logs:

Rebalance exited with reason {{bulk_set_vbucket_state_failed,
    [{'ns_1@ec2-177-71-170-44.sa-east-1.compute.amazonaws.com',
      {'EXIT',
       {{timeout,
         {gen_server,call,
          ['ns_memcached-biXDCR_bucket',
           {set_vbucket,544,replica},
           180000]}},
        {gen_server,call,
         [{'janitor_agent-biXDCR_bucket',
           'ns_1@ec2-177-71-170-44.sa-east-1.compute.amazonaws.com'},
          {if_rebalance,<0.10136.88>,
           {update_vbucket_state,544,replica,
            undefined,undefined}},
          infinity]}}}}]},
   [{janitor_agent,bulk_set_vbucket_state,4},
    {ns_vbucket_mover,update_replication_post_move,3},
    {ns_vbucket_mover,handle_info,2},
    {gen_server,handle_msg,5},
    {proc_lib,init_p_do_apply,3}]}

The second time, rebalance failed with the following UI log message:

Rebalance exited with reason {{timeout,
    {gen_server,call,
     ['ns_memcached-biXDCR_bucket',
      {set_vbucket,849,active},
      180000]}},
   {gen_server,call,
    [{'janitor_agent-biXDCR_bucket',
      'ns_1@ec2-177-71-230-72.sa-east-1.compute.amazonaws.com'},
     {if_rebalance,<0.21090.114>,
      {update_vbucket_state,849,active,paused,
       undefined}},
     infinity]}}
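
Both failures wrap the same inner event: a gen_server call to ns_memcached for set_vbucket that did not return within 180000 ms (3 minutes). For triage across many such UI log messages, a minimal Python sketch along these lines could pull out the bucket, vbucket, target state, and timeout. The function name and the regular expression are hypothetical helpers written for this report, not part of ns_server, and they assume the message has the shape shown above.

```python
import re

# Hypothetical helper, not part of ns_server. It matches the inner
# ns_memcached call that timed out, for example:
#   {timeout,{gen_server,call,
#     ['ns_memcached-biXDCR_bucket',{set_vbucket,544,replica},180000]}}
# Whitespace and line breaks between tokens are tolerated.
SET_VBUCKET_TIMEOUT = re.compile(
    r"\{timeout,\s*\{gen_server,\s*call,\s*"
    r"\[\s*'ns_memcached-(?P<bucket>[^']+)'\s*,\s*"
    r"\{set_vbucket,\s*(?P<vbucket>\d+)\s*,\s*(?P<state>\w+)\s*\}\s*,\s*"
    r"(?P<timeout_ms>\d+)\s*\]"
)


def parse_set_vbucket_timeout(reason):
    """Return details of the first set_vbucket timeout in `reason`, or None."""
    match = SET_VBUCKET_TIMEOUT.search(reason)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "state": match.group("state"),
        "timeout_ms": int(match.group("timeout_ms")),
    }


if __name__ == "__main__":
    # Start of the second failure message from this report.
    second_failure = """Rebalance exited with reason {{timeout,
    {gen_server,call,
     ['ns_memcached-biXDCR_bucket',
      {set_vbucket,849,active},
      180000]}},"""
    print(parse_set_vbucket_timeout(second_failure))
```

Applied to the first message, the same helper would report vbucket 544 with target state replica, which makes it easier to see that both failures hit the same bucket and the same 180000 ms limit.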

After giving it some time, the third rebalance did complete successfully.

Will attach the grabbed diags from one of the nodes at C1 in a bit.

Issue Links

relates to

MB-9636 Rebalance fails with reason : bulk_set_vbucket_state_failed

Closed

People

Assignee: Mike Wiederhold [X] (Inactive)

Reporter: Abhi Dangeti

Votes: 0

Watchers: 2

Dates

Created: 29/Nov/12 11:18 AM

Updated: 19/Aug/14 3:53 PM

Resolved: 10/Apr/13 1:12 PM


Rebalance-in operation failed twice with "bulk_set_vbucket_state" failing under heavy front-end load on an XDCR setup, with the system in DGM (~65% resident ratio)
