Couchbase Server / MB-7115

Rebalance operation failed repeatedly while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5 node cluster, possibly because of an "Unable to listen" error on one of the nodes being rebalanced out.

    Details

      Description

      Scenario:

      • 10 node cluster with build 1942
      • Rebalance out 5 nodes (completed successfully); cluster is now at 5 nodes
      • Add 5 nodes (with build 1944) and mark 3 nodes for removal
      • Hit rebalance
      • Rebalance failed with reason:

      Rebalance exited with reason {badmatch,
       [{<0.26283.119>,
         {badmatch,{error,emfile}},
         [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
          {misc,try_with_maybe_ignorant_after,2},
          {gen_server,terminate,6},
          {proc_lib,init_p_do_apply,3}]}]}
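
      emfile is the POSIX error for running out of file descriptors, so the badmatch above indicates the node exhausted its descriptor limit while killing off TAP replication streams. One way to gauge descriptor pressure is to compare a process's open descriptors against its limit; a minimal sketch in Python for a Linux host, where the PID is a hypothetical stand-in for the node's beam.smp process:

      import os

      def fd_usage(pid):
          # Count open descriptors and read the soft "Max open files" limit
          # from /proc; must run as the process owner or root.
          open_fds = len(os.listdir('/proc/%d/fd' % pid))
          soft_limit = None
          with open('/proc/%d/limits' % pid) as f:
              for line in f:
                  if line.startswith('Max open files'):
                      soft_limit = int(line.split()[3])
                      break
          return open_fds, soft_limit

      used, limit = fd_usage(12345)  # hypothetical beam.smp PID
      print('%d of %d descriptors in use' % (used, limit))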

      Tried rebalancing again, but it failed repeatedly:

      Rebalance exited with reason {{{badmatch,[{<18058.14511.0>,noproc}]},
        [{misc,sync_shutdown_many_i_am_trapping_exits,1},
         {misc,try_with_maybe_ignorant_after,2},
         {gen_server,terminate,6},
         {proc_lib,init_p_do_apply,3}]},
       {gen_server,call,
        [<0.11023.120>,
         {shutdown_replicator,
          'ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com'},
         infinity]}}

      Will shortly upload logs from one of the nodes that was in the cluster at the time of the rebalance failures.

      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

      Noticed this on one of the nodes being rebalanced out:
      Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'.

      So failed over the node and tried rebalancing; the rebalance still failed.

      Then added that node back, left that particular node out of the rebalance operation, and the rebalance succeeded.
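
      For reference, the failover-and-rebalance sequence above maps onto the cluster REST API roughly as follows; a minimal sketch in Python using requests, where the admin credentials and the healthy node used as the REST entry point are placeholders:

      import requests

      BASE = 'http://<healthy-node>:8091'   # any node still healthy in the cluster
      AUTH = ('Administrator', 'password')  # placeholder credentials
      BAD = 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'

      # Hard failover of the node that reported "Unable to listen".
      requests.post(BASE + '/controller/failOver', auth=AUTH,
                    data={'otpNode': BAD})

      # Rebalance without it: knownNodes/ejectedNodes are comma-separated
      # otpNode names, taken here from /pools/default.
      nodes = requests.get(BASE + '/pools/default', auth=AUTH).json()['nodes']
      known = ','.join(n['otpNode'] for n in nodes)
      requests.post(BASE + '/controller/rebalance', auth=AUTH,
                    data={'knownNodes': known, 'ejectedNodes': BAD})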


        Activity

        Steve Yen added a comment -

        bug-scrub:

        ketaki reports it works

        Steve Yen added a comment -

        bug-scrub: 1949 didn't have the timeout fix? Please try again with 1950. thanks.

        Abhinav Dangeti added a comment -

        When this issue originally occurred, the clusters were in an XDCR setup, but replication had been deleted before starting the rebalance operation; the failures occurred on the destination cluster.

        To investigate whether this issue happens with or without XDCR:

        case1: Tried reproducing this with 2 ongoing unidirectional replications. On the destination cluster (7 nodes), rebalanced in 3 nodes (build 1949) and rebalanced out 5 nodes (build 1944).
        The source was a 7 node cluster on build 1944, with a fixed number of items on both buckets and no front end load.
        Rebalance on the destination cluster failed with the same reason, where the cluster was "unable to listen" to one of the nodes.

        case2: Deleted the replications on the source, cleaned the destination cluster, and loaded all nodes with build 1949.
        Created a 5 node cluster (not part of any XDCR).
        Created 2 buckets, and with ongoing front end load (a sketch of generating such load follows the list):

        • rebalanced out 3 nodes, rebalanced in 5 nodes :: rebalance operation completed successfully.
        • rebalanced out 5 nodes, rebalanced in 3 nodes :: rebalance operation completed successfully.
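
        As an illustration of the kind of front end load used above, a minimal sketch that keeps key-value mutations flowing during the rebalance, assuming the Couchbase Python SDK 2.x (host and bucket name are placeholders):

        from itertools import count
        from couchbase.bucket import Bucket  # Couchbase Python SDK 2.x

        bucket = Bucket('couchbase://<any-cluster-node>/default')

        # Endless stream of upserts so mutations are in flight while
        # the rebalance runs.
        for i in count():
            bucket.upsert('load-key-%d' % i, {'val': i})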
        Ketaki Gangal added a comment -

        Re-opening this for repro/observation on the XDCR-fixed changes.

        Aleksey Kondratenko (Inactive) added a comment -

        The emfile in this case (at least in the collectinfos from Tony) seems to be caused by CLOSE_WAIT sockets. My leading guess is that it's those hung is_missing_rev requests that we're seeing in MB-7129. Only in this environment it's not killing the node entirely, but rather exhausting file descriptors.
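
        A quick way to test this theory on an affected node is to tally TCP socket states and see whether CLOSE_WAIT dominates; a minimal sketch using Python's psutil (system-wide enumeration generally needs root):

        import psutil

        # Tally TCP connection states across the host; a CLOSE_WAIT count
        # approaching the descriptor limit supports the leaked-sockets theory.
        states = {}
        for conn in psutil.net_connections(kind='tcp'):
            states[conn.status] = states.get(conn.status, 0) + 1

        print(states.get(psutil.CONN_CLOSE_WAIT, 0), 'sockets in CLOSE_WAIT')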


          People

          • Assignee: Abhinav Dangeti
          • Reporter: Abhinav Dangeti
          • Votes: 0
          • Watchers: 1


          Gerrit Reviews

          There are no open Gerrit changes