Details

    • Type: Technical task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      CentOS 6.2 64-bit, build 2.0.0-1931

      Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers with 4-core CPUs
      • Each server has 32 GB RAM and a 400 GB SSD disk.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD disk formatted ext4 and mounted on /data
      • Each server has its own SSD drive; no disk sharing with other servers.
      • Created a cluster of 6 nodes with Couchbase Server 2.0.0-1931 installed
      • Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1931-rel.rpm.manifest.xml
      • Cluster has 2 buckets, default and saslbucket (12 GB each with 1 replica), set up with 64 vbuckets.
      • Each bucket has one design doc with 2 views (default: d1, saslbucket: d11)

      10.6.2.37
      10.6.2.38
      10.6.2.44
      10.6.2.45
      10.6.2.42
      10.6.2.43

      • Load 20 million items into each bucket. Each key has a size of 1024 bytes.
      • After loading is done, wait for initial indexing.
      • After initial indexing is done, mutate all items, growing them from 1024 to 1512 bytes.
      • Query all 4 views from the 2 design docs.
      • Add node 44 and rebalance. Passed.
      • Add node 45 and rebalance. Passed.
      • Check that auto-failover is enabled on the cluster.
      • Turn on the firewall on node 40:
        iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
        iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
      • Node 40 was down as expected.
      • Auto failover kicked in after one minute.
      • Disable the firewall on node 40 (a sketch for reverting the rules appears after the description). The cluster saw node 40 come back up.
      • Add node 40 back to the cluster and rebalance. Within a few seconds, rebalance failed with this error:

      [rebalance:error,2012-11-06T0:41:48.498,ns_1@10.6.2.37:<0.4077.2612>:ns_rebalancer:do_wait_buckets_shutdown:204]
      Failed to wait deletion of some buckets on some nodes:
      [{'ns_1@10.6.2.40',
        {'EXIT',{old_buckets_shutdown_wait_failed,["default"]}}}]

      [user:info,2012-11-06T0:41:48.500,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]
      Rebalance exited with reason {buckets_shutdown_wait_failed,
        [{'ns_1@10.6.2.40',
          {'EXIT',{old_buckets_shutdown_wait_failed,["default"]}}}]}

      Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1931-reb-failed-undelete-old-bucket-20121106-121536.tgz
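
      For reference, a minimal sketch of how the firewall rules above can be reverted and how the auto-failover setting can be checked. The admin credentials below are placeholders, not values from this ticket:

        # remove the REJECT rules by deleting the same rule specs that were added
        iptables -D INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
        iptables -D OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT

        # query the cluster's auto-failover settings over the REST API on port 8091
        curl -u Administrator:password http://10.6.2.37:8091/settings/autoFailover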


        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        It's a reasonably rare race that happened here.

        After the firewall was disabled, the node quickly discovered that it had actually been failed over. When this happens there are two concurrent things racing each other:

        • we send a die! signal to memcached so that it exits quickly
        • and we start bucket deletions

        In this particular case memcached died rather quickly and we quickly started a fresh instance (without any buckets set up yet).

        Then the death of the original memcached caused ns_memcached to die, and be restarted before we started bucket deletion.

        So that restarted ns_memcached actually re-created the bucket, only to be asked a few milliseconds later to delete it.

        There's a known problem in ep-engine that it won't stop a bucket while warmup is happening. And because we restarted memcached and re-created the buckets, that is exactly what happens here.

        It will have to complete warmup, and then we'll be able to complete deletion of the old bucket.

        After that, rebalance will work.

        So probably not a blocker.

        If it is, then I can do something about this, but note that there would still be a small race in ep-engine where, for example, a memcached crash just prior to bucket deletion would cause the same issue. So I believe it's best to ignore this race in ns_server and instead make ep-engine's bucket deletion work during warmup.
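
        A rough way to watch the warmup described above on the re-added node before retrying rebalance (a sketch only, assuming a default install path and the default bucket on port 11210; stat names can vary by build):

        # poll ep-engine warmup stats for the re-created bucket on the affected node
        /opt/couchbase/bin/cbstats localhost:11210 all | grep ep_warmup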

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Nothing changed since 1.8.1.

        We added that 'die!' behavior exactly for the reasons outlined above by Farshid.

        farshid Farshid Ghods (Inactive) added a comment -

        Please update the ticket after reproducing the issue.

        andreibaranouski Andrei Baranouski added a comment - edited

        Reproduced with smaller data and a smaller cluster.

        Steps:

        1. Cluster of 4 nodes, 1 default and 1 sasl bucket with 1500 MB of RAM allocated:
        10.3.121.112, 10.3.121.113, 10.3.121.114, 10.3.121.115
        2. Load ~1.6M items into each bucket.
        3. Rebalance in 10.3.121.116.
        4. Add one ddoc and 2 views in each bucket.
        5. Start updating existing data in each bucket.
        6. Start performing queries on all 4 views from the 2 ddocs.
        7. Rebalance in 10.3.121.117.
        8. Check that auto-failover is enabled on the cluster.
        9. Turn on the firewall on node 10.3.121.113:
        [root@localhost ~]# iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
        [root@localhost ~]# iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
        10. Auto failover kicked in after 30-60 sec.
        11. Disable the firewall on node 10.3.121.113. The cluster saw node 10.3.121.113 come back up (a way to confirm this is sketched at the end of this comment).
        12. Add node 10.3.121.113 back to the cluster and rebalance. Within a few seconds, rebalance failed.

        Rebalance fails with the same error:

        Rebalance exited with reason {buckets_shutdown_wait_failed,
          [{'ns_1@10.3.121.113',
            {'EXIT',{old_buckets_shutdown_wait_failed,["sasl"]}}}]}
        ns_orchestrator002   ns_1@10.3.121.112   16:18:27 - Fri Nov 16, 2012

        Failed to wait deletion of some buckets on some nodes:
        [{'ns_1@10.3.121.113',
          {'EXIT',{old_buckets_shutdown_wait_failed,["sasl"]}}}]
        ns_rebalancer000   ns_1@10.3.121.112   16:18:27 - Fri Nov 16, 2012

        CentOS release 5.7, 4 GB RAM, 4 cores
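
        A sketch of how the node's health can be confirmed from another cluster node before re-adding it (step 11); the admin credentials are placeholders:

        # list node statuses as seen by the cluster; the failed-over node should report "healthy" again
        curl -s -u Administrator:password http://10.3.121.112:8091/pools/default | grep -o '"status":"[a-z]*"'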

        andreibaranouski Andrei Baranouski added a comment -

        Logs:
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.112-11162012-552-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.113-11162012-555-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.114-11162012-557-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.115-11162012-61-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.116-11162012-65-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.117-11162012-68-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.112-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.113-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.114-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.115-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.116-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7110/c8ca51a1/10.3.121.117-8091-diag.txt.gz
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        AFAIR the question was mostly not whether we can reproduce it at all, but how often. Not sure we have an answer.

        jin Jin Lim (Inactive) added a comment -

        This is a duplicate of MB-7272.

        jin Jin Lim (Inactive) added a comment -

        This is a duplicate of MB-7272, and the fix has been merged (build 1974).


          People

          • Assignee:
            jin Jin Lim (Inactive)
            Reporter:
            thuan Thuan Nguyen
          • Votes: 0
            Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes