Details
Type: Technical task
Resolution: Fixed
Priority: Blocker
Version: 2.0
Security Level: Public
Environment: CentOS 6.2 64-bit, build 2.0.0-1931
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU.
- Each server has 32 GB RAM and a 400 GB SSD disk.
- 24.8 GB RAM is allocated to Couchbase Server on each node.
- SSD disk is formatted as ext4 and mounted on /data.
- Each server has its own SSD drive; no disk is shared with another server.
- Create a cluster of 6 nodes with Couchbase Server 2.0.0-1931 installed.
- Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1931-rel.rpm.manifest.xml
- The cluster has 2 buckets, default and saslbucket (12 GB each, with 1 replica), set up with 64 vbuckets.
- Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket).
10.6.2.37
10.6.2.38
10.6.2.44
10.6.2.45
10.6.2.42
10.6.2.43
- Load 20 million items into each bucket. Each item is 1024 bytes.
- After loading is done, wait for initial indexing to finish.
- After initial indexing is done, mutate all items, with sizes from 1024 to 1512 bytes.
- Query all 4 views from the 2 design docs.
- Add node 44 and rebalance. Passed
- Add node 45 and rebalance. Passed.
- Check that auto-failover is enabled on the cluster.
- Turn on the firewall on node 40:
iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
- Node 40 went down as expected.
- Auto-failover kicked in after one minute.
- Disable the firewall on node 40. The cluster saw node 40 as up again.
- Add node 40 back to the cluster and rebalance. Within a few seconds, the rebalance failed with this error:
[rebalance:error,2012-11-06T0:41:48.498,ns_1@10.6.2.37:<0.4077.2612>:ns_rebalancer:do_wait_buckets_shutdown:204]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.6.2.40',
{'EXIT',
}}]
[user:info,2012-11-06T0:41:48.500,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {buckets_shutdown_wait_failed,
[{'ns_1@10.6.2.40',
{'EXIT',
}}]}
Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1931-reb-failed-undelete-old-bucket-20121106-121536.tgz
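
When going through the collected logs, it can help to pull out which nodes the rebalancer reported as still holding buckets. The following is a minimal sketch, not part of the original report: the function names and the regular expressions are illustrative assumptions based only on the two log entries quoted above, and the sample text is copied from them (including the truncated 'EXIT' reason).

```python
import re

# Assumption: each log entry starts at the beginning of a line with a
# "[category:level," header, as in the two entries quoted in this report.
ENTRY_START = re.compile(r"^\[\w+:\w+,", re.MULTILINE)

# Assumption: node names appear as quoted Erlang atoms such as 'ns_1@10.6.2.40'.
NODE_RE = re.compile(r"'([\w.-]+@[\w.-]+)'")

MARKER = "Failed to wait deletion of some buckets on some nodes:"


def split_entries(log_text):
    """Split raw log text into entries, one per "[category:level," header."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    return [log_text[s:e] for s, e in zip(starts, ends)]


def nodes_failing_bucket_shutdown(log_text):
    """Return the nodes named after the bucket-deletion failure message."""
    nodes = []
    for entry in split_entries(log_text):
        if MARKER not in entry:
            continue
        # Only scan the text after the message, so the reporting node in
        # the entry header is not picked up.
        tail = entry.split(MARKER, 1)[1]
        for node in NODE_RE.findall(tail):
            if node not in nodes:
                nodes.append(node)
    return nodes


# Sample copied from the log excerpt in this report.
SAMPLE = """[rebalance:error,2012-11-06T0:41:48.498,ns_1@10.6.2.37:<0.4077.2612>:ns_rebalancer:do_wait_buckets_shutdown:204]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.6.2.40',
{'EXIT',
}}]
[user:info,2012-11-06T0:41:48.500,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {buckets_shutdown_wait_failed,
[{'ns_1@10.6.2.40',
{'EXIT',
}}]}
"""

if __name__ == "__main__":
    print(nodes_failing_bucket_shutdown(SAMPLE))
```

Run against the excerpt above, this should report only ns_1@10.6.2.40, the node that was failed over and re-added.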