Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 1.8.1
Affects Version/s: 1.8.1
Component/s: ns_server
Security Level: Public
Labels: None
Environment:
Build 918
2 buckets: bucket1, bucket2
1024 vbuckets
Each node: 24 GB RAM

Description

Setup
1. Cluster (nodes 94, 95) has 25.1M items per bucket.
2. Mutate items with much larger values, causing fragmentation.
3. Node 94 is in heavy swap: 84%.
4. Resident ratio on node 95 has dropped to < 1 percent.
5. Restart node 95.
6. Issue rebalance, adding node 97.
7. Stop rebalance.

After stopping the rebalance, the following messages are seen:

Port server moxi on node 'ns_1@10.3.2.97' exited with status 137. Restarting. Messages: 2012-06-20 16:12:14: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2012-06-20 16:12:14: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)

Port server memcached on node 'ns_1@10.3.2.97' exited with status 137. Restarting. Messages: TAP (Producer) eq_tapq:rebalance_483 - Clear the tap queues by force
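
For reading the two messages above: by the common Unix convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 would correspond to signal 9 (SIGKILL). The sketch below decodes a status that way. It assumes ns_server reports port exit statuses using this convention, and it says nothing about what sent the signal; `describe_exit_status` is a hypothetical helper, not part of any Couchbase tool.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean "killed by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


# Status reported for both moxi and memcached on ns_1@10.3.2.97.
print(describe_exit_status(137))
```

On Linux this prints `killed by signal 9 (SIGKILL)`, which would mean both moxi and memcached on node 97 were killed rather than exiting on their own.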

On re-issuing the rebalance with node 94 marked for removal, the rebalance fails with:

Rebalance exited with reason

{wait_for_memcached_failed,"bucket2", ['ns_1@10.3.2.97']}
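
To reproduce the failing rebalance outside the UI, the request could be built roughly as follows. This is a sketch under assumptions: it relies on the Couchbase REST endpoint `POST /controller/rebalance` taking `knownNodes` and `ejectedNodes` form fields, which should be checked against the 1.8.1 documentation, and the node names for 94 and 95 are guessed from the pattern of `ns_1@10.3.2.97`. The code only builds the form body and does not contact the cluster.

```python
from urllib.parse import urlencode, parse_qs


def rebalance_form(known_nodes, ejected_nodes):
    """Build the form-encoded body for a rebalance request (not sent here)."""
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


# Assumed node names: only 'ns_1@10.3.2.97' appears verbatim in this ticket.
known = ["ns_1@10.3.2.94", "ns_1@10.3.2.95", "ns_1@10.3.2.97"]
body = rebalance_form(known, ["ns_1@10.3.2.94"])

# Round-trip the body to confirm which node is marked for removal.
print(parse_qs(body)["ejectedNodes"])
```

The body would then be posted, with administrator credentials, to `/controller/rebalance` on port 8091 of any node in the cluster.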

Logs from all the nodes are attached: https://s3.amazonaws.com/bugdb/jira/bug-cluster-swap/bug.tar

A screenshot of the current state is attached.

The live cluster can be accessed at http://10.3.2.94:8091/index.html#sec=log&serversTab=0