Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
3.0
-
Security Level: Public
-
CentOS 64-bit, each node: 5 cores, 15GB
-
Untriaged
-
Unknown
Description
Scenario
--------------
1. Load on both clusters until vb_active_resident_items_ratio < 50 on standardbucket, < 70 on standardbucket1, and 20M on sasl.
2. Access phase with 100% gets runs for 3 hours, which I expect would give the clusters enough time to sync up on replication.
3. Rebalance-out 1 node at cluster1
4. Rebalance-in 1 node at cluster1
5. Failover and remove node at cluster1
6. Failover and add-back node at cluster1
7. Rebalance-out 1 node at cluster2
8. Rebalance-in 1 node at cluster2
9. Failover and remove node at cluster2
10. Failover and add-back node at cluster2
11. Soft restart all nodes in cluster1 one by one
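The stopping condition for the load in step 1 could be checked along these lines. This is only a sketch: the `stats` dictionary, the `load_complete` helper, and the use of `curr_items` for the sasl bucket are assumptions for illustration, and reading "20M" as 20 million items is an interpretation of the step, not something the test spec confirms.

```python
# Thresholds taken from step 1 of the scenario.
# Assumption: "20M on sasl" means an item count of 20 million.
RESIDENT_RATIO_BELOW = {"standardbucket": 50.0, "standardbucket1": 70.0}
ITEM_COUNT_AT_LEAST = {"sasl": 20_000_000}


def load_complete(bucket, stats):
    """Return True once the load phase for `bucket` has reached its target.

    `stats` is a hypothetical dict of per-bucket stats, e.g.
    {"vb_active_resident_items_ratio": 48.2, "curr_items": 31000000}.
    """
    if bucket in RESIDENT_RATIO_BELOW:
        # Keep loading until the active resident ratio drops below the threshold.
        return stats["vb_active_resident_items_ratio"] < RESIDENT_RATIO_BELOW[bucket]
    if bucket in ITEM_COUNT_AT_LEAST:
        # Keep loading until the bucket holds the target number of items.
        return stats["curr_items"] >= ITEM_COUNT_AT_LEAST[bucket]
    raise KeyError("no load target defined for bucket %r" % bucket)
```

The load would be considered finished on a cluster only when `load_complete` returns True for all three buckets.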
Problem
-------------
After step 11, 2 nodes (.50 and .52) in cluster1 remained in pending state. Warmup on all 3 buckets shows as complete on both nodes. Memcached isn't running on .45. There are no memcached cores on any of the nodes (ulimit -c is set to unlimited), so memcached probably never started. Chiyoung confirmed that memcached had a clean shutdown on all buckets on .45.
Attaching cbcollect from cluster1.
Live cluster: http://172.23.105.44:8091/. Based on the cause, please advise whether this could be a beta-blocker.
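For anyone re-checking the cluster, nodes that are not yet healthy could be listed from the JSON returned by the cluster's `/pools/default` REST endpoint. The sketch below assumes that response carries a `nodes` array whose entries have `hostname`, `status`, and `clusterMembership` fields; the helper name and the sample payload (including its host names) are hypothetical and not taken from this cluster.

```python
def nodes_not_healthy(pool_details):
    """Return (hostname, status, clusterMembership) for every node whose
    status is anything other than "healthy"."""
    return [
        (node.get("hostname"), node.get("status"), node.get("clusterMembership"))
        for node in pool_details.get("nodes", [])
        if node.get("status") != "healthy"
    ]


# Hypothetical payload shaped like a /pools/default response (illustration only).
sample = {
    "nodes": [
        {"hostname": "node-a:8091", "status": "healthy", "clusterMembership": "active"},
        {"hostname": "node-b:8091", "status": "warmup", "clusterMembership": "active"},
        {"hostname": "node-c:8091", "status": "unhealthy", "clusterMembership": "active"},
    ]
}

for hostname, status, membership in nodes_not_healthy(sample):
    print(hostname, status, membership)
```

In practice the dictionary would come from an authenticated GET against the live cluster rather than from a literal.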
Attachments
For Gerrit Dashboard: MB-11608
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
39227,2 | MB-11608 Don't require halt() to complete outstanding IO. | master | ns_server | MERGED | +2 | +1 |