Details
Bug
Resolution: Fixed
Operating System: CentOS 5.x
Platform: X64
Description
I have a script which iterates through a set of nodes provided to it. On each iteration, the next node is added to the existing cluster, after which cluster membership and metrics are checked to ensure they are correct and consistent. One of the checks performed is that every bucket in the system has exactly one active instance and at least one replica instance.
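The bucket check described above can be sketched roughly as follows. This is not the actual rebalance_check.py code: the function name find_replica_mismatches, the min_replicas parameter, and the input shape (a dict mapping each node to a dict of bucket id to state string) are all assumptions made for illustration.

```python
def find_replica_mismatches(node_states, num_buckets, min_replicas=1):
    """Return the set of bucket ids that do not have exactly one active
    instance and at least min_replicas replica instances.

    node_states is a hypothetical structure: {node: {bucket_id: state}},
    where state is a string such as "active", "replica" or "dead".
    """
    mismatches = set()
    for bucket in range(num_buckets):
        # Collect this bucket's state on every node (None if the node
        # does not list the bucket at all).
        states = [buckets.get(bucket) for buckets in node_states.values()]
        active = states.count("active")
        replicas = states.count("replica")
        if active != 1 or replicas < min_replicas:
            mismatches.add(bucket)
    return mismatches
```

Under these assumptions, a bucket that is "active" on one node and "dead" on another, with no "replica" anywhere, ends up in the returned set.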
This script can successfully add a second node to the first to create a cluster of two, but when attempting to add a third node to a cluster of two, the script fails when verifying the buckets because some of the buckets in the system are missing a replica instance.
In the output below, the script reports that buckets 119, 120, 121, and 122 are not correctly replicated.
[root@domU-12-31-36-00-45-D2 py]# python rebalance_check.py --nodes 10.253.11.144,10.253.30.47,10.253.85.204 --num_buckets 256
validating buckets: PASS
validating items: PASS
Adding 10.253.30.47 to 10.253.11.144: SUCCEEDED
Checking cluster membership...
10.253.11.144: MATCH
10.253.30.47: MATCH
validating buckets: PASS
validating items: PASS
Adding 10.253.85.204 to 10.253.11.144: SUCCEEDED
Checking cluster membership...
10.253.11.144: MATCH
10.253.30.47: MATCH
10.253.85.204: MATCH
validating buckets: FAIL
Traceback (most recent call last):
  File "rebalance_check.py", line 183, in ?
    validate_buckets(cluster, num_buckets, replication_level)
  File "rebalance_check.py", line 123, in validate_buckets
    raise RuntimeError("The following buckets are not correctly replicated: %s" % (replica_mismatch))
RuntimeError: The following buckets are not correctly replicated: set([120, 121, 122, 119])
When I run vbucketctl to see what the nodes in the system report for bucket 120, there is one "active" instance and one "dead" instance, but no "replica" instance:
[root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S2:11210 list | egrep 120
vbucket 120 dead
[root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S1:11210 list | egrep 120
[root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S3:11210 list | egrep 120
vbucket 120 active
[root@domU-12-31-36-00-45-D2 py]#
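For scripted checks, the per-node listing can be turned into a bucket-to-state mapping. The sketch below assumes the output consists of lines of the form `vbucket <id> <state>`, as in the transcript above; the helper name parse_vbucket_list is hypothetical, and any line that does not match that shape is ignored.

```python
def parse_vbucket_list(output):
    """Parse text made of 'vbucket <id> <state>' lines into {id: state}.

    The line format is assumed from the transcript in this report;
    lines that do not match it are skipped.
    """
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "vbucket" and parts[1].isdigit():
            states[int(parts[1])] = parts[2]
    return states
```

Running this on each node's listing gives one dict per node, which makes it easy to see that a bucket is "dead" on one node, missing on another, and "active" on a third.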
The script I am using to test this behavior is still being developed, so it has not been pushed into carlin yet. I will do so soon and update this bug with details on how to run it.
Logs to be attached