  1. Couchbase Server
  2. MB-1243

When adding a third node to a cluster of two, not all buckets have a replica instance

Details

    • Bug
    • Resolution: Fixed
    • 1.6.0 beta1
    • None
    • ns_server
    • None
    • Operating System: CentOS 5.x
      Platform: X64

    Description

      I have a script which iterates through a set of nodes provided to it. On each iteration, the next node is added to the existing cluster, after which cluster membership and metrics are checked to ensure they are correct and consistent. One of the checks performed is that every bucket in the system has exactly one active instance and at least one replica instance.
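
      The replica check described above can be sketched roughly as follows. This is not the code from rebalance_check.py; the function name, the shape of the input (a per-node mapping of bucket id to state), and the min_replicas parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def find_replica_mismatches(node_states, num_buckets, min_replicas=1):
    """Return the ids that violate "exactly one active, at least one replica".

    node_states is a hypothetical structure: {node_name: {bucket_id: state}},
    where state is a string such as "active", "replica" or "dead".
    """
    counts = defaultdict(lambda: {"active": 0, "replica": 0})
    for states in node_states.values():
        for bucket_id, state in states.items():
            # States other than active/replica (e.g. "dead") are not counted.
            if state in ("active", "replica"):
                counts[bucket_id][state] += 1

    mismatched = set()
    for bucket_id in range(num_buckets):
        tally = counts[bucket_id]
        if tally["active"] != 1 or tally["replica"] < min_replicas:
            mismatched.add(bucket_id)
    return mismatched


# Toy cluster with two ids: id 0 is healthy, id 1 has an active and a dead
# instance but no replica, which is the failure pattern reported here.
example = {
    "S1": {0: "replica"},
    "S2": {0: "active", 1: "dead"},
    "S3": {1: "active"},
}
mismatches = find_replica_mismatches(example, num_buckets=2)
```

      In this toy input only id 1 is flagged, because its "dead" instance does not count as a replica.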

      This script can successfully add a second node to the first to create a cluster of two, but when attempting to add a third node to a cluster of two, the script fails when verifying the buckets because some of the buckets in the system are missing a replica instance.

      In the snippet below, the script reports that buckets 119, 120, 121, and 122 are not correctly replicated.

      [root@domU-12-31-36-00-45-D2 py]# python rebalance_check.py --nodes 10.253.11.144,10.253.30.47,10.253.85.204 --num_buckets 256
      validating buckets: PASS
      validating items: PASS
      Adding 10.253.30.47 to 10.253.11.144: SUCCEEDED
      Checking cluster membership...
      10.253.11.144: MATCH
      10.253.30.47: MATCH
      validating buckets: PASS
      validating items: PASS
      Adding 10.253.85.204 to 10.253.11.144: SUCCEEDED
      Checking cluster membership...
      10.253.11.144: MATCH
      10.253.30.47: MATCH
      10.253.85.204: MATCH
      validating buckets: FAIL
      Traceback (most recent call last):
        File "rebalance_check.py", line 183, in ?
          validate_buckets(cluster, num_buckets, replication_level)
        File "rebalance_check.py", line 123, in validate_buckets
          raise RuntimeError("The following buckets are not correctly replicated: %s" % (replica_mismatch))
      RuntimeError: The following buckets are not correctly replicated: set([120, 121, 122, 119])

      When I run vbucketctl against each node to see what state it reports for vbucket 120, there is one "active" instance and one "dead" instance but no "replica" instance.

      [root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S2:11210 list | egrep 120
      vbucket 120 dead
      [root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S1:11210 list | egrep 120
      [root@domU-12-31-36-00-45-D2 py]# python /root/TestSuite/ep-mgmt/vbucketctl.py $S3:11210 list | egrep 120
      vbucket 120 active
      [root@domU-12-31-36-00-45-D2 py]#
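
      As an alternative to grepping each node by hand, the listing could be parsed into a mapping of vbucket id to state. The sketch below assumes only the line format visible in the output above ("vbucket <id> <state>"); the function name and the regular expression are hypothetical and are not part of vbucketctl.py.

```python
import re

# Matches lines such as "vbucket 120 dead"; any other line is ignored.
VBUCKET_LINE = re.compile(r"^\s*vbucket\s+(\d+)\s+(\w+)\s*$")


def parse_vbucket_list(output):
    """Turn the text of a vbucket listing into {vbucket_id: state}."""
    states = {}
    for line in output.splitlines():
        match = VBUCKET_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


sample = "vbucket 119 dead\nvbucket 120 active\n[root@host py]#"
parsed = parse_vbucket_list(sample)
```

      Comparing ids as integers also avoids the substring matches that a plain `egrep 120` would produce if ids such as 1200 existed.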

      The script I am using to test this behavior is still being developed, so it has not been pushed into carlin yet. I will do so soon and update this bug with details on how to run it.

      Logs to be attached

      Attachments

        1. 144.out
          1.92 MB
          Eric Lambert
        2. 204.out
          381 kB
          Eric Lambert
        3. 47.out
          515 kB
          Eric Lambert

          People

            sean@northscale.com Sean Lynch (Inactive)
            eric@northscale.com Eric Lambert (Inactive)