Couchbase Server / MB-7829

[system test]: swap rebalance hang in online upgrade from 1.8.1 to 2.0.1-168

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      Ubuntu 11.04 64-bit

      Description

      Rebalance hangs during a swap rebalance that adds a 2.0.1-168 node to the cluster and removes one
      1.8.1 node from the cluster.

      Environment:

      1.8.1 nodes:

      1 = ec2-50-16-9-117.compute-1.amazonaws.com
          short public IP 50.16.9.117
          10.149.28.28
      2 = ec2-54-242-127-186.compute-1.amazonaws.com
          short public IP 54.242.127.186
          10.149.19.204
      3 = ec2-23-20-192-140.compute-1.amazonaws.com <== this node is removed in the 1st swap rebalance
          short public IP 23.20.192.140
          10.149.27.66
      4 = ec2-204-236-252-180.compute-1.amazonaws.com
          short public IP 204.236.252.180
          10.149.21.76

      Couchbase Server 2.0.1-168 is installed on the node below:

      5 = ec2-50-16-32-181.compute-1.amazonaws.com <== 2.0.1-168 node added in the 1st swap rebalance for the online upgrade
          short public IP 50.16.32.181
          10.149.8.5

      Started the swap rebalance. Within a few minutes, when I checked the vbuckets on node 66 (10.149.27.66), the count was 0.

      Diags will be posted soon.


        Activity

        Thuan Nguyen added a comment -

        Diag files from all nodes: https://s3.amazonaws.com/bugdb/jira/MB-7829/5ndoes-onupgrade-181_201-168_reb-hang-20130226-182430.tgz

        Jin Lim (Inactive) added a comment -

        Test continued OK

        Aliaksey Artamonau added a comment -

        According to the logs, the node that you were trying to add was already part of the cluster. It seems that you just reinstalled everything on that node, so the node itself believed that it was not part of any cluster, while all the other nodes believed that it was part of their cluster. When a node gets added to the cluster, we only ensure that the node is not part of another cluster. Because of this, you were able to add it a second time to the same cluster. This caused massive problems which (among others) resulted in a stuck rebalance. I'm closing this as "won't fix" since this scenario is not something we really want to support. Please reopen if you can reproduce it using sane steps.
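
        The failure mode described in the comment above can be illustrated with a small model. This is only a sketch, not ns_server code: the `Node` and `Cluster` classes, the `strict` flag, the cluster name, and the address `10.0.0.5` are all hypothetical names made up for the illustration. It assumes the join check looks only at what the joining node believes about itself, which is how the comment describes it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    address: str
    # The cluster this node believes it belongs to (None = standalone).
    believed_cluster: Optional[str] = None

    def reinstall(self) -> None:
        """Wipe the node's local state, as a fresh install would."""
        self.believed_cluster = None


@dataclass
class Cluster:
    name: str
    # Addresses the cluster believes are members; duplicates are possible.
    members: List[str] = field(default_factory=list)

    def add_node(self, node: Node, strict: bool = False) -> bool:
        # Check as described in the comment: reject only when the joining
        # node itself reports that it already belongs to a cluster.
        if node.believed_cluster is not None:
            return False
        # Hypothetical extra check (an assumption, not an existing feature):
        # also reject when this cluster already lists the address.
        if strict and node.address in self.members:
            return False
        self.members.append(node.address)
        node.believed_cluster = self.name
        return True


cluster = Cluster("example-cluster")   # hypothetical cluster name
node = Node("10.0.0.5")                # hypothetical address

first_add = cluster.add_node(node)     # accepted: node is standalone
node.reinstall()                       # node forgets its membership
second_add = cluster.add_node(node)    # accepted again: cluster state is not consulted
duplicate_count = cluster.members.count(node.address)

node.reinstall()
strict_add = cluster.add_node(node, strict=True)  # rejected by the extra check
```

        In this model the second add succeeds and the cluster ends up listing the same address twice, which matches the "added a second time to the same cluster" situation. The `strict` path shows one possible guard; whether ns_server should do something like it is outside what this issue decides.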

        Thuan Nguyen added a comment -

        I saw the rebalance making progress later on, so I will close this bug.


          People

          • Assignee: Thuan Nguyen
          • Reporter: Thuan Nguyen
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes