Couchbase Server / MB-5906

Rebalance stuck after adding back a failed-over node, with XDCR enabled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None
    • Environment:
      2.0.0 Build 1416
      Unidirectional replication
      1024 vBuckets

      Description

      Started over with a pair of single-node clusters (10.1.3.71 and 10.1.3.72), threads=4, expiration=10.
      – Added a node (10.1.3.235) to the source cluster and rebalanced.

      • Destination ops per second fell towards the end of the rebalance on the source side.
      • It picked up again once the rebalance actually finished.
      • Failed over a node on the source cluster.
      • Couldn't add it back and rebalance for some reason, so removed it completely, rebalanced, and then added another node again; the rebalance is now stuck (between 10.1.3.71 and 10.1.3.235).
      1. 10.1.3.235-8091-diag.txt.gz
        10.10 MB
        Abhinav Dangeti
      2. 10.1.3.71-8091-diag.txt.gz
        9.93 MB
        Abhinav Dangeti
      3. 10.1.3.72-8091-diag.txt.gz
        14.56 MB
        Abhinav Dangeti
      1. Screen Shot 2012-07-13 at 3.28.51 PM.png
        111 kB

        Activity

        Jin Lim (Inactive) added a comment:

        Hi Tony, it appears that you marked this bug fixed last time. For better bug tracking, we require at least minimal information on why any bug is being closed or resolved. Could you please explain why it is being closed (not complete, cannot reproduce, fix review numbers, etc.)? Afterwards, please re-close this bug if appropriate. Thanks much!

        Jin Lim (Inactive) added a comment:

        Per bug scrubs, it appears there was a commit for this issue:
        Integrated in github-ns-server-2-0 #412 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/412/)
        MB-5906: have finite timeouts for tap names unregistering (Revision b4a28e7e41bc70eaaca711a17ff7947cb7d90fd4)

        Alk & Aliaksey, can you please verify whether this commit addressed the issue? Thanks.

        Aleksey Kondratenko (Inactive) added a comment:

        As pointed out above, this commit was a workaround for misbehavior elsewhere.

        So this particular symptom is gone (worked around in ns_server), but apparently the underlying issue is not fixed.

        Jin Lim (Inactive) added a comment:

        Based on Alk's comment above, we will close this issue, since the workaround eliminates the symptom. We will open a separate bug, probably for 2.0.2, and continue investigating why and how memcached completely ignores ns_server requests to deregister replica-building tap names.

        Jin Lim (Inactive) added a comment:

        MB-7717 has been created to track the memcached timeout issue (memcached ignoring ns_server requests).


          People

          • Assignee:
            Thuan Nguyen
            Reporter:
            Abhinav Dangeti
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
