Couchbase Server / MB-11448

KV+XDCR System Test: Rebalance after failover stuck, nodes go to pending state


Details

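    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: XDCR
    • Security Level: Public
    • Environment: CentOS 6.x
      Two 8-node clusters (8 * 8), 1 bidirectional XDCR replication and 1 unidirectional XDCR replication. Each node: 15 GB RAM, 419 GB HDD for /data.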
    • Triage: Untriaged
    • Is this a Regression?: Unknown

    Description

      Build
      --------
      3.0.0-819 (XDCR on UPR, internal replication on UPR)

      Clusters
      -----------
      Source: http://172.23.105.44:8091/
      Destination: http://172.23.105.54:8091/
      Both clusters are available for investigation.

      Steps
      --------
      1. Load data into both clusters until vb_active_resident_items_ratio < 30.
      2. Run an access phase (98% gets, 2% sets) for 3 hours.
      3. Rebalance out 1 node from cluster1 while the workload runs (high DGM, ~4%).
      4. Rebalance the same node back in while the workload runs.
      5. Fail over one node while the workload runs, then rebalance to remove it. ==> The rebalance gets stuck and 4 nodes go into the pending state.
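
      Step 5 corresponds to two Couchbase Server REST calls: a hard failover is POST /controller/failOver with the node's otpNode name, and removing the node is POST /controller/rebalance with every node still in the cluster in knownNodes and the failed-over node in ejectedNodes. The report does not show the harness that drove these steps, so this is only a minimal sketch under assumptions: nodes are assumed to use the default internal name ns_1@<ip>, the helper names (otp_node, failover_request, rebalance_out_request) are hypothetical, and the code only builds the form-encoded request bodies; authentication and the actual HTTP call are left out.

```python
from urllib.parse import urlencode


def otp_node(ip: str) -> str:
    # Assumption: nodes use the default internal node name ns_1@<ip>.
    return f"ns_1@{ip}"


def failover_request(failed_ip: str) -> tuple[str, str]:
    """Return (path, form body) for a hard failover of one node (hypothetical helper)."""
    return "/controller/failOver", urlencode({"otpNode": otp_node(failed_ip)})


def rebalance_out_request(cluster_ips: list[str], failed_ip: str) -> tuple[str, str]:
    """Return (path, form body) for a rebalance that ejects the failed-over node.

    knownNodes lists every node still known to the cluster, including the
    failed-over one; ejectedNodes lists the node to remove.
    """
    known = ",".join(otp_node(ip) for ip in cluster_ips)
    return "/controller/rebalance", urlencode(
        {"knownNodes": known, "ejectedNodes": otp_node(failed_ip)}
    )


if __name__ == "__main__":
    # Only two source-cluster IPs appear in the report; the other nodes are omitted here.
    print(failover_request("172.23.105.52"))
    print(rebalance_out_request(["172.23.105.44", "172.23.105.52"], "172.23.105.52"))
```

      Keeping request construction separate from sending makes it easy to log the exact bodies a harness issued when reproducing a stuck rebalance.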

      Attached
      --------------
      cbcollect_info output for the source cluster. 172.23.105.52 was failed over and was being rebalanced out.
      The rebalance did not progress even after XDCR was paused. Let me know if you need logs from the remote (destination) cluster.
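
      One way to tell a stuck rebalance from a slow one is to poll the cluster's rebalance progress and flag it once the reported progress stops advancing. This is a sketch under assumptions, not what the reporter used: it assumes GET /pools/default/rebalanceProgress returns JSON shaped like {"status": "running", "ns_1@<ip>": {"progress": <0..1>}, ...}, and the helper names and the stall window of 5 samples are hypothetical choices for illustration.

```python
from collections.abc import Mapping, Sequence
from typing import Optional


def overall_progress(resp: Mapping) -> Optional[float]:
    """Average per-node progress from a rebalanceProgress-style response.

    Assumed shape: {"status": "running", "ns_1@<ip>": {"progress": 0.0-1.0}, ...}.
    Returns None when no rebalance is running.
    """
    if resp.get("status") != "running":
        return None
    values = [
        v["progress"]
        for k, v in resp.items()
        if k != "status" and isinstance(v, Mapping) and "progress" in v
    ]
    return sum(values) / len(values) if values else 0.0


def is_stalled(samples: Sequence[float], window: int = 5, eps: float = 1e-6) -> bool:
    """True if progress has not advanced by more than eps over the last `window` samples."""
    if len(samples) < window:
        return False  # not enough history to judge yet
    recent = samples[-window:]
    return max(recent) - recent[0] <= eps
```

      A harness could append overall_progress(...) to a list at a fixed polling interval and raise an alert once is_stalled returns True; pausing XDCR, as tried in this report, would then show up as whether the samples start moving again.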

          People

            Assignee: Aruna Piravi (Inactive)
            Reporter: Aruna Piravi (Inactive)
            Votes: 0
            Watchers: 2
