  1. Couchbase Server
  2. MB-51435

[BP 6.6.6 MB-42968] - Eventing Enabled Cluster Fails to Recover - long DCP connection names


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: 6.6.6
    • Affects Version: 6.6.5, Cheshire-Cat
    • Component: eventing
    • Environment: Kubernetes 1.19, Operator 2.1
    • Triage: Untriaged
    • 1
    • Unknown

    Description

      What the test does

      The test spins up a 3-node cluster, kills a pod, and waits for recovery. It repeats this N times.
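      The kill-and-recover loop described above could be sketched roughly as follows. This is only an illustration, not the actual test code: `run_kill_recover_cycles`, `kill_pod`, `is_balanced`, and the timeout and poll values are all hypothetical names and numbers, and the two callables stand in for whatever the real test uses to delete a pod and read the cluster's balanced status.

```python
import time
from typing import Callable


def run_kill_recover_cycles(
    kill_pod: Callable[[int], None],      # hypothetical hook: deletes one pod
    is_balanced: Callable[[], bool],      # hypothetical hook: cluster reports balanced
    cycles: int,
    timeout_s: float = 600.0,             # assumed value, not from the report
    poll_s: float = 5.0,                  # assumed value, not from the report
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Kill one pod per cycle, then wait for the cluster to report balanced.

    Returns the number of completed cycles. Raises TimeoutError on the first
    cycle where the cluster is still unbalanced after timeout_s seconds.
    """
    for cycle in range(cycles):
        kill_pod(cycle)
        waited = 0.0
        # Poll the status check until it reports balanced or the timeout hits.
        while not is_balanced():
            if waited >= timeout_s:
                raise TimeoutError(f"cycle {cycle + 1}: cluster still unbalanced")
            sleep(poll_s)
            waited += poll_s
    return cycles
```

      With a loop like this, the failure reported here would surface as a `TimeoutError` on the third cycle. A real test would probably also wait for the pod loss to be observed before polling, since the status may still read balanced immediately after the kill.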

      What happened

      The first pod is killed; the Operator sees it go down, fails it over, and scales the cluster back up to 3 nodes.  The second kill goes the same way.  On the third attempt, the rebalance of the new node fails and keeps failing indefinitely.  The failure shows up as the cluster continuing to report an unbalanced status.

      Expectation

      When the cluster reports as balanced, it should at the very least be safe to kill pods, and the cluster should remain recoverable.  This is a deadlock situation for the Operator and Couchbase Cloud.


            People

              sujay.gad Sujay Gad
              jeelan.poola Jeelan Poola