Couchbase Server / MB-42968

Eventing Enabled Cluster Fails to Recover

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • eventing
    • Kubernetes 1.19, Operator 2.1
    • Untriaged
    • 1
    • Unknown

    Description

      What the test does

      Spins up a 3-node cluster, kills a pod, and waits for recovery. It repeats this N times.
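
      The loop described above can be sketched roughly as follows. This is not the actual test code: `kill_pod` and `is_recovered` are hypothetical hooks standing in for whatever the test harness uses to delete a pod and to decide that the cluster is back to 3 nodes, and the timeout and poll values are placeholders.

```python
import time
from typing import Callable


def kill_and_recover(
    kill_pod: Callable[[int], None],
    is_recovered: Callable[[], bool],
    iterations: int,
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> int:
    """Kill one pod per iteration and wait for the cluster to recover.

    Returns the number of completed iterations. Raises TimeoutError naming
    the 1-based kill number if recovery does not happen within timeout_s.
    """
    for i in range(iterations):
        kill_pod(i)  # hypothetical hook: delete one pod of the cluster
        deadline = time.monotonic() + timeout_s
        while not is_recovered():  # hypothetical hook: cluster is back to full size
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no recovery after kill {i + 1}")
            time.sleep(poll_s)
    return iterations
```

      With hooks that stop recovering after the third kill, this sketch raises a TimeoutError on iteration 3 rather than waiting forever.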

      What happened

      The first pod is killed, the operator sees it go down and fails it over, and we scale back up to 3 nodes. The same happens for the second kill. On the third attempt, the rebalance of the new node fails, and it keeps failing indefinitely. The failure shows up as the cluster continuing to report an unbalanced status.

      Expectation

      At the very least, when the cluster reports as balanced, it should be safe to kill pods and the cluster should be recoverable. This is a deadlock situation for the Operator and Couchbase Cloud.
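
      As an illustration of the "reports as balanced" precondition, here is a minimal sketch of a check a test could run before each kill. The ticket does not say how the test or the Operator determines balance. The field names (`balanced`, `rebalanceStatus`, `nodes[].clusterMembership`, `nodes[].status`) are assumed to match the JSON returned by Couchbase Server's `GET /pools/default`, and `safe_to_kill` is a hypothetical helper, not an Operator function.

```python
from typing import Any, Mapping


def safe_to_kill(pool: Mapping[str, Any], expected_nodes: int = 3) -> bool:
    """Return True if the parsed pool details look fully recovered.

    `pool` is assumed to be the decoded JSON body of GET /pools/default.
    """
    nodes = pool.get("nodes", [])
    # Every node must be an active, healthy cluster member.
    all_active = all(
        n.get("clusterMembership") == "active" and n.get("status") == "healthy"
        for n in nodes
    )
    return (
        len(nodes) == expected_nodes
        and all_active
        and pool.get("balanced") is True
        and pool.get("rebalanceStatus") == "none"
    )
```

      The report suggests that a check of this kind passing was not enough to guarantee recovery in this case, which is why the behaviour is filed as a bug.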

            People

              simon.murray Simon Murray