Couchbase Server: MB-42968

Eventing Enabled Cluster Fails to Recover


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Cheshire-Cat
    • 7.0.0
    • Component: eventing
    • Environment: Kubernetes 1.19, Operator 2.1
    • Triage: Untriaged
    • 1
    • Unknown

    Description

      What the test does

      Spins up a 3-node cluster, kills a pod, and waits for the cluster to recover. This is repeated N times.

      What happened

      The first pod is killed. The Operator sees it go down, fails it over, and scales the cluster back up to 3 nodes. The same happens on the second iteration. On the third iteration, the rebalance of the new node fails and keeps failing indefinitely: the cluster continues to report an unbalanced status.

      Expectation

      At the very least, when the cluster reports itself as balanced, it should be safe to kill pods, and the cluster should be recoverable. As things stand, this is a deadlock for the Operator and Couchbase Cloud.


          Activity

            Simon Murray added a comment:

            I can if you put some container images onto Docker Hub for me

            Simon Murray added a comment:

            D'oh! This isn't the first time this has been seen, and it probably won't be the last. Has anyone ever considered just hashing to a 64-character SHA-256 string? There's probably a technical reason I'm not privy to.

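The SHA-256 suggestion can be sketched in a few lines. This is a minimal Python illustration of the idea, not Couchbase's feed-naming scheme; `hashed_feed_name` is a hypothetical helper and the oversized input is made-up data.

```python
import hashlib


def hashed_feed_name(raw_name: str) -> str:
    """Map an arbitrarily long feed name to a fixed 64-character string.

    Hypothetical helper illustrating the SHA-256 suggestion; it is not
    the naming scheme used by Couchbase eventing.
    """
    # SHA-256 yields a 32-byte digest; hexdigest() renders it as 64 hex chars.
    return hashlib.sha256(raw_name.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Deliberately oversized, made-up name to show the output length is fixed.
    long_name = "eventing:" + "node-" * 100
    name = hashed_feed_name(long_name)
    print(len(name), name[:16])  # prints 64 followed by the first 16 hex chars
```

One plausible trade-off, offered as a guess rather than the project's stated reason: a digest is opaque, so the feed name no longer tells someone reading logs which function or node it belongs to.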
            Simon Murray added a comment:

            That 11911 image still doesn't work.


            Couchbase Build Team added a comment:

            Build couchbase-server-7.0.0-4131 contains eventing commit d38ace0 with commit message:
            MB-42968: Restrict DCP feed name length to 200 chars
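The commit message says only that the DCP feed name length is capped at 200 characters, not how the cap is enforced. Below is a minimal sketch of one common approach, assuming the limit counts characters rather than bytes and that names are ASCII. Short names pass through unchanged. Longer ones keep a prefix and gain a short hash suffix, so two long names sharing the same prefix stay distinct. `bounded_feed_name`, `MAX_FEED_NAME_LEN` and `HASH_CHARS` are hypothetical names, not part of the eventing code base.

```python
import hashlib

MAX_FEED_NAME_LEN = 200  # limit quoted in the commit message
HASH_CHARS = 16          # hypothetical: 16 hex chars = 64 bits of SHA-256


def bounded_feed_name(raw_name: str, limit: int = MAX_FEED_NAME_LEN) -> str:
    """Return raw_name if it fits within limit, else a prefix plus a hash suffix.

    Hypothetical helper; assumes the limit is measured in characters.
    """
    if limit < HASH_CHARS + 2:
        raise ValueError("limit too small to hold a prefix and hash suffix")
    if len(raw_name) <= limit:
        return raw_name
    suffix = hashlib.sha256(raw_name.encode("utf-8")).hexdigest()[:HASH_CHARS]
    # Reserve room for "_" plus the suffix so the result is exactly `limit` chars.
    prefix = raw_name[: limit - HASH_CHARS - 1]
    return f"{prefix}_{suffix}"
```

Plain truncation would also satisfy the limit, but it could map two different long names to the same feed name. The hash suffix makes that unlikely while keeping a human-readable prefix.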

            Vikas Chaudhary added a comment:

            Simon Murray, can you please check whether it's fixed?
            Simon Murray added a comment:

            The last time I tried 7.0.0-beta, I didn't see anything bad with eventing; feel free to close.

            People

              Assignee: Simon Murray
              Reporter: Simon Murray
              Votes: 0
              Watchers: 5
