Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version/s: 6.6.5, Cheshire-Cat
- Environment: Kubernetes 1.19, Operator 2.1
- Triage: Untriaged
- 1
- Is this a Regression?: Unknown
Description
What the test does
Spins up a 3-node cluster, kills a pod, and waits for recovery. Does this N times (roughly the loop sketched below).
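For context, the loop amounts to something like this sketch. It is illustrative Python against the kubernetes client, not the actual test code; the namespace, label selector, and value of N are assumptions.

```python
# Sketch of the kill/recover test loop, assuming the kubernetes Python client.
# Namespace, label selector, and N are illustrative, not the real test code.
import time
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

NAMESPACE = "default"
SELECTOR = "app=couchbase"
CLUSTER_SIZE = 3
N = 10  # number of kill/recover iterations

def ready_pods():
    """Running pods matching the selector with all containers ready."""
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=SELECTOR).items
    return [p for p in pods
            if p.status.phase == "Running"
            and all(c.ready for c in (p.status.container_statuses or []))]

for _ in range(N):
    # Kill one pod; the Operator should fail it over and scale back up.
    victim = ready_pods()[0]
    v1.delete_namespaced_pod(victim.metadata.name, NAMESPACE)

    # Wait for the pod to actually go down...
    while victim.metadata.name in {p.metadata.name for p in ready_pods()}:
        time.sleep(5)

    # ...then wait until the cluster is back at full strength.
    while len(ready_pods()) < CLUSTER_SIZE:
        time.sleep(5)
```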
What happened
The first pod is killed, the Operator sees it go down, fails it over, and we scale back up to 3 nodes. The same happens for the second pod. On the third attempt, the rebalance of the new node fails, and continues to fail indefinitely: the cluster keeps reporting an unbalanced status.
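The unbalanced status is what the Server itself reports, so it can be polled directly. A minimal sketch of that check, assuming the standard /pools/default endpoint (whose JSON includes a balanced flag) and illustrative host/credentials:

```python
# Sketch of checking the cluster's own balance report via the Couchbase
# REST API. Host and credentials are illustrative; the assumption is that
# /pools/default returns a JSON body containing a "balanced" boolean.
import time
import requests

def is_balanced(host="http://127.0.0.1:8091",
                auth=("Administrator", "password")):
    info = requests.get(f"{host}/pools/default", auth=auth, timeout=10).json()
    return info.get("balanced", False)

# In the failing run this loop never terminates: the rebalance of the
# third replacement node keeps failing and "balanced" stays false.
while not is_balanced():
    time.sleep(5)
```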
Expectation
At the very least, when the cluster reports as balanced, it should be safe to kill a pod and the cluster should recover. As it stands, this is a deadlock situation for the Operator and Couchbase Cloud.