Details
Type: Bug
Resolution: Unresolved
Priority: Major
Version: 2.7.0
Couchbase Version: 7.6.0-2176
Kubernetes Version: v1.30.0
CAO and operator: 2.7.0, built locally
Environment: Kind cluster
19 - A Rock and a Hard Place
1
Description
Cluster Setup
- Kind cluster running locally on a Mac
- 3 nodes, each running all services
- 1 bucket
Steps taken in the scenario
- Created a cluster.
- On one pod, ran a bash script that kills memcached in a loop.
- The node is failed over in the cluster, and delta recovery rebalances fail repeatedly, as expected.
- Stopped the memcached kill loop.
- Rebalances after this point still fail repeatedly, now because of a problem with the eventing service.
The Couchbase Server issues are tracked under MB-62725 and MB-62724.
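The bash script used for the kill loop is not attached to this ticket. As a rough sketch of that reproduction step only, the loop could look like the Python below. The function names, the 5-second interval, and the use of `pkill -9 memcached` are all assumptions for illustration, not the script that was actually run. The sketch is bounded by an iteration count so it can be stopped cleanly.

```python
import subprocess
import time


def kill_memcached_once() -> bool:
    """Send SIGKILL to memcached; True if at least one process was signalled.

    Assumption: pkill is available inside the pod's container image.
    """
    return subprocess.run(["pkill", "-9", "memcached"]).returncode == 0


def kill_loop(iterations, interval_seconds=5.0,
              kill=kill_memcached_once, sleep=time.sleep):
    """Kill memcached `iterations` times, pausing between attempts.

    `kill` and `sleep` are injectable so the loop can be exercised
    without touching real processes. Returns the number of successful kills.
    """
    killed = 0
    for _ in range(iterations):
        if kill():
            killed += 1
        sleep(interval_seconds)
    return killed
```

Running `kill_loop(100)` inside the target pod would approximate the scenario; the interval needs to be short enough that memcached is killed again before delta recovery completes.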
Issue
- The rebalance fails with eventing service timeouts, and the operator retries it in a loop for 2+ hours.
- When a rebalance keeps failing with the same error, there should be a break point that stops the loop, so that the operator does not retry the rebalance indefinitely.
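One possible shape for the requested break point is a small circuit breaker that counts consecutive failures with the same error and stops retrying past a threshold. The sketch below is illustrative only: the class name, the threshold of 5, and matching on the error string are assumptions, and it is not how the operator (which is written in Go) currently handles rebalance retries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebalanceBreaker:
    """Tracks consecutive identical rebalance failures (hypothetical helper)."""

    max_identical_failures: int = 5  # assumed threshold, not an operator setting
    _last_error: Optional[str] = None
    _count: int = 0

    def record_success(self) -> None:
        # A successful rebalance clears the failure history.
        self._last_error = None
        self._count = 0

    def record_failure(self, error: str) -> None:
        # Only consecutive failures with the same error accumulate;
        # a different error restarts the count at 1.
        if error == self._last_error:
            self._count += 1
        else:
            self._last_error = error
            self._count = 1

    def should_retry(self) -> bool:
        # Stop retrying once the same error has repeated too many times.
        return self._count < self.max_identical_failures
```

A reconcile loop would call `record_failure` after each failed rebalance and check `should_retry` before starting the next one. Once the breaker trips, the operator could surface the error and wait for manual intervention instead of looping.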
Cluster logs
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0000.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0001.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0002.cb-example.default.svc.zip
Operator logs
https://cb-engineering.s3.amazonaws.com/MB-62724/cbopinfo-20240715T143931+0530.tar.gz
The cao tool and operator images were built locally from this commit:
commit 127d1f23932294386bf0375be927758a8dee282c (HEAD -> master, origin/master, origin/HEAD)
Author: usamah jassat <usamah.jassat@couchbase.com>
Date:   Mon Jul 1 18:24:20 2024 +0100

    K8S-3417: Allow rescheduling to different AZ

    Change-Id: I4194d211dabd7bb680a61930b5ac4d63ab4996f1
    Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212115
    Reviewed-by: Justin Ashworth <justin.ashworth@couchbase.com>
    Tested-by: Build Bot <build@couchbase.com>