Details
- Bug
- Resolution: Fixed
- Blocker
- 2.7.0
- Cluster version: 7.0.5-7659
  Kubernetes version: v1.30.0
  CAO and operator: 2.7.0 (built locally)
  Environment: Kind cluster
- 15 - First Frontier
- 1
Description
Cluster Setup
- Kind cluster running locally on a Mac
- 2 nodes with all services
- 1 bucket
- Initial Cluster version : 7.0.5-7659
Steps taken in the scenario
- Created a cluster
- Created 1 bucket
- Changed the cluster config to add a logging sidecar.
- The operator issued a swap rebalance to reconcile the cluster with the change.
- A new pod was added.
- A rebalance was issued immediately.
- The rebalance failed with a not_all_nodes_are_ready_yet error (tracked in K8S-3598).
- The rebalance was not retried; instead, a new pod was created and added to the cluster.
- The rebalance failed again, and another new pod was created and added to the cluster.
- A pod was failed over. Without this failover, pod creation would have looped indefinitely.
Another instance of the same issue: https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T172847+0530.tar.gz
Here, I did not fail over the problematic pod/node, and each rebalance failure spun up another pod, indefinitely.
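The sequence reported above can be sketched as a toy model. This is only an illustration of the observed behaviour, not operator code: the function name, the pod naming, and the assumption that a rebalance succeeds once the problematic node has been failed over are all hypothetical.

```python
from typing import List, Optional


def simulate_reported_loop(reconcile_passes: int,
                           failover_on_pass: Optional[int] = None) -> List[str]:
    """Toy model of the reported sequence; returns the names of pods created.

    Hypothetical model only: it does not reproduce the operator's logic.
    """
    pods_created: List[str] = []
    failed_over = False
    for reconcile_pass in range(1, reconcile_passes + 1):
        # Each pass creates a fresh pod and adds it to the cluster.
        # Pod names are illustrative; numbering starts after the 2 initial nodes.
        pods_created.append(f"cb-example-{len(pods_created) + 2:04d}")
        if failover_on_pass == reconcile_pass:
            failed_over = True
        # Assumption: the rebalance succeeds only after the problematic
        # node has been failed over, which ends the loop.
        if failed_over:
            break
        # Otherwise the rebalance fails (not_all_nodes_are_ready_yet) and,
        # as reported, it is not retried; the next pass adds another pod.
    return pods_created


without_failover = simulate_reported_loop(10)
with_failover = simulate_reported_loop(10, failover_on_pass=3)
print(len(without_failover), len(with_failover))
```

In this model, pod creation is bounded only by the number of reconcile passes unless a failover intervenes, which matches the two instances described above.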
Issue
- The rebalance is not retried when it fails during the first swap rebalance.
- Instead of retrying, the operator spins up a new pod for each failure, indefinitely.
- Each new pod that is added to the cluster is ejected in the very next rebalance, so pods are created, added, and immediately ejected, which wastes resources.
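The behaviour expected instead, retrying the failed rebalance rather than creating another pod, can be sketched as follows. This is a minimal sketch under stated assumptions, not the operator's implementation or the actual fix: `RebalanceError`, `rebalance_with_retry`, and `make_flaky_rebalance` are hypothetical names, and the attempt limit and delay are arbitrary.

```python
import time
from typing import Callable, Optional


class RebalanceError(Exception):
    """Raised when a (simulated) rebalance attempt fails."""


def rebalance_with_retry(rebalance: Callable[[], None],
                         attempts: int = 5,
                         delay_seconds: float = 0.0) -> int:
    """Retry the same rebalance up to `attempts` times.

    Returns the number of attempts used. No new pod is created on failure.
    """
    last_error: Optional[RebalanceError] = None
    for attempt in range(1, attempts + 1):
        try:
            rebalance()
            return attempt
        except RebalanceError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise RebalanceError(f"rebalance failed after {attempts} attempts") from last_error


def make_flaky_rebalance(failures_before_ready: int) -> Callable[[], None]:
    """Hypothetical stand-in for a rebalance that fails until nodes are ready."""
    state = {"calls": 0}

    def rebalance() -> None:
        state["calls"] += 1
        if state["calls"] <= failures_before_ready:
            raise RebalanceError("not_all_nodes_are_ready_yet")

    return rebalance


pods_created = 1  # only the single replacement pod from the swap rebalance
attempts_used = rebalance_with_retry(make_flaky_rebalance(2))
print(attempts_used, pods_created)  # attempts_used == 3
```

The point of the sketch is that the pod count stays constant while the rebalance is retried; a bounded attempt count avoids swapping an infinite pod loop for an infinite retry loop.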
Operator logs :
https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T170701+0530.tar.gz
Another instance of the loop: https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T172847+0530.tar.gz
Cluster logs :
https://cb-engineering.s3.amazonaws.com/K8S-3598/collectinfo-2024-07-25T114628-ns_1%40cb-example-0005.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3598/collectinfo-2024-07-25T114628-ns_1%40cb-example-0001.cb-example.default.svc.zip
The cao tool and operator images were built locally from this commit:
commit c2e920ddbcfa9b4819d47ad81d0a35c359dd1dc6 (HEAD -> master, origin/master, origin/HEAD)
Author: usamah jassat <usamah.jassat@couchbase.com>
Date: Wed Jul 17 15:11:19 2024 +0100

    K8S-3581: don't attempt backend migration when rebalance required

    Change-Id: I2d2b6d6d4f8dbb0a30db5bd54a05631d17631eee
    Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212890
    Reviewed-by: Yusuf Ramzan <yusuf.ramzan@couchbase.com>
    Tested-by: Build Bot <build@couchbase.com>
Issue Links
- duplicates K8S-3590 [Upgrade]: Rebalance failures and Failover during upgrade causes infinite pod creation and addition to cluster (Closed)
For Gerrit Dashboard: K8S-3599

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 213642,2 | K8S-3599: Ignore MB-45973 workaround when not version upgrade | 2.7.x | couchbase-operator | MERGED | +2 | +1 |