Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: 2.7.0
Affects Version/s: 2.7.0
Component/s: operator
Labels:
- K8S-2.7.0
- Kubernetes
- k8s
- kubernetes
- operator
Environment:
Cluster version : 7.0.5-7659
Kubernetes Version : v1.30.0
CAO and operator : 2.7.0 built locally
Environment : Kind cluster

Sprint:
15 - First Frontier
Story Points:
1

Description

Cluster Setup

Kind cluster running locally on a Mac
2 nodes with all services
1 bucket
Initial Cluster version : 7.0.5-7659

Steps taken in the scenario

Created a cluster
Created 1 bucket
Changed the cluster config to add a logging sidecar.
The operator issues a swap rebalance to reconcile the cluster with the change.
A new pod is added.
A rebalance is issued immediately.
The rebalance fails with a not_all_nodes_are_ready_yet error (tracked in K8S-3598).
The rebalance is not retried. Instead, a new pod is created and added to the cluster.
The rebalance fails again, and another new pod is created and added to the cluster.
A pod was failed over. Without this failover, pod creation would have looped indefinitely.

Another instance of the same issue: https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T172847+0530.tar.gz

In this run I did not fail over the problematic pod/node, and every rebalance failure spun up another pod, with no end to the loop.

Issue

The rebalance is not retried when the first swap rebalance fails.
Instead of a retry, a new pod is spun up for each failure, without limit.
Each new pod that is added to the cluster is ejected in the very next rebalance, so pods are created, added, and immediately ejected, which wastes resources.
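
One way to picture the expected behaviour is a reconcile decision that checks for nodes already added but not yet rebalanced in before creating another pod. The following is a minimal Go sketch under that assumption. `clusterState`, its fields, and `nextAction` are hypothetical and do not come from the operator source.

```go
package main

import "fmt"

// action is the next step a reconcile pass would take in this sketch.
type action string

const (
	actionRetryRebalance action = "retry-rebalance"
	actionCreatePod      action = "create-pod"
	actionNone           action = "none"
)

// clusterState is a hypothetical summary of what the reconciler observes.
type clusterState struct {
	podsNeedingSwap int // pods whose spec differs from the desired spec
	pendingAddNodes int // nodes added to the cluster but not yet rebalanced in
}

// nextAction prefers finishing an outstanding rebalance over creating more
// pods, so a failed rebalance cannot trigger an unbounded number of pods.
func nextAction(s clusterState) action {
	switch {
	case s.pendingAddNodes > 0:
		return actionRetryRebalance
	case s.podsNeedingSwap > 0:
		return actionCreatePod
	default:
		return actionNone
	}
}

func main() {
	// State after the first failed swap rebalance in this report:
	// one replacement pod has been added, and old pods still need swapping.
	s := clusterState{podsNeedingSwap: 2, pendingAddNodes: 1}
	fmt.Println(nextAction(s))
}
```

With this ordering, the state described in this report (a replacement pod already added, rebalance failed) leads to a rebalance retry rather than another pod.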

Operator logs:

https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T170701+0530.tar.gz

Another instance of the loop: https://cb-engineering.s3.amazonaws.com/K8S-3598/cbopinfo-20240725T172847+0530.tar.gz

Cluster logs:
https://cb-engineering.s3.amazonaws.com/K8S-3598/collectinfo-2024-07-25T114628-ns_1%40cb-example-0005.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3598/collectinfo-2024-07-25T114628-ns_1%40cb-example-0001.cb-example.default.svc.zip

The cao tool and operator images were built locally from this commit:

commit c2e920ddbcfa9b4819d47ad81d0a35c359dd1dc6 (HEAD -> master, origin/master, origin/HEAD)
Author: usamah jassat <usamah.jassat@couchbase.com>
Date:   Wed Jul 17 15:11:19 2024 +0100

    K8S-3581: don't attempt backend migration when rebalance required

    Change-Id: I2d2b6d6d4f8dbb0a30db5bd54a05631d17631eee
    Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212890
    Reviewed-by: Yusuf Ramzan <yusuf.ramzan@couchbase.com>
    Tested-by: Build Bot <build@couchbase.com>

Attachments

image-2024-07-25-17-14-01-907.png (407 kB, 25/Jul/24 4:44 AM, Raghav S K)
image-2024-07-29-13-38-42-777.png (574 kB, 29/Jul/24 1:08 AM, Raghav S K)

Issue Links

duplicates

K8S-3590 [Upgrade] : Rebalance failures and Failover during upgrade causes infinite pod creation and addition to cluster

Closed

Gerrit Reviews

For Gerrit Dashboard: K8S-3599
#	Subject	Branch	Project	Status	CR	V
213642,2	K8S-3599: Ignore MB-45973 workaround when not version upgrade	2.7.x	couchbase-operator	Status: MERGED	+2	+1

People

Assignee: Raghav S K
Reporter: Raghav S K
Votes: 0
Watchers: 4

Dates

Created: 25/Jul/24 4:49 AM
Updated: 27/Aug/24 4:11 AM
Resolved: 02/Aug/24 1:39 PM

Swap rebalance is not re-tried during failure and pods are created with every failure causing infinite pod creation