Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Sprint: 10: Autoscaling, completion
Description
See Manchester legends 10CC's Dreadlock Holiday for context!
I found this through the chaotic monkey that is GKE Autopilot. It turns out that if the Operator deployment gets rescheduled while it's waiting for a pod to get scheduled (magnified quite a lot by cluster autoscaling!), then we end up in a situation where:
- We try to get a list of callable members
- None are working, so the server throws a wobbly when we call /pools/default to determine what's clustered or not
- We spin in a loop of death forever
We need a mechanism to "off" uninitialized nodes. We version our resources, so we should be able to say that a 2.2 pod without the pod.couchbase.com/initialized annotation can be "retired". This can happen transparently, without any special configuration, and it preserves any pre-2.2 pods, or initialized ones, for log collection.
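To make the proposal concrete, here is a minimal sketch of what that retirement check could look like. Only the pod.couchbase.com/initialized annotation comes from this ticket; the pod.couchbase.com/version annotation and all function/package names are hypothetical stand-ins for whatever resource versioning the Operator actually uses.

```go
package reconcile // hypothetical package name, for illustration only

import (
	corev1 "k8s.io/api/core/v1"
)

const (
	// Annotation referenced in this ticket: set once a pod has been
	// initialized/clustered by the Operator.
	annotationInitialized = "pod.couchbase.com/initialized"
	// Hypothetical annotation assumed here to mark pods created by 2.2+,
	// standing in for the Operator's actual resource versioning.
	annotationVersion = "pod.couchbase.com/version"
)

// canRetire reports whether a pod was created by a 2.2+ Operator but never
// finished initialization, and can therefore be deleted transparently.
func canRetire(pod *corev1.Pod) bool {
	annotations := pod.GetAnnotations()
	if annotations == nil {
		return false
	}
	if _, versioned := annotations[annotationVersion]; !versioned {
		// No version marker: treat as pre-2.2 and leave it alone.
		return false
	}
	// The initialized annotation means this is (or was) a real cluster
	// member, so it must be kept rather than retired.
	_, initialized := annotations[annotationInitialized]
	return !initialized
}
```

The point of the asymmetry is that anything we cannot positively identify as an uninitialized 2.2+ pod is left untouched, so pre-2.2 pods and real cluster members stay around for log collection.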
We have seen an issue caused by a similar root cause; the error message was:
{"level":"error","ts":1614223610.6046593,"logger":"cluster","msg":"Failed to update members","cluster":"2a400ef3-5dbf-4920-ab81-55362bc46bc9/cb","error":"context deadline exceeded: [Get https://cb-0006.cb.2a400ef3-5dbf-4920-ab81-55362bc46bc9.svc:18091/pools/default: uuid is unset]","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:360\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:387\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
What happened was that the Operator created the pod but hadn't finished initializing/clustering it; once the Operator restarted, it tried to manage the pod as if initialization had completed, and so spin-locked forever.
While I think this sequence of events differs slightly from this ticket, I believe the proposed fix is the same. Would you agree?