  1. Couchbase Kubernetes
  2. K8S-2071

Deadlock Holiday...

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 2.2.0
    • None
    • operator
    • 10: Autoscaling, completion
    • 1

    Description

      See Manchester legends 10cc's "Dreadlock Holiday" for context!

      I found this through the chaotic monkey that is GKE Autopilot. It turns out that if the Operator deployment gets rescheduled while it's waiting for a pod to get scheduled (a window magnified quite a lot by cluster autoscaling!), then we end up in a situation where:

      • We try to get a list of callable members
      • None are working, so the server throws a wobbly when we call /pools/default to determine what's clustered and what isn't
      • We spin in a loop of death forever

      We need a mechanism to "off" uninitialized nodes. We version our resources, so we should be able to say that a 2.2 pod without the pod.couchbase.com/initialized annotation can be "retired". This can happen transparently, without any special configuration. It also preserves any pre-2.2 pods, and any initialized ones, for log collection.
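
The retirement rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the Operator's actual code: the `pod` type, the `canRetire` and `parseMajorMinor` helpers, and the `operator.couchbase.com/version` annotation key are all hypothetical; only `pod.couchbase.com/initialized` comes from this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	// Annotation named in this issue.
	initializedAnnotation = "pod.couchbase.com/initialized"
	// Hypothetical key: where the resource version lives is an assumption.
	versionAnnotation = "operator.couchbase.com/version"
)

// pod is a stand-in for a Kubernetes pod; only annotations matter here.
type pod struct {
	Name        string
	Annotations map[string]string
}

// parseMajorMinor extracts the major and minor numbers from "2.2.0".
func parseMajorMinor(v string) (major, minor int, ok bool) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, false
	}
	minor, err = strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// canRetire reports whether a pod was created at version 2.2 or later
// and never received the initialized annotation.
func canRetire(p pod) bool {
	major, minor, ok := parseMajorMinor(p.Annotations[versionAnnotation])
	if !ok {
		// Unknown version: keep the pod so logs can still be collected.
		return false
	}
	if major < 2 || (major == 2 && minor < 2) {
		// Pre-2.2 pods are always preserved.
		return false
	}
	_, initialized := p.Annotations[initializedAnnotation]
	return !initialized
}

func main() {
	pods := []pod{
		{Name: "stuck", Annotations: map[string]string{versionAnnotation: "2.2.0"}},
		{Name: "healthy", Annotations: map[string]string{
			versionAnnotation:     "2.2.0",
			initializedAnnotation: "true",
		}},
		{Name: "legacy", Annotations: map[string]string{versionAnnotation: "2.1.0"}},
	}
	for _, p := range pods {
		fmt.Printf("%s: retire=%t\n", p.Name, canRetire(p))
	}
}
```

Pods with a missing or unparseable version are deliberately treated as not retirable, which errs on the side of keeping them around for log collection.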

          People

            simon.murray Simon Murray