  1. Couchbase Kubernetes
  2. K8S-2071

Deadlock Holiday...

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 2.2.0
    • None
    • operator
    • 10: Autoscaling, completion
    • 1

    Description

      See Manchester legends 10CC's Dreadlock Holiday for context!

      I found this through the chaos monkey that is GKE Autopilot. It turns out that if the Operator deployment gets rescheduled while it is waiting for a pod to be scheduled (a problem that cluster autoscaling magnifies considerably), we end up in a situation where:

      • We try to get a list of callable members.
      • None of them are working, so the server throws a wobbly when we call /pools/default to determine what is clustered and what is not.
      • We spin in this loop of death forever.

      We need a mechanism to "off" uninitialized nodes. We version our resources, so we should be able to say that a 2.2 pod without the pod.couchbase.com/initialized annotation can be "retired". This can happen transparently, without any special configuration. It also preserves any pre-2.2 pods, and any initialized ones, for log collection.
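
The retirement rule described above could be sketched roughly as follows. This is only an illustration in Python, not the Operator's actual code: the `Pod` type, the `can_retire` helper and the `example.com/resource-version` annotation key are hypothetical stand-ins, since the issue does not say how the resource version is recorded. Only the pod.couchbase.com/initialized annotation name comes from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Annotation named in the issue description.
INITIALIZED_ANNOTATION = "pod.couchbase.com/initialized"
# Hypothetical key: the issue does not say where the resource version is stored.
VERSION_ANNOTATION = "example.com/resource-version"
# First version whose pods are expected to carry the initialized annotation.
MIN_VERSION = (2, 2)


@dataclass
class Pod:
    """Hypothetical stand-in for the parts of a pod this rule looks at."""
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)


def parse_version(raw: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor), or None if the string cannot be parsed."""
    parts = raw.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def can_retire(pod: Pod) -> bool:
    """True only for a 2.2+ pod that never got the initialized annotation."""
    raw = pod.annotations.get(VERSION_ANNOTATION)
    if raw is None:
        return False  # unversioned: treat as pre-2.2 and preserve it
    version = parse_version(raw)
    if version is None or version < MIN_VERSION:
        return False  # unknown or pre-2.2: preserve for log collection
    return INITIALIZED_ANNOTATION not in pod.annotations
```

The sketch deliberately errs towards keeping pods: anything unversioned, unparseable, pre-2.2 or already initialized is preserved, which matches the stated aim of keeping those pods available for log collection.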

          People

            simon.murray Simon Murray
            simon.murray Simon Murray