Couchbase Kubernetes / K8S-192

Operator cannot recover from total pod failure

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: 1.0.0
    • Affects Version/s: 0.7.0
    • Component/s: operator
    • Labels: None

    Description

      I discussed this with Simon Murray offline and am raising a ticket so that it's written down somewhere and a decision can be made on it.

      I am currently running a 3-node K8S cluster in Amazon. When I'm not using it, I would like to turn off my instances.

      When I turn these instances back on, my existing clusters do not recover, because the underlying pods have died and are not started again when the instances restart.
      (I guess this is because the reconcile loop is driven by failover and definition changes.)

      Obviously the CouchbaseCluster definitions still exist, so the reconcile loop continues to retry endlessly until the definitions are deleted.

      I realise that this is not a common case for production, but in development it seems very common for people to turn off their instances when not using them.

      Do we need to have better handling of situations when all underlying pods have been destroyed?
      Perhaps if they are all offline, the operator could recreate the cluster from the definition using newly created pods?
      This would be much more graceful than a user having to delete and recreate all of their definitions (which they may not have persisted anywhere).
      I guess this is also another case for having some support for persistent storage/pods.
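
      To make the suggestion above concrete, here is a minimal sketch of the decision I have in mind. It is not the operator's actual code: ClusterSpec, Action and plan_recovery are hypothetical names invented for illustration, and the real reconcile loop may be structured quite differently. The only point is that "no pods left" is treated as its own case rather than as a failover that gets retried forever.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Action(Enum):
    """Hypothetical outcomes of one reconcile pass."""
    NONE = "none"                    # running pods already match the definition
    REPLACE_FAILED = "replace"       # some members survive: fail over, add replacements
    RECREATE = "recreate"            # no members survive: rebuild from the definition


@dataclass
class ClusterSpec:
    """Hypothetical stand-in for the fields of a CouchbaseCluster definition."""
    name: str
    size: int  # desired number of pods


def plan_recovery(spec: ClusterSpec, running_pods: Sequence[str]) -> Action:
    """Decide what a reconcile pass should do, given the pods still running."""
    if not running_pods:
        # Total pod failure: there is no surviving member to fail over to,
        # so retrying failover cannot succeed. Rebuild from the definition.
        return Action.RECREATE
    if len(running_pods) < spec.size:
        # Partial failure: the surviving members can drive normal recovery.
        return Action.REPLACE_FAILED
    return Action.NONE
```

      With a 3-node definition, an empty pod list would give Action.RECREATE, while one surviving pod would give Action.REPLACE_FAILED. Note that without persistent storage a recreated cluster would presumably come back empty, which is why this ties into the persistent storage point.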

            People

              Assignee: Tommie McAfee (Inactive)
              Reporter: Matt Carabine (Inactive)