Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Affects Version: 0.7.0
- Fix Version: None
Description
I discussed this with Simon Murray offline; I'm raising a ticket so that it's written down somewhere and a decision can be made on it.
I am currently running a 3-node K8S cluster in Amazon, and when I'm not using it I would like to turn off my instances.
When I turn these instances back on, my existing cluster definitions are not able to recover (as the reconcile loop is driven by failover/definition changes, I guess).
The reason for this is that the underlying pods have died and do not start up again on restart.
Obviously the CouchbaseCluster definitions still exist, so the reconcile loop retries endlessly until the definitions are deleted.
I realise that this is not a common case in production, but in development it seems very common for people to turn off their instances when not using them.
Do we need to have better handling of situations when all underlying pods have been destroyed?
Perhaps, if all pods are offline, the operator could recreate the cluster from the definition with newly created pods?
This would be much more graceful than a user having to delete and recreate all of their definitions (which they may not have persisted anywhere).
I guess this is also another case for having some support for persistent storage/pods.
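To make the suggestion above concrete, here is a minimal sketch (in Go, since Kubernetes operators are typically written in Go) of the decision the reconcile loop could make. The function name `decideAction` and the pod-count-based logic are hypothetical illustrations, not the operator's actual implementation: if a definition expects pods but none are running, rebuild from the spec instead of retrying failover forever.

```go
package main

import "fmt"

// decideAction is a hypothetical sketch of a reconcile-loop check:
// given how many pods the CouchbaseCluster definition wants and how
// many are actually running, pick a recovery strategy.
func decideAction(desired, running int) string {
	switch {
	case desired > 0 && running == 0:
		// Total loss (e.g. all instances were shut down):
		// recreate the cluster from the stored definition.
		return "recreate cluster from definition"
	case running < desired:
		// Partial loss: normal failover / pod replacement.
		return "fail over and replace missing pods"
	default:
		// All pods present: nothing to do.
		return "steady state"
	}
}

func main() {
	// A 3-node cluster whose instances were all turned off.
	fmt.Println(decideAction(3, 0))
}
```

The point of the total-loss branch is exactly the graceful behaviour described above: the user's definitions are treated as the source of truth, so they never have to delete and recreate them by hand.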
Issue Links
- duplicates K8S-273: Recovery when all pods are down (Resolved)