Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Affects Version: 0.7.0
- Fix Version: None
Description
I discussed this with Simon Murray offline; I'm raising a ticket so that it's written down somewhere and a decision can be made on it.
I am currently running a 3-node K8S cluster in Amazon, and when I'm not using it I would like to turn off my instances.
When I turn these instances back on, my existing cluster definitions are not able to recover (as the reconcile loop is driven by failover/definition changes, I guess).
The reason for this is that the underlying pods have died and do not start up again on restart.
Obviously the CouchbaseCluster definitions still exist, so the reconcile loop retries endlessly until the definitions are deleted.
I realise that this is not a common case in production, but in development it seems very common for people to turn off their instances when not using them.
Do we need to have better handling of situations when all underlying pods have been destroyed?
Perhaps, if all pods are offline, the operator could recreate the cluster from the definition with newly created pods?
This would be much more graceful than a user having to delete and recreate all of their definitions (which they may not have persisted anywhere).
I guess this is also another case for having some support for persistent storage/pods.
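To make the suggestion above concrete, here is a minimal sketch (in Go, since Kubernetes operators are typically written in Go) of the decision the reconcile loop could make. The function name `decideAction` and the pod-count-based logic are hypothetical illustrations, not the operator's actual implementation: if a definition expects pods but none are running, rebuild from the spec instead of retrying failover forever.

```go
package main

import "fmt"

// decideAction is a hypothetical sketch of a reconcile-loop check:
// given how many pods the CouchbaseCluster definition wants and how
// many are actually running, pick a recovery strategy.
func decideAction(desired, running int) string {
	switch {
	case desired > 0 && running == 0:
		// Total loss (e.g. all instances were shut down):
		// recreate the cluster from the stored definition.
		return "recreate cluster from definition"
	case running < desired:
		// Partial loss: normal failover / pod replacement.
		return "fail over and replace missing pods"
	default:
		// All pods present: nothing to do.
		return "steady state"
	}
}

func main() {
	// A 3-node cluster whose instances were all turned off.
	fmt.Println(decideAction(3, 0))
}
```

The point of the total-loss branch is exactly the graceful behaviour described above: the user's definitions are treated as the source of truth, so they never have to delete and recreate them by hand.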
Issue Links
- duplicates K8S-273: Recovery when all pods are down (Resolved)