Description
The failover and graceful-failover REST APIs provide no way to assert the state of the cluster. A client can therefore check the cluster, see that all nodes are active, and yet, by the time it invokes failover, one or more nodes may already have been failed over.
It is quite possible the client would decide against invoking failover if it knew its view of the cluster state was out of date.
I think this can be solved relatively easily: we just add optional arguments to the APIs that assert cluster state. E.g.
- activeNodes: if specified, the set of active nodes must exactly match this list, else the failover fails
- inactiveFailed: same as above, but checking the nodes the client believes are already failed over
- inactiveAdded: same as above, but checking for freshly added nodes
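To illustrate the intended semantics, here is a minimal sketch of the precondition check the server would perform before starting the failover. The parameter names (activeNodes, inactiveFailed, inactiveAdded) are the ones proposed above; the helper function and the node-map layout are hypothetical, purely for illustration.

```python
def check_cluster_state(actual, activeNodes=None, inactiveFailed=None,
                        inactiveAdded=None):
    """Return True iff every supplied assertion exactly matches the actual
    cluster state. Assertions left as None are skipped, so existing clients
    that pass no assertions are unaffected."""
    assertions = {
        "active": activeNodes,
        "inactiveFailed": inactiveFailed,
        "inactiveAdded": inactiveAdded,
    }
    for state, expected in assertions.items():
        # Order does not matter; the sets of nodes must be identical.
        if expected is not None and set(expected) != set(actual.get(state, [])):
            return False
    return True

# Client's view matches reality: the failover may proceed.
actual = {"active": ["ns_1@10.0.0.1", "ns_1@10.0.0.2"], "inactiveFailed": []}
assert check_cluster_state(actual,
                           activeNodes=["ns_1@10.0.0.1", "ns_1@10.0.0.2"])

# Client's view is stale (10.0.0.2 was failed over in the meantime):
# the precondition fails and the failover request would be rejected.
actual = {"active": ["ns_1@10.0.0.1"], "inactiveFailed": ["ns_1@10.0.0.2"]}
assert not check_cluster_state(actual,
                               activeNodes=["ns_1@10.0.0.1", "ns_1@10.0.0.2"])
```

This is essentially a compare-and-swap on cluster state: the request carries the client's expectation, and the server rejects it if reality has diverged.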
Attachments
Issue Links
- relates to K8S-3472 Operator needs more robust way to detect outcome of graceful failover (Resolved)