Description
Several clients have wrappers and automations that trigger node failover, and these may perform an "unsafe" failover of the node by default.
However, even when the failover is "unsafe", the cluster manager waits 2000ms for quorum before proceeding, and it does this for each bucket in the cluster.
For example, with 4 buckets, the cluster manager waits 2000ms for quorum on each bucket before that bucket's replicas are promoted, and only then moves on.
This 2000ms wait delays the node failover process, and the total delay grows in proportion to the number of buckets.
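A back-of-the-envelope sketch of how the delay scales. It assumes the per-bucket quorum waits run one after another and that each wait lasts the full 2000ms; the function name is illustrative and is not part of the cluster manager.

```python
# Rough estimate of the extra failover delay caused by the per-bucket
# quorum wait. Assumption: buckets are processed sequentially and each
# wait runs to the full timeout.

DEFAULT_QUORUM_WAIT_MS = 2000  # the 2000ms wait observed in this issue


def added_failover_delay_ms(n_buckets: int, wait_ms: int = DEFAULT_QUORUM_WAIT_MS) -> int:
    """Return the total quorum wait (in ms) accumulated across all buckets."""
    if n_buckets < 0 or wait_ms < 0:
        raise ValueError("n_buckets and wait_ms must be non-negative")
    return n_buckets * wait_ms


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(f"{n} bucket(s): ~{added_failover_delay_ms(n)} ms of quorum wait")
```

For the 4-bucket example this comes to roughly 8000ms spent waiting for quorum before the failover completes. Under the same assumptions, lowering the per-bucket timeout shrinks the total proportionally.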
On several occasions we have recommended modifying unsafe_preconditions_timeout, but this value can currently only be set through the /diag/eval endpoint.
The request is to expose this tunable through another endpoint that can be called remotely, to make configuration scripting easier.
Issue Links
- relates to MB-40375 Hard/unsafe failover checks preconditions more than once (Closed)