Description
Currently we have a somewhat miscellaneous collection of UI error messages that are shown to users when they are performing a hard failover of a single node. E.g. if replications are relatively up to date, the message is:
Warning: Failing over the node will remove it from the cluster and activate a replica.
Operations currently in flight and not yet replicated, will be lost. Rebalancing will be
required to add the node back into the cluster. Consider using "Remove" and
rebalancing instead of Failover, to avoid any loss of data.
If the replications are behind (or are missing because a node is down - probably the node to be failed over), we show the following message:
Attention: A significant amount of data stored on this node
does not yet have replica (backup) copies! Failing over the node now will
irrecoverably lose that data when the incomplete replica is
activated and this node is removed from the cluster. It is
recommended to use "Remove" and rebalance to
safely remove the node without any data loss.
In addition, there is a different set of warnings for nodes that don't include the data service; these are shown when such a node is selected to be failed over.
None of these error messages are shown in the multi-node failure dialog. Given its prominent placement in the UI and the fact that we want people to use it when failing over multiple nodes, we should have a better set of warning and error messages for it.
Getting good error messages is complicated by the fact that the nodeStatuses REST API, which the UI uses to get the "failover safeness" information for each node, returns information that assumes just one node is failed over.
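To illustrate why per-node safety information does not compose, here is a minimal sketch using a simplified, hypothetical model of vBucket placement. The map, node names and function names below are assumptions made for illustration; they are not the server's actual data structures, and replication lag is ignored.

```python
# Hypothetical placement: vBucket id -> nodes holding a copy (active first,
# then replicas). One replica per vBucket, three nodes.
VBUCKET_CHAINS = {
    0: ["n1", "n2"],
    1: ["n2", "n3"],
    2: ["n3", "n1"],
}


def lost_vbuckets(chains, failed_nodes):
    """Return the vBuckets whose every copy sits on a node being failed over."""
    failed = set(failed_nodes)
    return sorted(vb for vb, nodes in chains.items() if set(nodes) <= failed)


def safe_individually(chains, failed_nodes):
    """True if each node, considered on its own, could be failed over without loss."""
    return all(not lost_vbuckets(chains, [node]) for node in failed_nodes)


if __name__ == "__main__":
    selection = ["n1", "n2"]
    # Each node looks safe when evaluated alone...
    print(safe_individually(VBUCKET_CHAINS, selection))
    # ...but failing both over together loses every copy of vBucket 0.
    print(lost_vbuckets(VBUCKET_CHAINS, selection))
```

In this model each selected node passes a single-node check, yet the combined failover leaves vBucket 0 without any surviving copy, which is the case a per-node answer cannot express.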
We need to redesign the protocol between the UI and the server. Perhaps we should add a checkSafety=true query parameter to the controller/failOver REST API and have it return safety information to the client.
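As a rough sketch of what the client side of such a request might look like, the snippet below builds the URL and form body for a safety check covering several nodes at once. The checkSafety parameter is only the proposal above and does not exist yet; the otpNode form field, the helper name and the base URL are likewise assumptions for illustration.

```python
from urllib.parse import urlencode


def build_failover_safety_request(base_url, otp_nodes):
    """Build (url, form_body) for a hypothetical multi-node safety check.

    Assumes the proposed checkSafety=true query parameter and one otpNode
    form field per node selected for failover.
    """
    query = urlencode({"checkSafety": "true"})
    body = urlencode([("otpNode", node) for node in otp_nodes])
    return f"{base_url}/controller/failOver?{query}", body


if __name__ == "__main__":
    url, body = build_failover_safety_request(
        "http://localhost:8091",  # assumed address, for illustration only
        ["ns_1@10.0.0.1", "ns_1@10.0.0.2"],
    )
    print(url)
    print(body)
```

Sending all selected nodes in one request would let the server evaluate safety for the whole set rather than node by node; the shape of the response is left open here because the ticket does not define it.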