Details

Type: Improvement
Resolution: Unresolved
Priority: Critical
Fix Version/s: backlog
Affects Version/s: 5.5.0
Component/s: ns_server
Labels: None

Description

Currently we have a somewhat miscellaneous collection of UI error messages that are shown to users when they are performing a hard failover of a single node. E.g. if replication is relatively up to date, the message is:

Warning: Failing over the node will remove it from the cluster and activate a replica.
Operations currently in flight and not yet replicated, will be lost. Rebalancing will be
required to add the node back into the cluster. Consider using "Remove" and
rebalancing instead of Failover, to avoid any loss of data.

If replication is behind (or replicas are missing because a node is down, probably the node to be failed over), we show the following message:

Attention: A significant amount of data stored on this node
does not yet have replica (backup) copies! Failing over the node now will
irrecoverably lose that data when the incomplete replica is
activated and this node is removed from the cluster. It is
recommended to use "Remove" and rebalance to
safely remove the node without any data loss.

There is, in addition, a different set of warnings for nodes that don't run the data service; these are shown when such a node is selected for failover.

None of these messages are shown in the multi-node failover dialog. Given its prominent placement in the UI, and the fact that we want people to use it when failing over multiple nodes, it should have a better set of warning and error messages.

Getting good error messages is complicated by the fact that the nodeStatuses REST API, which the UI uses to get the "failover safeness" information for each node, computes that information assuming only that one node is failed over.
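To illustrate why per-node answers don't compose, here is a minimal Python sketch. It assumes a simplified, hypothetical vBucket map (vBucket id mapped to the list of nodes holding a copy, active first) and only checks whether any copy survives; replication lag, which drives the warnings quoted earlier, is ignored. This is not how ns_server computes safeness.

```python
from typing import Dict, List, Set

# Hypothetical, simplified vBucket map: vBucket id -> nodes holding a copy,
# active first, then replicas. Missing replicas are simply omitted.
VBucketMap = Dict[int, List[str]]


def vbuckets_lost(vbmap: VBucketMap, failed: Set[str]) -> List[int]:
    """Return the vBuckets that would have no copy on any surviving node."""
    return sorted(vb for vb, chain in vbmap.items()
                  if all(node in failed for node in chain))


def single_node_safe(vbmap: VBucketMap, node: str) -> bool:
    """Per-node check, analogous to asking about one node at a time."""
    return not vbuckets_lost(vbmap, {node})


vbmap = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "a"]}
# Each node is individually "safe" to fail over...
print([single_node_safe(vbmap, n) for n in "abc"])
# ...but failing over a and b together loses vBucket 0.
print(vbuckets_lost(vbmap, {"a", "b"}))
```

Every single-node check passes here, yet failing over `a` and `b` together leaves vBucket 0 with no copy, so the safeness of a set has to be evaluated as a set.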

We need to redesign the protocol between the UI and the server. Perhaps we should add a checkSafety=true query parameter to the controller/failOver REST API and have it return safety information to the client.
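A hypothetical client-side sketch of the proposed protocol: the UI sends the whole set of nodes in one request with checkSafety=true and would get back a safety report for that set, instead of combining per-node answers. checkSafety is only the parameter proposed in this issue; sending multiple nodes as repeated otpNode form fields is an assumption, and `build_failover_request` is an illustrative helper name.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def build_failover_request(base_url: str, otp_nodes: List[str],
                           check_safety: bool) -> Tuple[str, str]:
    """Return (url, form_body) for a POST to controller/failOver.

    Assumptions: nodes are sent as repeated otpNode form fields, and
    checkSafety=true is the query parameter proposed in this issue
    (hypothetical, not an existing parameter).
    """
    url = base_url.rstrip("/") + "/controller/failOver"
    if check_safety:
        # Ask the server to evaluate safety for the whole set, not fail over.
        url += "?" + urlencode({"checkSafety": "true"})
    # One otpNode field per node to be failed over.
    body = urlencode([("otpNode", node) for node in otp_nodes])
    return url, body


url, body = build_failover_request(
    "http://localhost:8091", ["ns_1@10.0.0.1", "ns_1@10.0.0.2"], True)
print(url)
print(body)
```

The design choice is that the server, which already knows the vBucket map and replication state, evaluates the selected nodes as a whole, and the UI only renders the resulting warnings.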


People

Assignee: Dave Finlay

Reporter: Dave Finlay

Votes: 0

Watchers: 4

Dates

Created: 30/May/18 2:24 PM

Updated: 24/May/21 9:50 AM

Gerrit Reviews

There are no open Gerrit changes

Improve the idea of "failover safeness" to better support multi-node failover (and associated UI error messages)