Details

Type: Improvement
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 5.0.0
Component/s: ns_server
Labels:

Description

It is currently possible to enable automatic failover of nodes running the Index service by toggling an internal setting. Given the improvements we are introducing with Replica Indexes and Rebalancing, should we allow this by default in a future release, without the need to use an internal setting?

The current logic will allow the failover of an Index node as long as there are two nodes running the service.

When customers are using Equivalent Indexes, we generally advise them to keep multiple copies of each Index for HA purposes. Should we explicitly check that we are not removing the last instance of an Index before failing over the node?
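To make the question concrete, here is a minimal sketch of what a "last instance" check could look like. It assumes the index placement is available as a mapping from index name to the set of nodes hosting a copy; the function name `is_failover_safe` and the data shape are hypothetical and are not taken from ns_server or the Indexer.

```python
from typing import Dict, Set


def is_failover_safe(index_placement: Dict[str, Set[str]], candidate: str) -> bool:
    """Return True if every index hosted on `candidate` has a copy on another node.

    `index_placement` maps an index name to the set of nodes holding a copy
    (equivalent index or replica). This shape is an assumption for illustration.
    """
    for nodes in index_placement.values():
        # Failing over `candidate` would remove the only remaining copy.
        if candidate in nodes and not (nodes - {candidate}):
            return False
    return True


# Hypothetical placement: idx_orders exists only on node2.
placement = {
    "idx_users": {"node1", "node2"},
    "idx_orders": {"node2"},
}

safe_node1 = is_failover_safe(placement, "node1")
safe_node2 = is_failover_safe(placement, "node2")
```

In this example `node1` could be failed over because `idx_users` survives on `node2`, while failing over `node2` would remove the only copy of `idx_orders`.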

If customers are using Replica Indexes, will we automatically instantiate a new replica copy at failover time?

Is there any difference in behaviour between Adhoc and Prepared Queries when an Index node is failed over?

We have a quota on the number of nodes that can be automatically failed over without some kind of intervention (currently one). Does this quota need to be extended to track or limit the number of failed-over nodes per service? Is there any reason why we can't fail over (for example) one Data node and one Index node at the same time?
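As a rough illustration of the quota question, the sketch below models a single cluster-wide counter that also records which services were on each failed-over node. The class `FailoverQuota`, its fields, and the service names are hypothetical; this is not how ns_server implements the quota.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FailoverQuota:
    """Cluster-wide auto-failover quota with per-service bookkeeping (illustrative)."""

    max_count: int = 1  # the ticket states the current quota is one node
    used: int = 0
    per_service: Dict[str, int] = field(default_factory=dict)

    def can_fail_over(self) -> bool:
        return self.used < self.max_count

    def record(self, services: Iterable[str]) -> None:
        """Count one failed-over node and note the services it was running."""
        if not self.can_fail_over():
            raise RuntimeError("auto-failover quota exhausted")
        self.used += 1
        for service in services:
            self.per_service[service] = self.per_service.get(service, 0) + 1


# With the quota of one, a Data node failover blocks a later Index node failover.
quota = FailoverQuota()
quota.record(["kv"])
index_allowed_with_quota_one = quota.can_fail_over()

# With a hypothetical quota of two, both nodes could be failed over.
larger = FailoverQuota(max_count=2)
larger.record(["kv"])
larger.record(["index"])
```

Tracking `per_service` separately from `used` is one way the quota could later be limited per service rather than per cluster, which is the extension the question asks about.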

~~MB-12740~~ suggests that we should allow failing over as many nodes as we have bucket replicas - does the same logic apply to the number of Index Replicas?

PRD - https://docs.google.com/document/d/1QbQ3rWPPUHRsHj_Yf_0x7gYhrsP5KQSgDRl5jZy4_Wo/edit

Issue Links

blocks

MB-44922 Enable auto failover for FTS service - FTS

Open

MB-33073 GSI: Enable Indexer service auto-failover on disk issues

Open

is cloned by

MB-44738 Support for auto-failover of Index Service - Indexer

Closed

relates to

MB-44738 Support for auto-failover of Index Service - Indexer

Closed

People

Assignee: Mihir Kamdar (Inactive)

Reporter: Chris Malarky

Votes: 1

Watchers: 36

Dates

Created: 28/Jun/17 8:52 AM

Updated: 05/Aug/22 7:54 AM

Resolved: 29/Oct/21 2:06 PM

Gerrit Reviews

There are no open Gerrit changes

There are 24 closed Gerrit changes

- MB-44738 Part 1 (7.1.0 1368) Autofailover for Index Service feature
- MB-25061 api's for checking health of the services and safety of
- MB-25061 require Manager and AutofailoverManager to be implemented
- MB-25061 api's for checking the service health and asking the service if
- MB-25061 make health_monitor_sup to support services other than kv
- MB-25061 turn off logging for health check call
- MB-25061 prepare the code that handles counting disk failures for re-use
- MB-25061 prepare code that analyzes local only status for re-use
- MB-25061 introduce index monitor
- MB-25061 enable automatic failover for index nodes on Neo clusters by default
- MB-25061 move validate_autofailover out of ns_rebalancer
- MB-25061 pass nodes with UUID's to ns_orchestrator for autofailover attempt
- MB-25061 safety check for services
- MB-25061 move the auto failover emailing/logging into separate process
- MB-25061 log and email messages about nodes being excluded from auto failover
- MB-25061 correctly handle errors and timeouts in HealthCheck API
- MB-25061 do not perform health check if auto failover is disabled
- MB-25061 do not run health checks for services if they are colocated
- MB-25061 make it known inside of the failover code if failover is
- MB-25061 perform services safety check outside of orchestrator
- MB-25061 do not repeatedly send messages about failed service safe
- MB-25061 slightly streamline error handling for is_safe api
- MB-25061 do not crash index_monitor of the connection to service
- MB-25061 correctly pass allow_unsafe flag to failover:is_possible/2
