Details

Type: Improvement
Resolution: Fixed
Priority: Critical
Fix Version/s: 5.5.0
Affects Version/s: 3.0, 4.0.0
Component/s: ns_server
Security Level: Public
Labels:

Sprint:
Sprint 2 - March 11 - April 3

Description

We have a number of bugs caused by Erlang's global facility, or by the related problem of not being able to spawn a new master quickly. For example:

~~MB-7282~~ (erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance))

~~MB-7168~~ [Doc'd 2.2.0] failover of node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)

~~MB-8682~~ start rebalance request is hunging sometimes (looks like another global facility issue)

~~MB-5622~~ Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover

Getting us off global will fix all of these issues.

Attachments

Issue Links

blocks

MB-12739 Improve Auto-failover for RZA

Resolved

is duplicated by

MB-7282 erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance)

Closed

MB-9691 rebalance repeated failed when add nodes back into cluster

Closed

MB-5622 Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover

Closed

relates to

MB-9415 auto-failover in seconds - (reduced from minimum 30 seconds)

Resolved

MB-9691 rebalance repeated failed when add nodes back into cluster

Closed

MB-14967 Spend some time looking for a workaround for a problem with erlang global facility (was: [system test] Rebalance In fails with error " Request failed/" errors.)

Closed

MB-22807 Failover of node taking ~15 sec when the node down is orchestrator node and timeout is 5 sec

Closed

MB-11614 Discussion - Should we move auto-failover out of erlang?

Open

MB-9066 Increase Autofailover Counter: Enable setting the number of auto-failovers allowed on cluster

Resolved

Gerrit Reviews

For Gerrit Dashboard: MB-9321
#	Subject	Branch	Project	Status	Code-Review	Verified
86335,11	MB-9321 New global name registry.	master	ns_server	MERGED	+2	+1
88124,17	MB-9321 Infrastructure for new orchestration.	master	ns_server	MERGED	+2	+1
88644,19	MB-9321 Use leader_activities in ns_janitor.	master	ns_server	MERGED	-1	+1
88645,19	MB-9321 Use leader_activities in service_janitor.	master	ns_server	MERGED	-1	+1
88646,19	MB-9321 Use leader_activities for failover.	master	ns_server	MERGED	-1	+1
88647,19	MB-9321 Use leader_activities for rebalance.	master	ns_server	MERGED	+2	+1
88648,19	MB-9321 Use leader_activities for graceful failover.	master	ns_server	MERGED	+2	+1
88660,22	MB-9321 Register recovery_server as leader activity.	master	ns_server	MERGED	+2	+1