Details
- Improvement
- Resolution: Fixed
- Critical
- 3.0, 4.0.0
- Security Level: Public
- Sprint 2 - March 11 - April 3
Description
We have a number of bugs caused by Erlang's global naming facility, or by the related problem of not being able to spawn a new master quickly. For example:
- MB-7282 (erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance))
- MB-7168 [Doc'd 2.2.0] failover of node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)
- MB-8682 start rebalance request is hunging sometimes (looks like another global facility issue)
- MB-5622 Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover
Moving off global will fix all of these issues.
Issue Links
- blocks
  - MB-12739 Improve Auto-failover for RZA (Resolved)
- is duplicated by
  - MB-7282 erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance) (Closed)
  - MB-9691 rebalance repeated failed when add nodes back into cluster (Closed)
  - MB-5622 Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover (Closed)
- relates to
  - MB-9415 auto-failover in seconds - (reduced from minimum 30 seconds) (Resolved)
  - MB-9691 rebalance repeated failed when add nodes back into cluster (Closed)
  - MB-14967 Spend some time looking for a workaround for a problem with erlang global facility (was: [system test] Rebalance In fails with error " Request failed/" errors.) (Closed)
  - MB-22807 Failover of node taking ~15 sec when the node down is orchestrator node and timeout is 5 sec (Closed)
  - MB-11614 Discussion - Should we move auto-failover out of erlang? (Open)
  - MB-9066 Increase Autofailover Counter: Enable setting the number of auto-failovers allowed on cluster (Resolved)
For Gerrit Dashboard: MB-9321

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
86335,11 | MB-9321 New global name registry. | master | ns_server | MERGED | +2 | +1 |
88124,17 | MB-9321 Infrastructure for new orchestration. | master | ns_server | MERGED | +2 | +1 |
88644,19 | MB-9321 Use leader_activities in ns_janitor. | master | ns_server | MERGED | -1 | +1 |
88645,19 | MB-9321 Use leader_activities in service_janitor. | master | ns_server | MERGED | -1 | +1 |
88646,19 | MB-9321 Use leader_activities for failover. | master | ns_server | MERGED | -1 | +1 |
88647,19 | MB-9321 Use leader_activities for rebalance. | master | ns_server | MERGED | +2 | +1 |
88648,19 | MB-9321 Use leader_activities for graceful failover. | master | ns_server | MERGED | +2 | +1 |
88660,22 | MB-9321 Register recovery_server as leader activity. | master | ns_server | MERGED | +2 | +1 |