Couchbase Server / MB-9321

Implement new cluster orchestration (was: Get us off Erlang's global facility and re-elect failed master quickly and safely)

Details

    • Sprint 2 - March 11 - April 3

    Description

      We have a number of bugs caused by Erlang's global facility, or by the related problem of not being able to spawn a new master quickly. For example:

      • MB-7282 (erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance))
      • MB-7168 [Doc'd 2.2.0] failover of node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)
      • MB-8682 start rebalance request is hanging sometimes (looks like another global facility issue)
      • MB-5622 Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover

      Getting us off global will fix all of these issues.

            People

              Aliaksey Artamonau (Inactive)
              alkondratenko Aleksey Kondratenko (Inactive)
              Votes: 0
              Watchers: 16
