Couchbase Server / MB-9321

Implement new cluster orchestration (was: Get us off Erlang's global facility and re-elect failed master quickly and safely)

Details

    • Sprint 2 - March 11 - April 3

    Description

      We have a number of bugs caused by Erlang's global facility, or by the related problem of not being able to spawn a new master quickly. For example:

      • MB-7282 (Erlang's global naming facility apparently drops a globally registered service while the actual service is still alive (was: impossible to change settings/autoFailover after rebalance))
      • MB-7168 [Doc'd 2.2.0] failover of a node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)
      • MB-8682 start rebalance request sometimes hangs (looks like another global facility issue)
      • MB-5622 A crash of the master node may lead to autofailover after 2 minutes instead of the configured shorter autofailover period, or to a similarly slow manual failover

      Getting us off global will fix all of these issues.

People

    Aliaksey Artamonau (Inactive)
    Aleksey Kondratenko (Inactive)
    Votes: 0
    Watchers: 16
