Couchbase Server / MB-62887

Leader can surrender mastership after a failover has started, if it doesn't yet have quorum


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Morpheus
    • 7.6.1
    • ns_server
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      While it is somewhat impractical to rule out every possible race between a rebalance/failover and the leader surrendering mastership, we should, where practical, avoid making such races more likely.

      This ticket covers the scenario where nodes start up gradually, leading to a sequence like the following:
      1. Node 1 comes online
      2. Node 1 sees that Nodes 2 and 3 are not yet online, so it starts incrementing their down state towards failover
      3. Node 1 decides to fail over Nodes 2 and 3, but must now wait for quorum
      4. Node 2 comes online and requests that Node 1 surrender its mastership (e.g. because Node 2 doesn't have KV and Node 1 does)
      5. Node 1 receives the request, and surrenders its mastership
      6. Node 1 now has quorum, so failover begins
      7. Failover gets terminated because Node 1 is no longer master

      We can avoid this specific scenario by setting the rebalance status during step 3. That way, Node 2 would see that a rebalance is in progress and would not request that Node 1 surrender its mastership.
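The following Python sketch models the numbered sequence above as a toy, single-threaded state machine, to show why publishing the rebalance status during step 3 changes the outcome. It is not ns_server code (ns_server is written in Erlang): the class `Orchestrator` and its methods `decide_failover`, `peer_comes_online` and `quorum_reached` are hypothetical names chosen for illustration, and the rule that a peer skips its surrender request when it sees a rebalance running is taken from the proposed fix, not from the real implementation. It also does not model the residual race in which surrendering mastership and terminating ns_orchestrator happen as separate steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Orchestrator:
    """Toy model of Node 1 acting as leader (hypothetical, not ns_server's API)."""

    publish_status_while_waiting: bool  # True models the proposed fix
    is_master: bool = True
    rebalance_running: bool = False     # status that other nodes can observe
    failover_pending: bool = False
    log: List[str] = field(default_factory=list)

    def decide_failover(self, nodes: List[str]) -> None:
        # Step 3: failover is decided, but quorum is not yet available.
        self.failover_pending = True
        if self.publish_status_while_waiting:
            # Proposed fix: mark a rebalance as running before waiting for quorum.
            self.rebalance_running = True
        self.log.append(f"decided to fail over {nodes}; waiting for quorum")

    def peer_comes_online(self, peer: str) -> None:
        # Steps 4-5: the peer would ask the leader to surrender mastership,
        # unless it sees that a rebalance is already in progress.
        if self.rebalance_running:
            self.log.append(f"{peer} sees a rebalance running; no surrender request")
            return
        self.is_master = False
        self.log.append(f"surrendered mastership at the request of {peer}")

    def quorum_reached(self) -> str:
        # Steps 6-7: failover begins; it is terminated if mastership was lost.
        if not self.failover_pending:
            return "no failover"
        if not self.is_master:
            self.log.append("failover started, then terminated: no longer master")
            return "terminated"
        self.log.append("failover proceeds")
        return "proceeds"


def run(fix: bool) -> str:
    # Replays steps 3-7 in the order given in the ticket.
    node1 = Orchestrator(publish_status_while_waiting=fix)
    node1.decide_failover(["node2", "node3"])
    node1.peer_comes_online("node2")
    return node1.quorum_reached()


print(run(fix=False))  # terminated
print(run(fix=True))   # proceeds
```

Without the fix the failover is cut short by the mastership change; with the status published early, Node 2 never issues the surrender request and the failover carries on once quorum is available.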

      This doesn't entirely eliminate races between rebalance and surrendering mastership. For example, Node 2 could come online before Node 1 decides to perform failover, and Node 1 could then start performing failover before its ns_orchestrator process is terminated, if there is enough time between surrendering mastership and terminating ns_orchestrator.
      However, the change would still prevent the problems caused by surrendering mastership during a rebalance in some cases.

      Seen in CBSE-17605


          People

            Abhijeeth Nuthan
            Peter Searby
            Votes: 0
            Watchers: 2


              Gerrit Reviews

                There are no open Gerrit changes
