Couchbase Documentation / DOC-6735

Failover docs should describe the concept of failover "unsafeness"


Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • 6.0.2, 6.5.1
    • Server 5.5/Vulcan, 6.0.2, 6.5 Milestone 1
    • admin
    • None
    • DOC-2020-S11-Jun14, DOC-2020-S12-Jun28
    • 1

    Description

      When leases were introduced to cluster orchestration in 5.5, they brought with them the possibility that an orchestrator action could be blocked because the orchestrator cannot obtain the consent of a majority of the nodes to take that action.

      For operations like rebalance, this simply means that the rebalance does not start, or that it terminates when the lease is lost.

      For failover the situation is different: it may not be possible to acquire leases from a majority of the nodes, yet the administrator still needs the failover to happen. For example, say you have a 5-node cluster with 2 nodes on one rack and 3 on another, and the rack with 3 nodes goes up in flames. The user may want to fail over the 3-node majority, and we should allow this failover to proceed so that the user can recover their cluster. The result is a diminished cluster, but without the ability to run a failover when leases can't be acquired from a majority of nodes, the cluster would be permanently unrecoverable and could only be used for spare parts.
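The quorum arithmetic in the rack example can be sketched as follows. This is a simplified model, not Couchbase's orchestrator code: it assumes that a strict majority (more than half of all nodes in the cluster) must grant leases, and the function name has_lease_majority is hypothetical.

```python
def has_lease_majority(total_nodes: int, reachable_nodes: int) -> bool:
    """Return True if the reachable nodes form a strict majority of the cluster.

    Simplified model (assumption): leases must be granted by more than half
    of all nodes in the cluster.
    """
    return reachable_nodes > total_nodes // 2


# 5-node cluster: 2 nodes on rack A, 3 nodes on rack B.
TOTAL = 5
RACK_A, RACK_B = 2, 3

# Rack B is lost: only the 2 nodes on rack A can grant leases.
print(has_lease_majority(TOTAL, RACK_A))  # False: 2 of 5 is not a majority
# Had rack A been lost instead, rack B's 3 nodes would still be a majority.
print(has_lease_majority(TOTAL, RACK_B))  # True
```

Under this model, failing over half or more of the nodes always leaves the survivors without a strict majority (for example, 2 of 4), which matches the threshold at which the UI shows its warning.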

      So hard failover accepts an "allowUnsafe" option, which instructs the system to perform the failover even if leases cannot be acquired from a majority of nodes (an attempt to acquire the leases is still made). Administrators should use the allowUnsafe option only if they are certain that the nodes being failed over are inaccessible and will not come back. This option should be used only in very rare situations, such as the loss of a majority of the nodes in the cluster.
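The decision that allowUnsafe changes can be modelled as below. This is a hypothetical sketch, not the ns_server implementation or the real REST/CLI interface: the function hard_failover, the FailoverResult type, and the allow_unsafe parameter are illustrative names, and the strict-majority rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FailoverResult:
    proceeded: bool
    reason: str


def hard_failover(total_nodes: int, leases_acquired: int,
                  allow_unsafe: bool = False) -> FailoverResult:
    """Hypothetical model of the hard-failover decision.

    A lease acquisition attempt is always made; its outcome is passed in as
    leases_acquired. Without allow_unsafe, failover requires a strict
    majority of leases; with allow_unsafe, it proceeds regardless.
    """
    if leases_acquired > total_nodes // 2:
        return FailoverResult(True, "leases acquired from a majority of nodes")
    if allow_unsafe:
        return FailoverResult(True, "no majority; proceeding because allow_unsafe is set")
    return FailoverResult(False, "no majority; failover blocked")


# Rack example: only 2 of 5 nodes survive to grant leases.
print(hard_failover(5, 2).proceeded)                     # False
print(hard_failover(5, 2, allow_unsafe=True).proceeded)  # True
```

Note that the unsafe path only skips the majority requirement; it cannot check whether the failed-over nodes are truly gone, which is why the administrator must be certain of that before using it.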

      We need to update the docs to describe this concept, and we also need to document the allowUnsafe option for couchbase-cli failover once the CLI supports it.

      Note that in the UI, if a user tries to fail over half or more of the nodes, we present them with the following dialog:

      This is just for reference. We could probably be a bit clearer in the UI too.


    People

      Tony Hillman (Inactive)
      Dave Finlay