  1. Couchbase Server
  2. MB-45455

If auto-failover is impossible because it would cause data loss, we should stop trying to auto-failover the node


Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • backlog
    • 6.6.0, Cheshire-Cat
    • ns_server

    Description

      If for some reason auto-failover fires but can't run to completion (for instance, because one of the buckets has zero replicas), it keeps firing every second for as long as the auto-failover condition is met. Because auto-failover is treated as a critical signal, each attempt interrupts the janitor. That can be a problem if the janitor is trying to do something that might cause the auto-failover condition to no longer be met, such as bringing a bucket online on a node. When a bucket has no replicas, auto-failover can never run to completion, and that situation is perhaps best solved by disabling auto-failover or adding a replica. In general, however, we could perhaps handle this situation better.
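
      One possible shape for "stop trying" is sketched below. It is illustrative only and is not ns_server code: the names FailoverGate, should_attempt, record_impossible, node_recovered and tick are all hypothetical, and the sketch assumes the only blocking reason is a set of buckets with zero replicas. The idea is to remember why an attempt was impossible and to skip further attempts (and so further janitor interruptions) until that reason changes.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class FailoverGate:
    """Hypothetical gate remembering nodes whose auto-failover cannot succeed."""

    # node name -> buckets that had no replica when the last attempt was refused
    blocked: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def should_attempt(self, node: str, zero_replica_buckets: Set[str]) -> bool:
        """Return False while the recorded blocking reason for `node` is unchanged."""
        current = frozenset(zero_replica_buckets)
        previous = self.blocked.get(node)
        if previous is not None and previous == current:
            return False
        # The reason changed (or nothing was recorded): allow a fresh attempt.
        self.blocked.pop(node, None)
        return True

    def record_impossible(self, node: str, zero_replica_buckets: Set[str]) -> None:
        """Remember that failing over `node` would lose data in these buckets."""
        self.blocked[node] = frozenset(zero_replica_buckets)

    def node_recovered(self, node: str) -> None:
        """Forget the node once the auto-failover condition is no longer met."""
        self.blocked.pop(node, None)


def tick(gate: FailoverGate, node: str, zero_replica_buckets: Set[str]) -> str:
    """Simulate one one-second check while `node` meets the auto-failover condition."""
    if not gate.should_attempt(node, zero_replica_buckets):
        return "skipped"  # assumption: the janitor is not interrupted on this path
    if zero_replica_buckets:
        gate.record_impossible(node, zero_replica_buckets)
        return "impossible"
    return "failed_over"
```

      With this gate, only the first check for a down node whose bucket has zero replicas reports "impossible". Later checks are skipped until a replica is added or the node recovers, which would leave the janitor free to bring buckets online in the meantime.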


            People

              dfinlay Dave Finlay
              Abhijeeth.Nuthan Abhijeeth Nuthan
              Votes: 0
              Watchers: 5

