Couchbase Server / MB-49850

MultiNodeFailover: Non-kv node not failed over while failover is reported as unsafe for data service


Details

    Description

      Build: 7.1.0-1787

      Scenario:

      • 8-node cluster with Data, Index, Query, and Backup services
      • Couchbase bucket with replicas=2
      • Max failover events configured = 10, with auto-failover (FO) timeout = 10 seconds
      • Bring down 3 KV nodes and 1 index node
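The auto-failover settings in the scenario can be applied over the cluster REST API. The sketch below only builds the request. It assumes that the ticket's "Max failover events" maps to the `maxCount` parameter of `POST /settings/autoFailover` and that the cluster listens on the default port 8091. The host and credentials are placeholders, and `build_autofailover_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request


def build_autofailover_request(host: str, user: str, password: str,
                               timeout_s: int = 10,
                               max_count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST to /settings/autoFailover.

    Assumption: maxCount corresponds to the ticket's "Max failover events".
    """
    body = urllib.parse.urlencode({
        "enabled": "true",
        "timeout": str(timeout_s),   # seconds a node must be down before failover
        "maxCount": str(max_count),  # failover events allowed before a manual reset
    }).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:8091/settings/autoFailover",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    # Placeholder host and credentials; nothing is sent to a cluster here.
    req = build_autofailover_request("127.0.0.1", "Administrator", "password")
    print(req.get_method(), req.full_url, req.data.decode())
    # Against a live cluster, urllib.request.urlopen(req) would apply the settings.
```

Building the request separately from sending it keeps the example runnable without a cluster.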

      Expected behavior:

      The index node should be failed over, since the cluster has 2 index nodes (one remains after the failover), and the KV nodes should not be failed over.

      Actual behavior:

      No failover is triggered, with the following reason:

      Could not automatically fail over nodes (['ns_1@172.23.100.15',
      'ns_1@172.23.105.211',
      'ns_1@172.23.105.212',
      'ns_1@172.23.105.213']). Would lose vbuckets in the following buckets: ["travel-sample"]
      - auto_failover 000 - ns_1@172.23.105.155 - 3:44:33 AM 1 Dec, 2021
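The log reports that failing over the four nodes "would lose vbuckets". The following is a minimal sketch of that kind of check under a simplified model, not ns_server's actual code. With replicas=2, each vbucket has 3 copies (1 active and 2 replicas), so 3 down KV nodes can hold every copy of some vbuckets. The 5-node, 10-vbucket layout and the `rotating_map` placement are hypothetical. Real buckets have 1024 vbuckets, and ns_server generates its own balanced map.

```python
from typing import List, Optional, Sequence, Set

# Simplified vbucket map: one chain per vbucket, [active, replica1, replica2, ...].
# None marks a missing copy.
VBucketMap = List[List[Optional[str]]]


def rotating_map(nodes: Sequence[str], num_vbuckets: int, replicas: int) -> VBucketMap:
    """Hypothetical toy map: copies go on consecutive nodes (not ns_server's algorithm)."""
    n = len(nodes)
    return [[nodes[(vb + k) % n] for k in range(replicas + 1)]
            for vb in range(num_vbuckets)]


def lost_vbuckets(vbmap: VBucketMap, down: Set[str]) -> List[int]:
    """Return ids of vbuckets whose every existing copy lives on a down node."""
    lost = []
    for vb, chain in enumerate(vbmap):
        copies = [node for node in chain if node is not None]
        if all(node in down for node in copies):
            lost.append(vb)
    return lost


if __name__ == "__main__":
    kv_nodes = ["kv0", "kv1", "kv2", "kv3", "kv4"]  # hypothetical KV nodes
    vbmap = rotating_map(kv_nodes, num_vbuckets=10, replicas=2)
    # Three KV nodes down: vbuckets 0 and 5 have all 3 copies on them.
    print(lost_vbuckets(vbmap, down={"kv0", "kv1", "kv2"}))  # [0, 5]
```

If the returned list is non-empty for any bucket, failing over the down nodes would lose data in that bucket.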


        Activity

          Artem Stemkovski added a comment:

          This is by design. The thinking behind this decision is that it doesn't make much sense to fail over services if KV is unwell and unfixable. Shivani Gupta, please confirm.
          Shivani Gupta added a comment (edited):

          Artem Stemkovski is right. This is what we agreed on: prioritize Data Service safety. If data is going to be lost, there is no point in failing over other services.

          Ashwin Govindarajulu, I think you are expecting service-level failover, i.e., failing over the index service while keeping KV as is. Today we do not have service-level failover. Once we implement that feature, what you are expecting will be valid.
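The policy described in this discussion is all-or-nothing: if failing over the down nodes would lose vbuckets, nothing is failed over, including nodes that run no Data service. The simplified model below contrasts that policy with the behavior the ticket expected. Both function names, the node names, and the use of `"kv"` as the Data service label are hypothetical. The second function reflects the reporter's expectation, which the comment says would require service-level failover that this build does not have.

```python
from typing import Dict, List, Set


def plan_auto_failover(down: List[str], buckets_losing_vbuckets: List[str]) -> List[str]:
    """All-or-nothing policy as described in this ticket (simplified model).

    If failing over the down nodes would lose vbuckets in any bucket, no node is
    failed over, not even nodes that run no Data service.
    """
    if buckets_losing_vbuckets:
        return []
    return sorted(down)


def reporter_expectation(down: List[str], services: Dict[str, Set[str]],
                         buckets_losing_vbuckets: List[str]) -> List[str]:
    """The ticket's Expected behavior: fail over the non-KV nodes, keep the KV nodes."""
    if not buckets_losing_vbuckets:
        return sorted(down)
    return sorted(node for node in down if "kv" not in services[node])


if __name__ == "__main__":
    # Hypothetical names standing in for the four down nodes in the log.
    services = {"kv1": {"kv"}, "kv2": {"kv"}, "kv3": {"kv"}, "idx1": {"index"}}
    down = list(services)
    print(plan_auto_failover(down, ["travel-sample"]))              # []
    print(reporter_expectation(down, services, ["travel-sample"]))  # ['idx1']
```

Under the first function, the "travel-sample" vbucket-loss result blocks the index node's failover too, which matches the "Could not automatically fail over nodes" message.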


          Ashwin Govindarajulu added a comment:

          Thanks Artem Stemkovski / Shivani Gupta.

          Closing this ticket.

          People

            Assignee: Ashwin Govindarajulu
            Reporter: Ashwin Govindarajulu
            Votes: 0
            Watchers: 5

