Couchbase Server / MB-61079

Hard Failover with 'allow_unsafe' flag enabled, exited with no_leader


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 7.6.2
    • 7.6.0
    • ns_server

    Description

      This happens intermittently.
      Steps to reproduce:
      1. Create a 4-node cluster:

      172.23.105.191
      172.23.104.235
      172.23.104.234
      172.23.104.236

      2. Stop the Couchbase service on 3 nodes:

      172.23.104.234
      172.23.104.236
      172.23.105.191

      3. Wait for the auto-failover timeout. As expected, auto-failover (AFO) did not trigger:

      [ns_server:info,2024-03-10T23:56:21.204-07:00,ns_1@172.23.104.235:leader_registry<0.11245.200>:leader_registry:handle_down:286]Process <0.32271.200> registered as 'ns_rebalance_observer' terminated.
      [user:info,2024-03-10T23:56:21.204-07:00,ns_1@172.23.104.235:<0.26000.200>:auto_failover:report_failover_error:710]Could not automatically fail over nodes (['ns_1@172.23.104.234']). Could not contact majority of servers. Orchestration may be compromised.
      
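The refusal in step 3 is consistent with simple majority arithmetic. A minimal sketch, assuming the auto-failover check needs a strict majority of the 4 configured nodes to be reachable (as the "Could not contact majority of servers" message suggests); the real ns_server check may weigh other conditions, and the helpers majority and has_quorum are hypothetical.

```python
def majority(cluster_size: int) -> int:
    """Smallest node count that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True when the reachable nodes form a strict majority (assumed rule)."""
    return reachable >= majority(cluster_size)


CLUSTER_SIZE = 4  # step 1: four nodes
REACHABLE = 1     # after step 2 only 172.23.104.235 is still running

print(majority(CLUSTER_SIZE))               # 3
print(has_quorum(REACHABLE, CLUSTER_SIZE))  # False
```

With 1 of 4 nodes up, the survivor is 2 nodes short of a majority, so automatic failover is refused and only a manual, unsafe (quorum loss) failover remains.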

      4. From 172.23.104.235, hard fail over all 3 nodes with allow unsafe = true:

      [ns_server:info,2024-03-10T23:58:06.248-07:00,ns_1@172.23.104.235:leader_registry<0.11245.200>:leader_registry:handle_down:286]Process <0.5797.201> registered as 'ns_rebalance_observer' terminated.
      [ns_server:info,2024-03-10T23:58:06.250-07:00,ns_1@172.23.104.235:<0.8161.201>:failover:restore_chronicle_quorum:122]Attempting quorum loss failover of = ['ns_1@172.23.104.234',
                                            'ns_1@172.23.104.236',
                                            'ns_1@172.23.105.191']
      [ns_server:info,2024-03-10T23:58:06.250-07:00,ns_1@172.23.104.235:<0.25276.200>:chronicle_master:do_handle_call:136]Starting quorum failover with opaque {#Ref<0.1041056013.2727608321.221361>,
                                            ['ns_1@172.23.104.234',
                                             'ns_1@172.23.104.236',
                                             'ns_1@172.23.105.191']}, keeping nodes ['ns_1@172.23.104.235']
      [chronicle:info,2024-03-10T23:58:06.252-07:00,ns_1@172.23.104.235:chronicle_leader<0.10670.200>:chronicle_leader:handle_new_history:531]History changed to <<"4d76aed31e0f1fba4206aa868d5a073f">>. Becoming an observer.
      [ns_server:error,2024-03-10T23:58:16.252-07:00,ns_1@172.23.104.235:<0.25276.200>:chronicle_master:do_handle_call:142]Unsuccesfull quorum loss failover. (no_leader).
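Because the failure is intermittent, each run's logs can be scanned for the quorum-failover start and the chronicle_master error. A minimal Python sketch, assuming the single-line header format seen in these excerpts ([component:level,timestamp,node:process:module:function:line]message); parse_line and quorum_failover_outcome are hypothetical helpers, not Couchbase tooling. It matches on "quorum loss failover. (reason)" so it does not depend on the "Unsuccesfull" spelling in the log text.

```python
import re
from datetime import datetime

# [component:level,timestamp,node:process:module:function:line]message
LOG_RE = re.compile(
    r"^\[(?P<component>\w+):(?P<level>\w+),"
    r"(?P<ts>[^,]+),"
    r"(?P<node>[^:]+):(?P<proc>[^:]+):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<msg>.*)$"
)
FAILURE_RE = re.compile(r"quorum loss failover\. \((\w+)\)")


def parse_line(line):
    """Parse one log header line; continuation lines return None."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.fromisoformat(rec["ts"])  # keeps the -07:00 offset
    return rec


def quorum_failover_outcome(lines):
    """Return (reason, seconds from start to error), or None if no failure."""
    start = None
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["msg"].startswith("Starting quorum failover"):
            start = rec["ts"]
        failure = FAILURE_RE.search(rec["msg"])
        if rec["level"] == "error" and failure and start is not None:
            return failure.group(1), (rec["ts"] - start).total_seconds()
    return None
```

Applied to the step 4 excerpt, it would report ("no_leader", 10.002): the error is logged about 10 seconds after "Starting quorum failover".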

      172.16.1.176 - Administrator/UI [10/Mar/2024:23:58:16 -0700] "POST /controller/startFailover HTTP/1.1" 500 34 "http://172.23.104.235:8091/ui/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36" 13000
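Step 4 was run from the UI, which posted to /controller/startFailover according to the access log above. To script the repro instead, a minimal sketch, assuming the documented hard-failover REST endpoint POST /controller/failOver accepts one otpNode field per node plus allowUnsafe=true (verify against the REST reference for 7.6); hard_failover and build_failover_body are hypothetical helpers and the password is a placeholder.

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

# Nodes stopped in step 2; the request is sent to the surviving node .235.
FAILED_NODES = [
    "ns_1@172.23.104.234",
    "ns_1@172.23.104.236",
    "ns_1@172.23.105.191",
]


def build_failover_body(otp_nodes, allow_unsafe=True):
    """Form-encode one otpNode field per node plus the allowUnsafe flag."""
    fields = [("otpNode", node) for node in otp_nodes]
    fields.append(("allowUnsafe", "true" if allow_unsafe else "false"))
    return urllib.parse.urlencode(fields)


def hard_failover(host, user, password, otp_nodes, allow_unsafe=True):
    """POST the (assumed) hard-failover request; return (status, body)."""
    body = build_failover_body(otp_nodes, allow_unsafe).encode()
    req = urllib.request.Request(
        f"http://{host}:8091/controller/failOver", data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; return them for inspection.
        return err.code, err.read().decode()
```

For example, hard_failover("172.23.104.235", "Administrator", "<password>", FAILED_NODES); a non-2xx status together with the chronicle_master "(no_leader)" error in the logs would indicate the same failure as the UI run.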
      

    People

      Neelima Premsankar (neelima.premsankar)
      Pulkit Matta (pulkit.matta)