Couchbase Server
MB-62061

Failover rebalance keeps getting triggered in a quorum loss scenario


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: 7.6.2
    • Fix Version: 7.6.2
    • Component: ns_server
    • Environment: Enterprise Edition 7.6.2 build 3674

    Description

      Steps
      1. Create a 6-node cluster from the following nodes (a provisioning sketch is included below):
      172.23.104.107, 172.23.104.235, 172.23.104.241, 172.23.104.250, 172.23.96.183, 172.23.96.197
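
      For reference, a cluster like this can be stood up with couchbase-cli roughly as follows. This is a sketch only; the ticket does not record how the cluster was provisioned, and the credentials, RAM quota, and services below are assumptions:

      couchbase-cli cluster-init -c 172.23.104.107 \
        --cluster-username Administrator --cluster-password password \
        --services data --cluster-ramsize 1024
      for node in 172.23.104.235 172.23.104.241 172.23.104.250 172.23.96.183 172.23.96.197; do
        couchbase-cli server-add -c 172.23.104.107 -u Administrator -p password \
          --server-add $node --server-add-username Administrator --server-add-password password \
          --services data
      done
      couchbase-cli rebalance -c 172.23.104.107 -u Administrator -p password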

      2. Add a failover delay via /diag/eval:

      curl -k https://Administrator:password@localhost:18091/diag/eval -X POST -d 'testconditions:set(failover_end, {delay, 30000})'

      3. Configure auto-failover with timeout = 20 seconds and maxCount (max events) = 5 (see the REST sketch below).
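
      These values correspond to the standard auto-failover settings endpoint; something along these lines sets them (credentials and host mirror the /diag/eval call above and are assumptions):

      curl -k https://Administrator:password@localhost:18091/settings/autoFailover -X POST -d 'enabled=true&timeout=20&maxCount=5'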

      4. Bring down 172.23.104.107; auto-failover of the node starts (one assumed way to take the node down is noted after the log excerpt):

      [user:info,2024-05-27T09:31:10.346-07:00,ns_1@172.23.96.197:<0.20145.22>:ns_orchestrator:handle_start_failover:1863]Starting failover of nodes ['ns_1@172.23.104.107'] AllowUnsafe = false Operation Id = 71295d207d1c7738dc921afc3a5a84c9
      [ns_server:info,2024-05-27T09:31:10.346-07:00,ns_1@172.23.96.197:<0.23084.262>:failover:pre_failover_config_sync:223]Going to sync with chronicle quorum
      [ns_server:info,2024-05-27T09:31:10.565-07:00,ns_1@172.23.96.197:<0.23323.262>:ns_janitor:sanify_chain:670]Setting vbucket 0 in "bucket-0" on 'ns_1@172.23.104.250' from replica to active.
      [ns_server:info,2024-05-27T09:31:10.565-07:00,ns_1@172.23.96.197:<0.23323.262>:ns_janitor:sanify_chain:670]Setting vbucket 1 in "bucket-0" on 
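
      The ticket does not say how the node was taken down; a hard stop of the service on the node itself is one assumed way to reproduce this step:

      # run on 172.23.104.107 (assumed method; any hard stop of the node triggers auto-failover the same way)
      systemctl stop couchbase-server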

      5. Bring down 3 more nodes (172.23.104.235, 172.23.104.241, 172.23.96.183) simultaneously while the failover of .107 is still in progress.

      6. .107 is auto-failed over and, as expected, the server reports the following reason for the subsequent failover not being triggered:

      Could not automatically fail over nodes (['ns_1@172.23.96.183', 'ns_1@172.23.104.241', 'ns_1@172.23.104.235']). Could not contact majority of servers. Orchestration may be compromised.

      However, the cluster is now stuck in a state where the failover rebalance keeps getting re-triggered: /pools/default/tasks on the remaining nodes shows a failover rebalance as running, and the rebalanceId keeps changing.
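
      For example, polling the tasks endpoint on the surviving orchestrator node (the exact host and credentials below are assumptions):

      curl -sk https://Administrator:password@172.23.96.197:18091/pools/default/tasks

      repeatedly returns output like the following, each time with a fresh rebalanceId: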

      [{"statusId":"3afbc40a79e60b03651a7f1e815bec09","type":"rebalance","subtype":"failover","recommendedRefreshPeriod":0.25,"status":"running","progress":0,"perNode":{},"detailedProgress":{},"stageInfo":{},"rebalanceId":"c0689618a0408efda88a456a608e5ffd","nodesInfo":{"active_nodes":["ns_1@172.23.104.235","ns_1@172.23.104.241","ns_1@172.23.104.250","ns_1@172.23.96.183","ns_1@172.23.96.197"],"failover_nodes":["ns_1@172.23.96.183","ns_1@172.23.104.241","ns_1@172.23.104.235"],"master_node":"ns_1@172.23.96.197"},"masterNode":"ns_1@172.23.96.197"}]

      [chronicle:info,2024-05-27T10:16:56.380-07:00,ns_1@172.23.96.197:chronicle_leader<0.18979.22>:chronicle_leader:handle_election_result:698]Election failed: {error,{no_quorum,['ns_1@172.23.104.250','ns_1@172.23.96.197'],{6,'ns_1@172.23.96.197'}}}
      [chronicle:info,2024-05-27T10:16:56.695-07:00,ns_1@172.23.96.197:<0.23731.267>:chronicle_leader:do_election_worker:892]Starting election. History ID: <<"6b34e0cd8bb9a15d550f1b01bf2a0b53">> Log position: {{6,'ns_1@172.23.96.197'},11917} Peers: ['ns_1@172.23.104.250','ns_1@172.23.104.241','ns_1@172.23.96.183','ns_1@172.23.104.235','ns_1@172.23.96.197'] Required quorum: {majority,{set,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{['ns_1@172.23.104.250'],['ns_1@172.23.104.241'],[],['ns_1@172.23.96.183'],[],['ns_1@172.23.104.235'],[],['ns_1@172.23.96.197'],[],[],[],[],[],[],[],[]}}}}
      [chronicle:info,2024-05-27T10:16:56.703-07:00,ns_1@172.23.96.197:chronicle_leader<0.18979.22>:chronicle_leader:handle_election_result:698]Election failed: {error,{no_quorum,['ns_1@172.23.104.250','ns_1@172.23.96.197'],{6,'ns_1@172.23.96.197'}}}
      [ns_server:error,2024-05-27T10:16:58.126-07:00,ns_1@172.23.96.197:<0.20838.267>:leader_activities:report_error:944]Activity {default,failover} failed with error {no_quorum,[{required_quorum,majority},{leases,['ns_1@172.23.104.250','ns_1@172.23.96.197']}]}
      [ns_server:info,2024-05-27T10:16:58.128-07:00,ns_1@172.23.96.197:leader_registry<0.19516.22>:leader_registry:handle_down:286]Process <0.23361.267> registered as 'ns_rebalance_observer' terminated.
      [chronicle:info,2024-05-27T10:16:58.413-07:00,ns_1@172.23.96.197:<0.23020.267>:chronicle_leader:do_election_worker:892]Starting election. History ID: <<"6b34e0cd8bb9a15d550f1b01bf2a0b53">> Log position: {{6,'ns_1@172.23.96.197'},11917} Peers: ['ns_1@172.23.104.250','ns_1@172.23.104.241','ns_1@172.23.96.183','ns_1@172.23.104.235','ns_1@172.23.96.197'] Required quorum: {majority,{set,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],

      Also, when I bring the nodes back up, all 3 of them get failed over.
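
      As a side note, once the nodes are recovered, the auto-failover counter can be cleared with the standard reset endpoint (a sketch; the host and credentials below are assumptions, and this is not a workaround for the rebalance loop itself):

      curl -k https://Administrator:password@172.23.96.197:18091/settings/autoFailover/resetCount -X POST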

       


          People

            Assignee: Pulkit Matta (pulkit.matta)
            Reporter: Pulkit Matta (pulkit.matta)
            Votes: 0
            Watchers: 3
