  1. Couchbase Server
  2. MB-52543

Server is crashing continuously with lease expired error on a rebalanced out node

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • 7.1.1
    • ns_server
    • Enterprise Edition 7.1.1 build 3135

    Description

      QE TEST

      -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
      

      Day - 3
      Cycle - 3
      Scale - 3

      STEPS

      1. Node 172.23.104.5 was rebalanced out of the cluster at around 2022-06-13T12:19.

        2022-06-13T12:10:54.477-07:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.137','ns_1@172.23.104.155',
                                         'ns_1@172.23.104.157','ns_1@172.23.104.67',
                                         'ns_1@172.23.104.69','ns_1@172.23.104.70',
                                         'ns_1@172.23.105.107','ns_1@172.23.105.111',
                                         'ns_1@172.23.105.168','ns_1@172.23.106.100',
                                         'ns_1@172.23.106.188','ns_1@172.23.108.103',
                                         'ns_1@172.23.120.107','ns_1@172.23.120.245',
                                         'ns_1@172.23.121.117','ns_1@172.23.123.28',
                                         'ns_1@172.23.96.148','ns_1@172.23.96.192',
                                         'ns_1@172.23.96.251','ns_1@172.23.96.252',
                                         'ns_1@172.23.96.253','ns_1@172.23.97.119',
                                         'ns_1@172.23.97.121','ns_1@172.23.97.122',
                                         'ns_1@172.23.97.239','ns_1@172.23.99.11',
                                         'ns_1@172.23.99.25'], EjectNodes = ['ns_1@172.23.104.5'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = e4a957f9e572e5900975c7d0b529cc3d
        2022-06-13T12:19:10.929-07:00, ns_cluster:1:info:message(ns_1@172.23.104.5) - Node 'ns_1@172.23.104.5' is leaving cluster.
        2022-06-13T12:19:11.395-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.121.117) - Node 'ns_1@172.23.121.117' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                           {shutdown,
                                                                                            transport_closed}}]
        2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.97.121) - Node 'ns_1@172.23.97.121' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                          {shutdown,
                                                                                           transport_closed}}]
        2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.105.168) - Node 'ns_1@172.23.105.168' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                           {shutdown,
                                                                                            transport_closed}}]
        2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.104.70) - Node 'ns_1@172.23.104.70' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                          {shutdown,
                                                                                           transport_closed}}]
        2022-06-13T12:19:11.397-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.104.67) - Node 'ns_1@172.23.104.67' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                          {shutdown,
                                                                                           transport_closed}}]
        2022-06-13T12:19:11.397-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.96.192) - Node 'ns_1@172.23.96.192' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
                                                                                          {shutdown,
                                                                                           transport_closed}}]
        2022-06-13T12:19:11.397-07:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Rebalance completed successfully.
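
The timeline in the excerpt above can be checked with a few lines of Python. This is only a sketch: the three timestamps are copied from the log lines, while the dictionary keys and the helper name `seconds_between` are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above; the key names are illustrative.
EVENTS = {
    "rebalance_started": "2022-06-13T12:10:54.477-07:00",
    "node_leaving": "2022-06-13T12:19:10.929-07:00",
    "rebalance_completed": "2022-06-13T12:19:11.397-07:00",
}


def seconds_between(start, end):
    """Return the elapsed seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


# Total rebalance time, and how long after the node left the rebalance finished.
rebalance_secs = seconds_between(
    EVENTS["rebalance_started"], EVENTS["rebalance_completed"]
)
leave_to_done_secs = seconds_between(
    EVENTS["node_leaving"], EVENTS["rebalance_completed"]
)

print(f"rebalance took {rebalance_secs:.3f} s")
print(f"completed {leave_to_done_secs:.3f} s after the node left")
```

By this arithmetic the rebalance ran for roughly eight minutes, and it was reported complete less than half a second after 172.23.104.5 announced it was leaving.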
        

      2. As soon as the node is rebalanced out of the cluster, it starts crashing continuously with a lease expired error, a few seconds after the lease was acquired.
      For example:

      [ns_server:info,2022-06-13T12:32:13.115-07:00,ns_1@172.23.104.5:<0.10257.3>:leader_lease_acquire_worker:handle_fresh_lease_acquired:296]Acquired lease from node 'ns_1@172.23.104.5' (lease uuid: <<"df490a842356d9b36df06b97c4ecf60e">>)
      [ns_server:warn,2022-06-13T12:32:24.324-07:00,ns_1@172.23.104.5:leader_lease_agent<0.10248.3>:leader_lease_agent:handle_terminate:308]Terminating with reason shutdown when lease is expiring:
      {lease,
          {lease_holder,<<"df490a842356d9b36df06b97c4ecf60e">>,'ns_1@172.23.104.5'},
          -576460554683849027,-576460539683849027,
          {timer,undefined,
              {lease_expired,
                  {lease_holder,<<"df490a842356d9b36df06b97c4ecf60e">>,
                      'ns_1@172.23.104.5'}}},
          expiring}
      Removing the persisted lease.
      

      The pair of log statements above has been logged continuously since the node was rebalanced out of the cluster.
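
To put numbers on "a few seconds", the sketch below measures the gap between the two log lines and the difference between the two integers in the lease tuple. It rests on assumptions that the ticket does not confirm: that the two integers are the lease's start and expiry in Erlang monotonic time, and that the native time unit on this host is nanoseconds. The helper name `log_time` is hypothetical, and the message bodies are trimmed.

```python
import re
from datetime import datetime

# Log headers copied from the excerpt above (message bodies trimmed).
ACQUIRED = (
    "[ns_server:info,2022-06-13T12:32:13.115-07:00,ns_1@172.23.104.5:<0.10257.3>:"
    "leader_lease_acquire_worker:handle_fresh_lease_acquired:296]Acquired lease"
)
TERMINATED = (
    "[ns_server:warn,2022-06-13T12:32:24.324-07:00,ns_1@172.23.104.5:"
    "leader_lease_agent<0.10248.3>:leader_lease_agent:handle_terminate:308]"
    "Terminating with reason shutdown"
)

# Matches "[component:level,timestamp," at the start of a log line.
HEADER = re.compile(r"^\[[^:]+:[^,]+,([^,]+),")


def log_time(line):
    """Extract the timestamp from the bracketed header of a log line."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    return datetime.fromisoformat(match.group(1))


# Wall-clock gap between acquiring the lease and the agent terminating.
gap_secs = (log_time(TERMINATED) - log_time(ACQUIRED)).total_seconds()

# The two integers from the lease tuple. ASSUMPTION: start and expiry in
# Erlang monotonic time, with a native unit of nanoseconds.
LEASE_START = -576460554683849027
LEASE_EXPIRY = -576460539683849027
window_ns = LEASE_EXPIRY - LEASE_START

print(f"agent terminated {gap_secs:.3f} s after the lease was acquired")
print(f"lease window: {window_ns} native units ({window_ns / 1e9:.1f} s if nanoseconds)")
```

Under those assumptions the lease window is 15 seconds, and the agent was shut down about 11.2 seconds after the lease was acquired, while the lease was already in the `expiring` state.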

      3. Node addition fails when we try to add the node back to the cluster.

      2022-06-13T12:40:51.595-07:00, ns_cluster:5:info:message(ns_1@172.23.104.67) - Failed to add node 172.23.104.5:18091 to cluster. Failed to connect to https://172.23.104.5:18091. ok
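
When triaging a failure like this, it can help to pull the target address out of the message and test whether the port is reachable at all. The sketch below is illustrative only: `target_from_message` and `can_connect` are hypothetical helpers, not part of any Couchbase tooling, and a bare TCP connect says nothing about TLS or whether the service behind the port is healthy.

```python
import re
import socket
from urllib.parse import urlsplit

# Message text copied from the log line above.
MESSAGE = (
    "Failed to add node 172.23.104.5:18091 to cluster. "
    "Failed to connect to https://172.23.104.5:18091. ok"
)


def target_from_message(message):
    """Return (host, port) from the 'Failed to connect to <url>.' clause."""
    match = re.search(r"Failed to connect to (https?://\S+?)\.(?:\s|$)", message)
    if match is None:
        raise ValueError("no connection target found in message")
    parts = urlsplit(match.group(1))
    return parts.hostname, parts.port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = target_from_message(MESSAGE)
print(f"target: {host}:{port}")
# Run from a cluster node to test reachability:
# print(can_connect(host, port))
```

On a node that is crash-looping, the port may be open only intermittently, so a single probe is weak evidence either way; repeating it over a minute or so gives a better picture.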
      

          People

            sujay.gad Sujay Gad