  1. Couchbase Server
  2. MB-45943

[System Test Upgrade] Overlapping partition found in rebalance - Online upgrade using graceful failover + delta recovery + rebalance fails with GetTaskList timeout

Details

    • Untriaged
    • 1
    • No

    Description

      Steps to Repro
      See MB-45939.
      I retried that rebalance, which worked fine.

      I then started one more round of graceful failover + recovery + rebalance.
      ns_1@172.23.105.102 3:14:24 AM 27 Apr, 2021

      Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
      'ns_1@172.23.104.232','ns_1@172.23.104.244',
      'ns_1@172.23.104.245','ns_1@172.23.105.102',
      'ns_1@172.23.105.109','ns_1@172.23.105.112',
      'ns_1@172.23.105.118','ns_1@172.23.105.164',
      'ns_1@172.23.105.206','ns_1@172.23.105.210',
      'ns_1@172.23.105.25','ns_1@172.23.105.29',
      'ns_1@172.23.105.62','ns_1@172.23.105.86',
      'ns_1@172.23.105.90','ns_1@172.23.105.93',
      'ns_1@172.23.106.117','ns_1@172.23.106.191',
      'ns_1@172.23.106.207','ns_1@172.23.106.225',
      'ns_1@172.23.106.232','ns_1@172.23.106.239',
      'ns_1@172.23.106.246','ns_1@172.23.106.37'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 9c19d8b4df3da0dde8a05c160b2a8aae
      

      This rebalance failed as shown below.
      ns_1@172.23.105.102 3:22:35 AM 27 Apr, 2021

      Rebalance exited with reason {service_rebalance_failed,index,
      {agent_died,<29698.9694.0>,
      {linked_process_died,<29698.1133.2>,
      {timeout,
      {gen_server,call,
      [<29698.10608.0>,
      {call,"ServiceAPI.GetTaskList",
      #Fun<json_rpc_connection.0.77329884>},
      60000]}}}}}.
      Rebalance Operation Id = 9c19d8b4df3da0dde8a05c160b2a8aae
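
      For triage, the exit reason above can be reduced to the failing service, the RPC that timed out, and the timeout value. The following is a minimal sketch, assuming the reason is available as plain text exactly as quoted in this ticket; the function name `summarize_rebalance_failure` and the regular expressions are illustrative and are not part of ns_server or any Couchbase tool.

```python
import re

# Exit reason copied from this ticket (whitespace is not significant).
REASON = """Rebalance exited with reason {service_rebalance_failed,index,
{agent_died,<29698.9694.0>,
{linked_process_died,<29698.1133.2>,
{timeout,
{gen_server,call,
[<29698.10608.0>,
{call,"ServiceAPI.GetTaskList",
#Fun<json_rpc_connection.0.77329884>},
60000]}}}}}."""


def summarize_rebalance_failure(reason: str) -> dict:
    """Extract service, RPC name, and timeout (ms) from an exit-reason string.

    Hypothetical helper: fields that cannot be found are returned as None.
    """
    # Service named right after the service_rebalance_failed tag.
    service = re.search(r"\{service_rebalance_failed,\s*(\w+)", reason)
    # Quoted RPC name inside the {call,"..."} tuple.
    rpc = re.search(r'\{call,\s*"([^"]+)"', reason)
    # Integer that closes the gen_server:call argument list.
    timeout = re.search(r",\s*(\d+)\s*\]", reason)
    return {
        "service": service.group(1) if service else None,
        "rpc": rpc.group(1) if rpc else None,
        "timeout_ms": int(timeout.group(1)) if timeout else None,
    }


if __name__ == "__main__":
    print(summarize_rebalance_failure(REASON))
```

      Applied to this ticket, the sketch reports the index service, the `ServiceAPI.GetTaskList` call, and the 60000 ms timeout, which makes it easier to compare against the failure in MB-45919.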
      

      This could be a duplicate of MB-45919.
      Also note that the retry of this failed rebalance worked fine, so it's not affecting testing per se.

          People

            kevin.cherkauer Kevin Cherkauer (Inactive)
            Balakumaran.Gopal Balakumaran Gopal