Couchbase Server / MB-43418

Rebalance exited with reason quorum_lost

Details

    • Task
    • Resolution: Fixed
    • Major
    • None
    • Cheshire-Cat
    • qe
    • Couchbase Toy build 7.0.0-11918-enterprise
    • 1

    Description

      Summary
      In the volume test, one step hard-fails over a node (172.23.106.250) and then runs a full recovery of that node while a data load runs in parallel. The rebalance after the full recovery failed.

      This is step 15 here:
      https://hub.internal.couchbase.com/confluence/pages/viewpage.action?pageId=50135893

      Failover (no failure injected)
      2020-12-22 10:16:26,165 | test | INFO | MainThread | [Collections:wait_for_failover_or_assert:354] 1 nodes failed over as expected in 0.0579998493195 seconds

      Recovery + rebalance
      2020-12-22 10:58:07,546 | test | INFO | pool-1-thread-12 | [table_view:display:72] Rebalance Overview
      ------------------------------------
      Nodes           Services  Status
      ------------------------------------
      172.23.105.175  kv        Cluster node
      172.23.106.250  kv        Cluster node
      172.23.106.236  kv        Cluster node
      172.23.106.251  kv        Cluster node
      172.23.106.233  kv        Cluster node
      172.23.106.238  kv        Cluster node
      ------------------------------------

      Rebalance failed at about 14% completion

      2020-12-22 11:21:10,594 | test  | INFO    | pool-1-thread-12 | [task:check:320] Rebalance - status: running, progress: 14.2090298895
      2020-12-22 11:21:15,785 | test  | ERROR   | pool-1-thread-12 | [rest_client:_rebalance_status_and_progress:1484] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
      2020-12-22 11:21:15,828 | test  | INFO    | pool-1-thread-12 | [rest_client:print_UI_logs:2593] Latest logs from UI on 172.23.105.175:
      2020-12-22 11:21:15,828 | test  | ERROR   | pool-1-thread-12 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.105.175', u'tstamp': 1608664870576L, u'shortText': u'message', u'serverTime': u'2020-12-22T11:21:10.576Z', u'text': u'Rebalance exited with reason {{badmatch,\n                               {leader_activities_error,\n                                {default,rebalance},\n                                {quorum_lost,\n                                 {lease_lost,\'ns_1@172.23.106.251\'}}}},\n                              [{ns_rebalancer,rebalance,5,\n                                [{file,"src/ns_rebalancer.erl"},{line,477}]},\n                               {proc_lib,init_p_do_apply,3,\n                                [{file,"proc_lib.erl"},{line,249}]}]}.\nRebalance Operation Id = fdaca13bac1c014ee9774f8c66b408f6'}
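
For triage, the failure reason buried in the UI log text above can be pulled out mechanically. The following is only a sketch: `summarize_rebalance_failure` and both regular expressions are hypothetical helpers written for this ticket, not part of the test framework or of ns_server, and the error text is copied from the log entry above with its whitespace collapsed.

```python
import re

# Error text copied from the UI log entry above (whitespace collapsed).
ERROR_TEXT = (
    "Rebalance exited with reason {{badmatch, "
    "{leader_activities_error, {default,rebalance}, "
    "{quorum_lost, {lease_lost,'ns_1@172.23.106.251'}}}}, "
    "[{ns_rebalancer,rebalance,5, "
    '[{file,"src/ns_rebalancer.erl"},{line,477}]}, '
    "{proc_lib,init_p_do_apply,3, "
    '[{file,"proc_lib.erl"},{line,249}]}]}.\n'
    "Rebalance Operation Id = fdaca13bac1c014ee9774f8c66b408f6"
)

# Hypothetical patterns: they assume the {quorum_lost, {Cause, 'Node'}} shape
# seen in this ticket and may not cover other quorum_lost variants.
QUORUM_RE = re.compile(r"\{quorum_lost,\s*\{(\w+),\s*'([^']+)'\}\}")
OP_ID_RE = re.compile(r"Rebalance Operation Id = ([0-9a-f]+)")


def summarize_rebalance_failure(text):
    """Return the quorum-loss cause, the node named in it, and the operation id."""
    quorum = QUORUM_RE.search(text)
    op_id = OP_ID_RE.search(text)
    return {
        "cause": quorum.group(1) if quorum else None,
        "node": quorum.group(2) if quorum else None,
        "operation_id": op_id.group(1) if op_id else None,
    }


summary = summarize_rebalance_failure(ERROR_TEXT)
print(summary)
```

Here the extracted node is ns_1@172.23.106.251, which is the node named in the lease_lost term, not the node that was failed over and recovered (172.23.106.250).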
      

      Job's console:
      http://qa.sc.couchbase.com/job/temp_durability_volume/756/consoleFull

          People

            sumedh.basarkod Sumedh Basarkod (Inactive)