Couchbase Server / MB-63074

Rebalance exited with reason {{badmatch, {leader_activities_error, {default,rebalance}


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Morpheus
    • ns_server
    •  Enterprise Edition 8.0.0 build 1763

    Description

      Steps to repro

      1. Create a cluster with 4 data nodes:

      172.23.136.111  172.23.136.112  172.23.136.113  172.23.136.114

      2. Create an ephemeral bucket named default and load some data.
      3. Enable auto-failover (AFO) with the following settings:

      Enabled auto-failover with timeout 1 and max count 1 and allowFailoverEphemeralNoReplica = true

      4. Restart node 172.23.136.112 to trigger AFO

      Failover completed successfully.
      Rebalance Operation Id = b5ee6dce9946a8e7e0ff82e1aaa483eb

      5. Trigger a rebalance in which .114 is rebalanced out and .112 (already failed over) is ejected.
      Logs from .111:

      [ns_server:info,2024-08-07T10:44:43.927-07:00,ns_1@172.23.136.111:<0.12099.0>:ns_orchestrator:idle:970]Starting rebalance, KeepNodes = ['ns_1@172.23.136.111','ns_1@172.23.136.113'], EjectNodes = ['ns_1@172.23.136.114'], Failed over and being ejected nodes = ['ns_1@172.23.136.112']; no delta recovery nodes; Operation Id = 9ef7e798834079a3e18560db8b3e152b
      [user:info,2024-08-07T10:44:43.928-07:00,ns_1@172.23.136.111:<0.12099.0>:ns_orchestrator:idle:973]Starting rebalance, KeepNodes = ['ns_1@172.23.136.111','ns_1@172.23.136.113'], EjectNodes = ['ns_1@172.23.136.114'], Failed over and being ejected nodes = ['ns_1@172.23.136.112']; no delta recovery nodes; Operation Id = 9ef7e798834079a3e18560db8b3e152b
      [rebalance:info,2024-08-07T10:44:43.929-07:00,ns_1@172.23.136.111:<0.35271.0>:ns_rebalancer:drop_old_2i_indexes:1395]Going to drop possible old 2i indexes on nodes []

      6. The rebalance exits with an error:

      [user:error,2024-08-07T10:45:01.697-07:00,ns_1@172.23.136.111:<0.12099.0>:ns_orchestrator:log_rebalance_completion:1704]Rebalance exited with reason {{badmatch,
                                     {leader_activities_error,
                                      {default,rebalance},
                                      {quorum_lost,
                                       {lease_lost,'ns_1@172.23.136.114'}}}},
                                    [{ns_rebalancer,rebalance,2,
                                      [{file,
                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/ns_server/apps/ns_server/src/ns_rebalancer.erl"},
                                       {line,496}]},
                                     {proc_lib,init_p_do_apply,3,
                                      [{file,"proc_lib.erl"},{line,241}]}]}.
      Rebalance Operation Id = 9ef7e798834079a3e18560db8b3e152b
      [ns_server:warn,2024-08-07T10:45:01.697-07:00,ns_1@172.23.136.111:users_replicator<0.8698.0>:doc_replicator:loop:110
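For triage, the failing run can be summarized by pulling the node lists out of the "Starting rebalance" line and the node named in the lease_lost term. This is only a rough Python sketch: the helper names (parse_node_list, lease_lost_node, summarize) are made up for illustration, the regexes are assumptions based on the two log lines pasted above, and the stack trace part of the exit reason is abbreviated.

```python
import re

# Log excerpts copied from this ticket. Whitespace in the Erlang term is
# collapsed and the stack trace is abbreviated with "...".
START_LINE = (
    "Starting rebalance, KeepNodes = ['ns_1@172.23.136.111','ns_1@172.23.136.113'], "
    "EjectNodes = ['ns_1@172.23.136.114'], "
    "Failed over and being ejected nodes = ['ns_1@172.23.136.112']; "
    "no delta recovery nodes; Operation Id = 9ef7e798834079a3e18560db8b3e152b"
)
EXIT_REASON = (
    "Rebalance exited with reason {{badmatch, {leader_activities_error, "
    "{default,rebalance}, {quorum_lost, {lease_lost,'ns_1@172.23.136.114'}}}}, ..."
)

NODE_RE = re.compile(r"'([^']+)'")


def parse_node_list(line, label):
    """Return the quoted node names in the bracketed list after `label = `."""
    match = re.search(re.escape(label) + r" = \[([^\]]*)\]", line)
    return NODE_RE.findall(match.group(1)) if match else []


def lease_lost_node(reason):
    """Return the node named in a {lease_lost, Node} term, or None."""
    match = re.search(r"\{lease_lost,\s*'([^']+)'\}", reason)
    return match.group(1) if match else None


def summarize(start_line, reason):
    """Collect the node roles and the lease_lost node into one dict."""
    return {
        "keep": parse_node_list(start_line, "KeepNodes"),
        "eject": parse_node_list(start_line, "EjectNodes"),
        "failed_over": parse_node_list(
            start_line, "Failed over and being ejected nodes"
        ),
        "lease_lost": lease_lost_node(reason),
    }


summary = summarize(START_LINE, EXIT_REASON)
print(summary)
print("lease lost on a node being ejected:",
      summary["lease_lost"] in summary["eject"])
```

With the lines above, the node reported in lease_lost (.114) is the same node listed in EjectNodes.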

      ------------------------------------------

      The same test passes on 7.6.2.
      Logs from the working version, from .111:

      [ns_server:info,2024-08-07T11:22:16.586-07:00,ns_1@172.23.136.111:<0.7066.0>:ns_orchestrator:idle:927]Starting rebalance, KeepNodes = ['ns_1@172.23.136.111','ns_1@172.23.136.113'], EjectNodes = ['ns_1@172.23.136.114'], Failed over and being ejected nodes = ['ns_1@172.23.136.112']; no delta recovery nodes; Operation Id = 85b1d115d1a0339aabca08835ceeb212
      [user:info,2024-08-07T11:22:16.587-07:00,ns_1@172.23.136.111:<0.7066.0>:ns_orchestrator:idle:930]Starting rebalance, KeepNodes = ['ns_1@172.23.136.111','ns_1@172.23.136.113'], EjectNodes = ['ns_1@172.23.136.114'], Failed over and being ejected nodes = ['ns_1@172.23.136.112']; no delta recovery nodes; Operation Id = 85b1d115d1a0339aabca08835ceeb212
      

      [user:info,2024-08-07T11:22:35.305-07:00,ns_1@172.23.136.111:<0.7066.0>:ns_orchestrator:log_rebalance_completion:1661]Rebalance completed successfully.
      Rebalance Operation Id = 85b1d115d1a0339aabca08835ceeb212
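To compare how long each run lasted before it ended, the timestamps on the ns_orchestrator lines can be subtracted. A small Python sketch follows; the RUNS dict and elapsed_seconds helper are illustrative names, and the timestamps are copied from the "Starting rebalance" and completion/exit lines in this ticket.

```python
from datetime import datetime

# (start, end) timestamps copied from the ns_orchestrator log lines above.
RUNS = {
    "8.0.0 build 1763 (rebalance exited)": (
        "2024-08-07T10:44:43.927-07:00",
        "2024-08-07T10:45:01.697-07:00",
    ),
    "7.6.2 (rebalance completed)": (
        "2024-08-07T11:22:16.586-07:00",
        "2024-08-07T11:22:35.305-07:00",
    ),
}


def elapsed_seconds(start, end):
    """Seconds between two ISO 8601 timestamps with a UTC offset."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


for name, (start, end) in RUNS.items():
    print(f"{name}: {elapsed_seconds(start, end):.3f} s")
```

This only measures wall-clock time between the log lines; it says nothing about which rebalance stage each run was in when it ended.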
      

      https://cb-engineering.s3.amazonaws.com/test/collectinfo-2024-08-07T182413-ns_1%40172.23.136.111.zip
      https://cb-engineering.s3.amazonaws.com/test/collectinfo-2024-08-07T182413-ns_1%40172.23.136.112.zip
      https://cb-engineering.s3.amazonaws.com/test/collectinfo-2024-08-07T182413-ns_1%40172.23.136.113.zip
      https://cb-engineering.s3.amazonaws.com/test/collectinfo-2024-08-07T182413-ns_1%40172.23.136.114.zip

      Script to repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_magma1.ini -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover_during_rebalance,timeout=1,num_node_failures=1,nodes_in=0,nodes_out=1,auto_reprovision=False,failover_action=restart_server,nodes_init=4,override_spec_params=replicas,replicas=0,bucket_spec=single_bucket.buckets_all_ephemeral_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,failover_ephemeral_no_replicas=True,wait_before_failure_induction=0,allow_ephemeral_failover_with_no_replicas=True'
      

          People

            pulkit.matta Pulkit Matta