Couchbase Server / MB-60777

Auto-failover rebalance failure during concurrent auto-failover case


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version/s: 7.6.0
    • Fix Version/s: 7.6.0
    • Component/s: ns_server
    • Environment: 7.6.0-2119-enterprise, Debian

    Description

      Steps
      1. Create a 6-node cluster:

      +----------------+---------+----------+
      | Nodes          | Zone    | Services | 
      +----------------+---------+----------+
      | 172.23.107.44  | Group 1 | kv       | 
      | 172.23.216.146 | Group 1 | kv       |
      | 172.23.216.147 | Group 1 | kv       |
      | 172.23.216.68  | Group 1 | kv       |
      | 172.23.216.145 | Group 1 | kv       | 
      | 172.23.107.108 | Group 1 | kv       | 
      +----------------+---------+----------+

      2. Create a Magma bucket named 'default' with 3 replicas and load some data.
      3. Enable auto-failover with timeout=30 (seconds) and max_events=2.
      4. Stop Couchbase Server on 172.23.107.108 to trigger an auto-failover.
      5. After 172.23.107.108 has been auto-failed-over, stop Couchbase Server on
         172.23.107.44 to trigger a second auto-failover.
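The reproduction steps above can be scripted against the Couchbase REST API. The sketch below is hypothetical and is not the harness used for this report. It assumes the 6-node cluster from step 1 already exists and that data loading is done separately; it uses the `/pools/default/buckets` and `/settings/autoFailover` endpoints (with `maxCount` standing in for max_events), placeholder `Administrator`/`password` credentials, an assumed 1024 MB bucket quota, and SSH access with `sudo` for stopping the `couchbase-server` service. The helper names (`rest_steps`, `stop_command`, `is_failed_over`, `wait_for_failover`) are illustrative only.

```python
"""Hypothetical repro sketch for MB-60777 (not the original test harness).

Assumptions: placeholder admin credentials, a 1024 MB bucket quota,
and SSH + sudo access to stop couchbase-server on a node.
"""
import base64
import json
import time
import urllib.parse
import urllib.request

ORCHESTRATOR = "172.23.216.146"  # node whose log shows ns_orchestrator
FIRST_VICTIM = "172.23.107.108"  # stopped in step 4
SECOND_VICTIM = "172.23.107.44"  # stopped in step 5


def rest_steps():
    """REST calls for steps 2 and 3 as (method, path, form-fields) tuples."""
    return [
        ("POST", "/pools/default/buckets", {
            "name": "default",
            "bucketType": "couchbase",
            "storageBackend": "magma",
            "replicaNumber": "3",
            "ramQuota": "1024",  # assumed value; not stated in the report
        }),
        ("POST", "/settings/autoFailover", {
            "enabled": "true",
            "timeout": "30",   # seconds
            "maxCount": "2",   # max_events in the report
        }),
    ]


def stop_command(host):
    """argv (for subprocess.run) that stops Couchbase Server on `host` via SSH (assumed access)."""
    return ["ssh", host, "sudo", "systemctl", "stop", "couchbase-server"]


def is_failed_over(pools_default, host):
    """True if `host` is listed with clusterMembership 'inactiveFailed' in /pools/default."""
    for node in pools_default.get("nodes", []):
        if node.get("hostname", "").split(":")[0] == host:
            return node.get("clusterMembership") == "inactiveFailed"
    return False


def send(method, path, fields=None, host=ORCHESTRATOR,
         user="Administrator", password="password"):
    """Send one form-encoded REST call with HTTP basic auth; returns the raw body bytes."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = urllib.parse.urlencode(fields).encode() if fields else None
    req = urllib.request.Request(
        f"http://{host}:8091{path}",
        data=data,
        method=method,
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def wait_for_failover(host, timeout_s=120, poll_s=5):
    """Poll /pools/default on the orchestrator until `host` shows as failed over."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        info = json.loads(send("GET", "/pools/default"))
        if is_failed_over(info, host):
            return True
        time.sleep(poll_s)
    return False
```

A possible driver: call `send(*step)` for each entry of `rest_steps()`, run `subprocess.run(stop_command(FIRST_VICTIM), check=True)`, block on `wait_for_failover(FIRST_VICTIM)`, then stop `SECOND_VICTIM` the same way. Polling `clusterMembership` in this sketch is one way to pin down "after 172.23.107.108 has been auto-failed-over" instead of relying on a fixed sleep.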

      Observation
      A rebalance failure is observed. Log excerpt from ns_1@172.23.216.146, the node running ns_orchestrator:

      [ns_server:info,2024-02-13T18:46:03.892Z,ns_1@172.23.216.146:<0.25641.3>:compaction_daemon:spawn_scheduled_views_compactor:508]Start compaction of indexes for bucket default with config: 
      [{database_fragmentation_threshold,{30,undefined}},
       {view_fragmentation_threshold,{30,undefined}},
       {magma_fragmentation_percentage,50}]
      [ns_server:info,2024-02-13T18:46:07.035Z,ns_1@172.23.216.146:<0.28869.2>:ns_orchestrator:handle_event:670]Skipping janitor in state rebalancing
      [ns_server:info,2024-02-13T18:46:12.036Z,ns_1@172.23.216.146:<0.28869.2>:ns_orchestrator:handle_event:670]Skipping janitor in state rebalancing
      [ns_server:info,2024-02-13T18:46:13.186Z,ns_1@172.23.216.146:<0.25564.3>:ns_janitor:cleanup_with_membase_buckets_vbucket_map:238]Bucket "default" not yet ready on ['ns_1@172.23.107.44']
      [ns_server:info,2024-02-13T18:46:13.187Z,ns_1@172.23.216.146:rebalance_agent<0.28912.2>:rebalance_agent:handle_down:290]Rebalancer process <0.25175.3> died (reason {pre_rebalance_janitor_run_failed,
                                                   "default",
                                                   {error,
                                                    wait_for_memcached_failed,
                                                    ['ns_1@172.23.107.44']}}).
      [user:error,2024-02-13T18:46:13.188Z,ns_1@172.23.216.146:<0.28869.2>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",
                                       {error,wait_for_memcached_failed,
                                           ['ns_1@172.23.107.44']}}.
      Rebalance Operation Id = 3a9cf301945cf10bc49685e1d4b698eb
      
          People

            Assignee: Pulkit Matta (pulkit.matta)
            Reporter: Pulkit Matta (pulkit.matta)