  1. Couchbase Server
  2. MB-58451

Concurrent auto failover failed with reason "{{case_clause,...failover,failover,2"

Details

    Description

      Steps:

      • 9 node cluster

        +----------------+---------+----------+--------+-----------+----------+
        | Nodes          | Zone    | Services | CPU    | Mem_total | Mem_free |
        +----------------+---------+----------+--------+-----------+----------+
        | 172.23.108.70  | Group 1 | index    | 5.0460 | 4.03 GiB  | 3.15 GiB |
        | 172.23.108.69  | Group 1 | kv       | 2.9164 | 4.03 GiB  | 3.20 GiB |
        | 172.23.122.129 | Group 1 | n1ql     | 1.3095 | 4.03 GiB  | 2.94 GiB |
        | 172.23.108.74  | Group 1 | index    | 1.9270 | 4.03 GiB  | 3.08 GiB |
        | 172.23.122.171 | Group 1 | n1ql     | 1.4348 | 4.03 GiB  | 3.12 GiB |
        | 172.23.108.67  | Group 1 | kv       | 3.1867 | 4.03 GiB  | 3.11 GiB |
        | 172.23.108.72  | Group 1 | index    | 3.8847 | 4.03 GiB  | 3.08 GiB |
        | 172.23.108.68  | Group 1 | kv       | 2.8838 | 4.03 GiB  | 3.15 GiB |
        | 172.23.123.22  | Group 1 | n1ql     | 1.5180 | 4.03 GiB  | 3.08 GiB |
        +----------------+---------+----------+--------+-----------+----------+

      • 3 buckets

        +---------+-----------+---------+----------+--------+----------+------------+------------+
        | Bucket  | Type      | Storage | Replicas | Items  | Vbuckets | RAM Quota  | RAM Used   |
        +---------+-----------+---------+----------+--------+----------+------------+------------+
        | bucket1 | couchbase | magma   | 1        | 30000  | 1024     | 768.00 MiB | 241.94 MiB |
        | bucket2 | ephemeral | -       | 1        | 30000  | -        | 768.00 MiB | 54.97 MiB  |
        | default | couchbase | magma   | 1        | 500000 | 1024     | 768.00 MiB | 447.35 MiB |
        +---------+-----------+---------+----------+--------+----------+------------+------------+

      • Set the auto-failover timeout to 10 seconds (timeout=10)
      • Induce failures on the following nodes

        +----------------+----------+-------------+----------------+
        | Node           | Services | Node status | Failover type  |
        +----------------+----------+-------------+----------------+
        | 172.23.122.171 | n1ql     | active      | stop_couchbase |
        | 172.23.122.129 | n1ql     | active      | stop_couchbase |
        | 172.23.108.70  | index    | active      | stop_couchbase |
        | 172.23.108.67  | kv       | active      | stop_couchbase |
        +----------------+----------+-------------+----------------+
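For reproduction outside the test framework, the auto-failover step can be sketched as a request to the cluster's `/settings/autoFailover` REST endpoint. This is only a sketch under assumptions: the target host (the node the logs below were taken from), port 8091, and the credentials are placeholders not stated in this report, and `maxCount=4` is taken from the TAF command line rather than from the steps. The request is built but not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_auto_failover_request(host, timeout_s, max_count,
                                user="Administrator", password="password"):
    """Build (but do not send) a POST to /settings/autoFailover.

    user/password are hypothetical placeholders; substitute real credentials.
    """
    body = urlencode({
        "enabled": "true",
        "timeout": timeout_s,   # seconds before auto-failover triggers
        "maxCount": max_count,  # maximum number of auto-failover events
    }).encode()
    req = Request(f"http://{host}:8091/settings/autoFailover",
                  data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


# Assumed target: the node whose logs are quoted in this report.
req = build_auto_failover_request("172.23.123.22", timeout_s=10, max_count=4)
print(req.get_method(), req.full_url, req.data.decode())
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, which is left out here so the sketch has no side effects.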

      Observation:

      Auto-failover failed as soon as it was initiated.

      Logs from 172.23.123.22

      [user:error,2023-08-30T06:55:15.226-07:00,ns_1@172.23.123.22:<0.9310.0>:ns_orchestrator:log_rebalance_completion:1594]Failover exited with reason {{case_clause,
                                       ['ns_1@172.23.108.67','ns_1@172.23.122.171',
                                        'ns_1@172.23.122.129','ns_1@172.23.108.70']},
                                   [{failover,failover,2,
                                        [{file,"src/failover.erl"},{line,257}]},
                                    {failover,config_sync_and_orchestrate,2,
                                        [{file,"src/failover.erl"},{line,186}]},
                                    {failover,orchestrate,2,
                                        [{file,"src/failover.erl"},{line,157}]},
                                    {async,'-async_init/4-fun-1-',3,
                                        [{file,"src/async.erl"},{line,199}]}]}.
      Rebalance Operation Id = a6869972bf66b31c763e9eff6a0b3bf6
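In Erlang, a `case_clause` error means a `case` expression received a value that none of its clauses matched; here the unmatched value is the list of nodes being failed over. The sketch below, which is only an illustrative log-parsing helper and not part of ns_server or TAF, extracts that node list and the top stack frame from the log above and checks that the nodes are exactly the four on which failures were induced. The helper names are hypothetical.

```python
import re

# Log text copied from the report above.
LOG = """Failover exited with reason {{case_clause,
    ['ns_1@172.23.108.67','ns_1@172.23.122.171',
     'ns_1@172.23.122.129','ns_1@172.23.108.70']},
   [{failover,failover,2,
        [{file,"src/failover.erl"},{line,257}]},
    {failover,config_sync_and_orchestrate,2,
        [{file,"src/failover.erl"},{line,186}]},
    {failover,orchestrate,2,
        [{file,"src/failover.erl"},{line,157}]},
    {async,'-async_init/4-fun-1-',3,
        [{file,"src/async.erl"},{line,199}]}]}."""

# Nodes from the "Induce failures" table in the steps.
INDUCED = {"172.23.122.171", "172.23.122.129", "172.23.108.70", "172.23.108.67"}


def case_clause_nodes(log):
    """Return the node IPs listed as the unmatched case_clause value."""
    match = re.search(r"\{\{case_clause,\s*\[(.*?)\]\}", log, re.S)
    return re.findall(r"'ns_1@([^']+)'", match.group(1)) if match else []


def top_frame(log):
    """Return (module, function, arity, file, line) of the first stack frame."""
    match = re.search(
        r'\[\{(\w+),(\w+),(\d+),\s*\[\{file,"([^"]+)"\},\{line,(\d+)\}\]\}', log)
    if not match:
        return None
    mod, fun, arity, path, line = match.groups()
    return mod, fun, int(arity), path, int(line)


nodes = case_clause_nodes(LOG)
print("unmatched value:", nodes)
print("raised in:", top_frame(LOG))
print("matches induced failures:", set(nodes) == INDUCED)
```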

      TAF testcase:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini -p get-cbcollect-info=False,rerun=False,skip_collections_cleanup=True,upgrade_version=7.6.0-1422 -t failover.concurrent_failovers.ConcurrentFailoverTests.test_concurrent_failover,nodes_init=9,services_init=kv-kv-kv-index-index-index-n1ql-n1ql-n1ql,maxCount=4,timeout=10,bucket_size=256,failover_order=kv:index:n1ql:n1ql,failover_method=stop_couchbase,bucket_spec=multi_bucket.buckets_with_similar_hierarchy,GROUP=P0'
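As a quick consistency check, the `-t` test specification above can be split into its parameters and compared with the steps in this report. The parsing below is a minimal sketch that assumes the spec is a comma-separated test name followed by `key=value` pairs; it is not TAF's own parser, and the function name is hypothetical.

```python
from collections import Counter

# The -t argument copied from the command line above.
TEST_SPEC = (
    "failover.concurrent_failovers.ConcurrentFailoverTests.test_concurrent_failover,"
    "nodes_init=9,services_init=kv-kv-kv-index-index-index-n1ql-n1ql-n1ql,"
    "maxCount=4,timeout=10,bucket_size=256,failover_order=kv:index:n1ql:n1ql,"
    "failover_method=stop_couchbase,"
    "bucket_spec=multi_bucket.buckets_with_similar_hierarchy,GROUP=P0"
)


def parse_test_spec(spec):
    """Split 'test.name,k1=v1,k2=v2' into (name, {k1: v1, k2: v2})."""
    name, *pairs = spec.split(",")
    return name, dict(pair.split("=", 1) for pair in pairs)


name, params = parse_test_spec(TEST_SPEC)

# Services of the four nodes in the "Induce failures" table.
induced_services = Counter(["n1ql", "n1ql", "index", "kv"])
requested = Counter(params["failover_order"].split(":"))

print(name.rsplit(".", 1)[-1])
print("failover order matches table:", requested == induced_services)
print("failures within maxCount:",
      sum(requested.values()) <= int(params["maxCount"]))
```

With these inputs, the requested failover order (one kv, one index, two n1ql) agrees with the induced-failure table, and the four failures do not exceed `maxCount=4`.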
      


          People

            ashwin.govindarajulu Ashwin Govindarajulu