Couchbase Server / MB-59089

[Magma] Swap rebalance + CRUD on collections/data hangs on 1% DGM bucket

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.6.0
    • couchbase-bucket
    • None
    • 7.6.0-1624-enterprise
    • Untriaged
    • Centos 64-bit
    • 0
    • Yes
    • KV 2023-4

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,kv_quota_percent=30,cbas_quota_percent=30,retry_get_process_num=1200 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,nodes_init=5,nodes_swap=1,bucket_spec=magma_dgm.1_percent_dgm.5_node_3_replica_magma_512_single_bucket,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,data_load_spec=volume_test_load_1_percent_dgm,retry_get_process_num=400,GROUP=rebalance_set1'
      

      Steps to Repro
      1. Create a 5 node cluster

      2023-10-11 09:09:18,881 | test  | INFO    | MainThread | [table_view:display:72] Cluster statistics
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      | Nodes          | Zone    | Services | CPU    | Mem_total | Mem_free  | Swap_mem_used       | Active / Replica  | Version / Config                |
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      | 172.23.107.217 | Group 1 | kv       | 0.5915 | 23.36 GiB | 21.93 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1624-enterprise / default |
      | 172.23.107.222 | Group 1 | kv       | 0.9710 | 23.36 GiB | 21.89 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1624-enterprise / default |
      | 172.23.107.102 | Group 1 | kv       | 0.3096 | 23.36 GiB | 21.93 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1624-enterprise / default |
      | 172.23.107.99  | Group 1 | kv       | 0.2897 | 23.36 GiB | 21.86 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1624-enterprise / default |
      | 172.23.107.223 | Group 1 | kv       | 1.3091 | 23.36 GiB | 21.86 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1624-enterprise / default |
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      

      2. Create a bucket with scopes/collections and load it until the bucket reaches 1% DGM.

      2023-10-11 10:11:24,355 | test  | INFO    | MainThread | [table_view:display:72] Bucket statistics
      +---------+-----------+---------+----------+------------+-----+-----------+----------+-----------+----------+------------+----------------+
      | Bucket  | Type      | Storage | Replicas | Durability | TTL | Items     | Vbuckets | RAM Quota | RAM Used | Disk Used  | ARR            |
      +---------+-----------+---------+----------+------------+-----+-----------+----------+-----------+----------+------------+----------------+
      | default | couchbase | magma   | 3        | none       | 0   | 131072000 | 1024     | 2.50 GiB  | 1.81 GiB | 215.60 GiB | 0.962097167969 |
      +---------+-----------+---------+----------+------------+-----+-----------+----------+-----------+----------+------------+----------------+
      

      3. Start data/scope/collections ops.
      4. Add node 172.23.107.121, remove 172.23.107.102 and do a swap rebalance.

      +----------------+---------+----------+---------------------------------+----------------+--------------+-----------------------+
      | Nodes          | Zone    | Services | Version / Config                | CPU            | Status       | Membership / Recovery |
      +----------------+---------+----------+---------------------------------+----------------+--------------+-----------------------+
      | 172.23.107.121 | Group 1 | kv       | 7.6.0-1624-enterprise / default | 0.294767308363 | Cluster node | inactiveAdded / none  |
      | 172.23.107.217 | Group 1 | kv       | 7.6.0-1624-enterprise / default | 5.11326250472  | Cluster node | active / none         |
      | 172.23.107.222 | Group 1 | kv       | 7.6.0-1624-enterprise / default | 5.95214885159  | Cluster node | active / none         |
      | 172.23.107.102 | Group 1 | kv       | 7.6.0-1624-enterprise / default | 4.11667465638  | --- OUT ---> | active / none         |
      | 172.23.107.99  | Group 1 | kv       | 7.6.0-1624-enterprise / default | 5.74157794947  | Cluster node | active / none         |
      | 172.23.107.223 | Group 1 | kv       | 7.6.0-1624-enterprise / default | 7.9736655648   | Cluster node | active / none         |
      +----------------+---------+----------+---------------------------------+----------------+--------------+-----------------------+
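For anyone reproducing step 4 outside testrunner, the swap can be expressed as a single rebalance request that knows about all six nodes and ejects the outgoing one. The sketch below only builds the form fields; it assumes the usual `ns_1@<host>` node naming and the `knownNodes` / `ejectedNodes` parameters of the cluster rebalance REST endpoint. The helper names (`otp_node`, `swap_rebalance_form`) are hypothetical and are not part of testrunner.

```python
def otp_node(host):
    """Return the node name for a host, assuming the default 'ns_1@<host>' form."""
    return "ns_1@" + host


def swap_rebalance_form(current_nodes, node_in, node_out):
    """Build the form fields for a swap rebalance (hypothetical helper).

    current_nodes: hosts in the cluster before the swap.
    node_in:       host that was added and is waiting to be rebalanced in.
    node_out:      host to eject as part of the same rebalance.
    """
    if node_out not in current_nodes:
        raise ValueError("node_out must be a current cluster member")
    if node_in in current_nodes:
        raise ValueError("node_in is already a cluster member")

    # knownNodes lists every node the cluster knows about, including the
    # incoming node and the node that is about to be ejected.
    known = list(current_nodes) + [node_in]
    return {
        "knownNodes": ",".join(otp_node(h) for h in known),
        "ejectedNodes": otp_node(node_out),
    }


cluster = [
    "172.23.107.217",
    "172.23.107.222",
    "172.23.107.102",
    "172.23.107.99",
    "172.23.107.223",
]
form = swap_rebalance_form(cluster, "172.23.107.121", "172.23.107.102")
```

The resulting dictionary would be sent as URL-encoded form data once the incoming node has been added to the cluster. Authentication and the actual HTTP call are left out on purpose.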
      

      The swap rebalance hangs and never completes (cbcollect_info attached). The same test passes on build 7.6.0-1606.
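To make "hangs" concrete when re-running the test, one option is to sample rebalance progress periodically and flag a stall when the value has not moved for a chosen window. This is a minimal sketch, not what testrunner does: the `is_stalled` helper, the sample format, and the 600 second window are all assumptions chosen for illustration.

```python
def is_stalled(samples, stall_seconds):
    """Return True if progress has not changed for at least stall_seconds.

    samples: list of (timestamp_seconds, progress_percent) in time order.
    A rebalance that has reached 100 percent is never reported as stalled.
    """
    if not samples:
        return False

    last_time, last_progress = samples[-1]
    if last_progress >= 100:
        return False

    # Walk backwards to find the earliest sample in the trailing run that
    # still shows the same progress value as the latest sample.
    unchanged_since = last_time
    for timestamp, progress in reversed(samples):
        if progress != last_progress:
            break
        unchanged_since = timestamp

    return last_time - unchanged_since >= stall_seconds


# Hypothetical samples: progress reaches 20 percent and then stops moving.
hung = [(0, 10), (60, 20), (120, 20), (700, 20)]
moving = [(0, 10), (60, 20), (120, 30)]
finished = [(0, 50), (700, 100), (1400, 100)]

hung_result = is_stalled(hung, 600)
moving_result = is_stalled(moving, 600)
finished_result = is_stalled(finished, 600)
```

A flat progress value alone does not prove a hang, so a check like this is best treated as a trigger for collecting logs rather than as a verdict.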

            People

              owend Daniel Owen
              Balakumaran.Gopal Balakumaran Gopal