  1. Couchbase Server
  2. MB-59826

[Magma] - Rebalance out with 1% DGM bucket fails with "Rebalance exited with reason {mover_crashed"

Details

    • Untriaged
    • Centos 64-bit
    • 0
    • Yes
    • KV 2023-4

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,kv_quota_percent=30,cbas_quota_percent=30,retry_get_process_num=1200 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_out,nodes_init=6,nodes_out=1,bucket_spec=magma_dgm.1_percent_dgm.5_node_3_replica_magma_512_single_bucket,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,data_load_spec=volume_test_load_1_percent_dgm_lower_ops,retry_get_process_num=400,GROUP=rebalance_set0'
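      The command above passes `retry_get_process_num` twice, once as 1200 in the global arguments and once as 400 in the test arguments. The sketch below splits both comma-separated lists and reports any key given more than once. The helper names are hypothetical and are not part of testrunner; which of the two values the framework actually uses is not established here.

```python
from collections import defaultdict


def parse_test_params(arg_string):
    """Split a comma-separated key=value string and group the values by key."""
    values = defaultdict(list)
    for item in arg_string.split(","):
        key, sep, value = item.partition("=")
        if sep:  # ignore fragments that carry no '='
            values[key.strip()].append(value.strip())
    return dict(values)


def duplicated_keys(params):
    """Return only the keys that were supplied more than once."""
    return {key: vals for key, vals in params.items() if len(vals) > 1}


# Copied from the repro command: arguments before "-t" and after the test name.
GLOBAL_PARAMS = (
    "rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,"
    "autoCompactionDefined=true,kv_quota_percent=30,cbas_quota_percent=30,"
    "retry_get_process_num=1200"
)
TEST_PARAMS = (
    "nodes_init=6,nodes_out=1,"
    "bucket_spec=magma_dgm.1_percent_dgm.5_node_3_replica_magma_512_single_bucket,"
    "doc_size=512,randomize_value=True,data_load_stage=during,"
    "skip_validations=False,"
    "data_load_spec=volume_test_load_1_percent_dgm_lower_ops,"
    "retry_get_process_num=400,GROUP=rebalance_set0"
)

params = parse_test_params(GLOBAL_PARAMS + "," + TEST_PARAMS)
print(duplicated_keys(params))
```

      Run against this command, the only key reported is `retry_get_process_num`, with the values 1200 and 400.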
      

      Steps to Repro
      1. Create a 6-node cluster.

      2023-11-26 13:31:55,063 | test  | INFO    | MainThread | [table_view:display:72] Cluster statistics
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      | Nodes          | Zone    | Services | CPU    | Mem_total | Mem_free  | Swap_mem_used       | Active / Replica  | Version / Config                |
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      | 172.23.107.121 | Group 1 | kv       | 0.3625 | 23.36 GiB | 22.05 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      | 172.23.107.217 | Group 1 | kv       | 1.1749 | 23.36 GiB | 21.83 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      | 172.23.107.222 | Group 1 | kv       | 2.4374 | 23.36 GiB | 21.69 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      | 172.23.107.102 | Group 1 | kv       | 0.3749 | 23.36 GiB | 21.85 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      | 172.23.107.99  | Group 1 | kv       | 4.8904 | 23.36 GiB | 21.79 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      | 172.23.107.223 | Group 1 | kv       | 1.2624 | 23.36 GiB | 21.76 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
      +----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
      

      2. Create a bucket. Add scopes, collections, and data, and push it to 1% DGM.

      2023-11-26 14:24:19,900 | test  | INFO    | MainThread | [table_view:display:72] Bucket statistics
      +---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
      | Bucket  | Type / Storage    | Replicas | Rank | Vbuckets | Durability | TTL | Items     | RAM Quota / Used    | Disk Used  | ARR           |
      +---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
      | default | couchbase / magma | 3        | 0    | 1024     | none       | 0   | 131072000 | 3.00 GiB / 2.17 GiB | 212.96 GiB | 1.44164352417 |
      +---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
      

      3. Start CRUD operations on the bucket.
      4. Remove one node (172.23.107.121) and rebalance out.

      2023-11-26 14:24:25,950 | test  | INFO    | pool-13-thread-10 | [table_view:display:72] Rebalance Overview
      +----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
      | Nodes          | Zone    | Services | Version / Config                | CPU           | Status       | Membership / Recovery |
      +----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
      | 172.23.107.121 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 4.20957904135 | --- OUT ---> | active / none         |
      | 172.23.107.217 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 4.3499999959  | Cluster node | active / none         |
      | 172.23.107.222 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 21.192119213  | Cluster node | active / none         |
      | 172.23.107.102 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 33.1482593026 | Cluster node | active / none         |
      | 172.23.107.99  | Group 1 | kv       | 7.6.0-1852-enterprise / default | 5.70614123533 | Cluster node | active / none         |
      | 172.23.107.223 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 24.015196955  | Cluster node | active / none         |
      +----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
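      For reproducing step 4 outside the test framework, the rebalance-out can be expressed as a form body for the Couchbase REST endpoint `POST /controller/rebalance`, which takes `knownNodes` and `ejectedNodes`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the `ns_1@<ip>` node naming is taken from the logs in this ticket, and the sketch only builds the body without sending it. It is not necessarily how the test framework issues the request.

```python
from urllib.parse import parse_qs, urlencode


def otp_name(ip):
    """Node name in the form seen in the logs, e.g. ns_1@172.23.107.121."""
    return "ns_1@" + ip


def build_rebalance_form(known_ips, ejected_ips):
    """Build the form-encoded body for POST /controller/rebalance.

    knownNodes lists every node currently in the cluster, including the
    ones being removed; ejectedNodes lists only the nodes to remove.
    """
    return urlencode({
        "knownNodes": ",".join(otp_name(ip) for ip in known_ips),
        "ejectedNodes": ",".join(otp_name(ip) for ip in ejected_ips),
    })


# The six nodes from the cluster table; 172.23.107.121 is the node going out.
CLUSTER = [
    "172.23.107.121", "172.23.107.217", "172.23.107.222",
    "172.23.107.102", "172.23.107.99", "172.23.107.223",
]
EJECT = ["172.23.107.121"]

body = build_rebalance_form(CLUSTER, EJECT)
decoded = parse_qs(body)  # round-trip to show the decoded fields
print(decoded["ejectedNodes"])
print(decoded["knownNodes"][0].split(","))
```

      The body would be posted to port 8091 on any cluster node with administrator credentials, which are deliberately left out here.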
      
      

      Logs from the UI on 172.23.107.222:

      2023-11-26 14:27:22,617 | test  | ERROR   | pool-13-thread-10 | [rest_client:print_UI_logs:2624]
      code: 0, module: ns_orchestrator, type: critical, node: ns_1@172.23.107.222, tstamp: 1701037635271, shortText: message, serverTime: 2023-11-26T14:27:15.271Z
      text:
      Rebalance exited with reason
        {mover_crashed,
         {unexpected_exit,
          {'EXIT',<0.4286.143>,
           {{{badmatch,
              {error,
               {setup_replications_failed,
                [{'ns_1@172.23.107.217',
                  {errors,[{134,466}]}}]}}},
             [{janitor_agent,handle_apply_vbucket_state,2,
               [{file,"src/janitor_agent.erl"},{line,1068}]},
              {janitor_agent,apply_vbucket_states_worker_loop,0,
               [{file,"src/janitor_agent.erl"},{line,1057}]},
              {proc_lib,init_p,3,
               [{file,"proc_lib.erl"},{line,225}]}]},
            {gen_server,call,
             [{'janitor_agent-default','ns_1@172.23.107.99'},
              {if_rebalance,<0.26524.141>,
               {wait_dcp_data_move,
                ['ns_1@172.23.107.102',
                 'ns_1@172.23.107.217',
                 'ns_1@172.23.107.223'],
                898}},
              infinity]}}}}}.
      Rebalance Operation Id = 25817df9b5e9929ac42d60aeaf4b7ce2
      2023-11-26 14:27:22,618 | test  | ERROR   | pool-13-thread-10 | [rest_client:print_UI_logs:2624]
      code: 0, module: ns_vbucket_mover, type: critical, node: ns_1@172.23.107.222, tstamp: 1701037635221, shortText: message, serverTime: 2023-11-26T14:27:15.221Z
      text:
      Worker <0.3138.143> (for action
        {move,{898,
               ['ns_1@172.23.107.99',
                'ns_1@172.23.107.121',
                'ns_1@172.23.107.217',
                'ns_1@172.23.107.102'],
               ['ns_1@172.23.107.99',
                'ns_1@172.23.107.102',
                'ns_1@172.23.107.217',
                'ns_1@172.23.107.223'],
               []}})
      exited with reason
        {unexpected_exit,
         {'EXIT',<0.4286.143>,
          {{{badmatch,
             {error,
              {setup_replications_failed,
               [{'ns_1@172.23.107.217',
                 {errors,[{134,466}]}}]}}},
            [{janitor_agent,handle_apply_vbucket_state,2,
              [{file,"src/janitor_agent.erl"},{line,1068}]},
             {janitor_agent,apply_vbucket_states_worker_loop,0,
              [{file,"src/janitor_agent.erl"},{line,1057}]},
             {proc_lib,init_p,3,
              [{file,"proc_lib.erl"},{line,225}]}]},
           {gen_server,call,
            [{'janitor_agent-default','ns_1@172.23.107.99'},
             {if_rebalance,<0.26524.141>,
              {wait_dcp_data_move,
               ['ns_1@172.23.107.102',
                'ns_1@172.23.107.217',
                'ns_1@172.23.107.223'],
               898}},
             infinity]}}}}
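      The two log entries above carry the key facts of the failure: the vBucket being moved, its old and new replication chains, and the node that reported `setup_replications_failed`. The sketch below pulls those out of a condensed excerpt of the `ns_vbucket_mover` message with regular expressions. The function names are hypothetical, the excerpt is abbreviated with `...`, and the `{134,466}` pair is extracted as two plain integers without interpreting what either number means.

```python
import re

# Condensed excerpt of the ns_vbucket_mover message above ("..." marks elisions).
LOG_TEXT = """Worker <0.3138.143> (for action {move,{898,
  ['ns_1@172.23.107.99','ns_1@172.23.107.121',
   'ns_1@172.23.107.217','ns_1@172.23.107.102'],
  ['ns_1@172.23.107.99','ns_1@172.23.107.102',
   'ns_1@172.23.107.217','ns_1@172.23.107.223'],
  []}}) exited with reason {unexpected_exit, ...
  {error,{setup_replications_failed,
    [{'ns_1@172.23.107.217',{errors,[{134,466}]}}]}} ..."""

MOVE_RE = re.compile(r"\{move,\s*\{(\d+),\s*\[([^\]]*)\],\s*\[([^\]]*)\]")
FAIL_RE = re.compile(
    r"setup_replications_failed,\s*\[\{'([^']+)',\s*"
    r"\{errors,\s*\[\{(\d+),\s*(\d+)\}\]"
)


def quoted_atoms(fragment):
    """Return the single-quoted atoms (node names) found in a fragment."""
    return re.findall(r"'([^']+)'", fragment)


def parse_move(text):
    """Extract the vBucket id plus the old and new chains of a move action."""
    match = MOVE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), quoted_atoms(match.group(2)), quoted_atoms(match.group(3))


def parse_failure(text):
    """Extract the node that reported the failure and its raw error pair."""
    match = FAIL_RE.search(text)
    if match is None:
        return None
    return match.group(1), (int(match.group(2)), int(match.group(3)))


vbucket, old_chain, new_chain = parse_move(LOG_TEXT)
leaving = [node for node in old_chain if node not in new_chain]
joining = [node for node in new_chain if node not in old_chain]
failed_node, error_pair = parse_failure(LOG_TEXT)

print("vbucket:", vbucket)
print("leaving chain:", leaving)
print("joining chain:", joining)
print("failed node:", failed_node, "errors:", error_pair)
```

      For this ticket the move of vBucket 898 drops 172.23.107.121 (the node being rebalanced out) from the chain and adds 172.23.107.223, while the replication setup error is reported for 172.23.107.217, a node that is present in both chains.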
      

      cbcollect_info attached. Last successful run was on 7.6.0-1837.


            People

              Balakumaran.Gopal Balakumaran Gopal