Details
Bug
Resolution: Fixed
Critical
7.6.0
7.6.0-1852
Untriaged
Centos 64-bit
0
Yes
KV 2023-4
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,kv_quota_percent=30,cbas_quota_percent=30,retry_get_process_num=1200 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_out,nodes_init=6,nodes_out=1,bucket_spec=magma_dgm.1_percent_dgm.5_node_3_replica_magma_512_single_bucket,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,data_load_spec=volume_test_load_1_percent_dgm_lower_ops,retry_get_process_num=400,GROUP=rebalance_set0'
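The argument string above is long enough that repeated parameters are easy to miss. A minimal sketch for spotting them, assuming the parameters are plain comma-separated `key=value` tokens; the helper name `find_repeated_params` is hypothetical and is not part of testrunner. It only reports repeats and says nothing about which value the framework actually uses.

```python
from collections import defaultdict

def find_repeated_params(arg_string):
    """Return {key: [values]} for every key given more than once."""
    seen = defaultdict(list)
    # Treat spaces like commas so "-i node.ini rerun=False,..." splits cleanly.
    for token in arg_string.replace(" ", ",").split(","):
        if "=" in token:
            key, value = token.split("=", 1)
            seen[key].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}

# The value of 'args=...' copied from the repro command.
args = (
    "-i node.ini rerun=False,disk_optimized_thread_settings=True,"
    "get-cbcollect-info=True,autoCompactionDefined=true,kv_quota_percent=30,"
    "cbas_quota_percent=30,retry_get_process_num=1200 "
    "-t bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_rebalance_out,nodes_init=6,nodes_out=1,"
    "bucket_spec=magma_dgm.1_percent_dgm.5_node_3_replica_magma_512_single_bucket,"
    "doc_size=512,randomize_value=True,data_load_stage=during,"
    "skip_validations=False,data_load_spec=volume_test_load_1_percent_dgm_lower_ops,"
    "retry_get_process_num=400,GROUP=rebalance_set0"
)

repeated = find_repeated_params(args)
print(repeated)
```

For this command the only repeated key is `retry_get_process_num`, given once as 1200 and once as 400.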
Steps to Repro
1. Create a 6 node cluster.
2023-11-26 13:31:55,063 | test | INFO | MainThread | [table_view:display:72] Cluster statistics
+----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
| Nodes          | Zone    | Services | CPU    | Mem_total | Mem_free  | Swap_mem_used       | Active / Replica  | Version / Config                |
+----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
| 172.23.107.121 | Group 1 | kv       | 0.3625 | 23.36 GiB | 22.05 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
| 172.23.107.217 | Group 1 | kv       | 1.1749 | 23.36 GiB | 21.83 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
| 172.23.107.222 | Group 1 | kv       | 2.4374 | 23.36 GiB | 21.69 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
| 172.23.107.102 | Group 1 | kv       | 0.3749 | 23.36 GiB | 21.85 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
| 172.23.107.99  | Group 1 | kv       | 4.8904 | 23.36 GiB | 21.79 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
| 172.23.107.223 | Group 1 | kv       | 1.2624 | 23.36 GiB | 21.76 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.6.0-1852-enterprise / default |
+----------------+---------+----------+--------+-----------+-----------+---------------------+-------------------+---------------------------------+
2. Create a bucket. Add scopes, collections, and data, and push the bucket to 1% DGM.
2023-11-26 14:24:19,900 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
| Bucket  | Type / Storage    | Replicas | Rank | Vbuckets | Durability | TTL | Items     | RAM Quota / Used    | Disk Used  | ARR           |
+---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
| default | couchbase / magma | 3        | 0    | 1024     | none       | 0   | 131072000 | 3.00 GiB / 2.17 GiB | 212.96 GiB | 1.44164352417 |
+---------+-------------------+----------+------+----------+------------+-----+-----------+---------------------+------------+---------------+
3. Start CRUD operations on the bucket.
4. Remove one node (172.23.107.121) and rebalance out.
2023-11-26 14:24:25,950 | test | INFO | pool-13-thread-10 | [table_view:display:72] Rebalance Overview
+----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
| Nodes          | Zone    | Services | Version / Config                | CPU           | Status       | Membership / Recovery |
+----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
| 172.23.107.121 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 4.20957904135 | --- OUT ---> | active / none         |
| 172.23.107.217 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 4.3499999959  | Cluster node | active / none         |
| 172.23.107.222 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 21.192119213  | Cluster node | active / none         |
| 172.23.107.102 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 33.1482593026 | Cluster node | active / none         |
| 172.23.107.99  | Group 1 | kv       | 7.6.0-1852-enterprise / default | 5.70614123533 | Cluster node | active / none         |
| 172.23.107.223 | Group 1 | kv       | 7.6.0-1852-enterprise / default | 24.015196955  | Cluster node | active / none         |
+----------------+---------+----------+---------------------------------+---------------+--------------+-----------------------+
5. Rebalance fails. Logs from the UI on 172.23.107.222:
2023-11-26 14:27:22,617 | test | ERROR | pool-13-thread-10 | [rest_client:print_UI_logs:2624] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.107.222', u'tstamp': 1701037635271L, u'shortText': u'message', u'serverTime': u'2023-11-26T14:27:15.271Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.4286.143>,\n {{{badmatch,\n {error,\n {setup_replications_failed,\n [{\'ns_1@172.23.107.217\',\n {errors,[{134,466}]}}]}}},\n [{janitor_agent,handle_apply_vbucket_state,\n 2,\n [{file,"src/janitor_agent.erl"},\n {line,1068}]},\n {janitor_agent,\n apply_vbucket_states_worker_loop,0,\n [{file,"src/janitor_agent.erl"},\n {line,1057}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,225}]}]},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.107.99\'},\n {if_rebalance,<0.26524.141>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.107.102\',\n \'ns_1@172.23.107.217\',\n \'ns_1@172.23.107.223\'],\n 898}},\n infinity]}}}}}.\nRebalance Operation Id = 25817df9b5e9929ac42d60aeaf4b7ce2'}
2023-11-26 14:27:22,618 | test | ERROR | pool-13-thread-10 | [rest_client:print_UI_logs:2624] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.107.222', u'tstamp': 1701037635221L, u'shortText': u'message', u'serverTime': u'2023-11-26T14:27:15.221Z', u'text': u'Worker <0.3138.143> (for action {move,{898,\n [\'ns_1@172.23.107.99\',\n \'ns_1@172.23.107.121\',\n \'ns_1@172.23.107.217\',\n \'ns_1@172.23.107.102\'],\n [\'ns_1@172.23.107.99\',\n \'ns_1@172.23.107.102\',\n \'ns_1@172.23.107.217\',\n \'ns_1@172.23.107.223\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.4286.143>,\n {{{badmatch,\n {error,\n {setup_replications_failed,\n [{\'ns_1@172.23.107.217\',\n {errors,\n [{134,\n 466}]}}]}}},\n [{janitor_agent,\n handle_apply_vbucket_state,\n 2,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 1068}]},\n {janitor_agent,\n apply_vbucket_states_worker_loop,\n 0,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 1057}]},\n {proc_lib,\n init_p,\n 3,\n [{file,\n "proc_lib.erl"},\n {line,\n 225}]}]},\n {gen_server,\n call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.107.99\'},\n {if_rebalance,\n <0.26524.141>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.107.102\',\n \'ns_1@172.23.107.217\',\n \'ns_1@172.23.107.223\'],\n 898}},\n infinity]}}}}'}
cbcollect_info attached. Last successful run was on 7.6.0-1837.
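For triage, the two UI log entries above can be reduced to the few facts that matter: which node reported `setup_replications_failed`, the raw error pairs, and how the replication chain for vBucket 898 was meant to change. The following is a rough sketch, assuming the log `text` field has already been decoded to a plain string; the helper names and regular expressions are hypothetical and are not part of ns_server or the test framework. The ticket does not say what the two numbers in an error pair mean, so they are returned uninterpreted.

```python
import re

# Matches {'ns_1@<ip>', {errors, [{A,B}, ...]}} with arbitrary whitespace.
FAILED_RE = re.compile(
    r"\{'(ns_1@[\d.]+)',\s*\{errors,\s*\[((?:\{\d+,\s*\d+\},?\s*)+)\]\}\}"
)
PAIR_RE = re.compile(r"\{(\d+),\s*(\d+)\}")

def failed_replications(log_text):
    """Return [(node, [(a, b), ...])] for each setup_replications_failed entry."""
    results = []
    for node, pairs in FAILED_RE.findall(log_text):
        parsed = [(int(a), int(b)) for a, b in PAIR_RE.findall(pairs)]
        results.append((node, parsed))
    return results

def chain_diff(old_chain, new_chain):
    """Return the nodes leaving and joining a vBucket replication chain."""
    return {
        "removed": [n for n in old_chain if n not in new_chain],
        "added": [n for n in new_chain if n not in old_chain],
    }

# Excerpt of the ns_orchestrator message, line breaks as in the log.
excerpt = """{setup_replications_failed,
 [{'ns_1@172.23.107.217',
 {errors,[{134,466}]}}]}"""

# Old and new chains from the {move,{898, ...}} action in the mover log.
old_chain = ["ns_1@172.23.107.99", "ns_1@172.23.107.121",
             "ns_1@172.23.107.217", "ns_1@172.23.107.102"]
new_chain = ["ns_1@172.23.107.99", "ns_1@172.23.107.102",
             "ns_1@172.23.107.217", "ns_1@172.23.107.223"]

failures = failed_replications(excerpt)
diff = chain_diff(old_chain, new_chain)
print(failures)
print(diff)
```

On the excerpt this reports ns_1@172.23.107.217 with the pair (134, 466), and shows that the move replaces 172.23.107.121 (the node being rebalanced out) with 172.23.107.223 in the chain.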