Details
Bug
Resolution: Fixed
Critical
7.1.0
7.1.0-2335
Untriaged
Centos 64-bit
1
Unknown
KV 2022-Feb
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.14648.ini GROUP=rebalance_with_collection_crud,rerun=False,get-cbcollect-info=True,upgrade_version=7.1.0-2335 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,data_load_stage=before,quota_percent=80,get-cbcollect-info=True,upgrade_version=7.1.0-2335,skip_validations=False,rerun=False,nodes_init=4,rebalance_moves_per_node=32,GROUP=rebalance_with_collection_crud,nodes_swap=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,scrape_interval=5,data_load_spec=volume_test_load_with_CRUD_on_collections'
Steps to Repro
1. Create a 4 node cluster.
2022-02-19 20:35:59,417 | test | INFO | MainThread | [table_view:display:72] Cluster statistics
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| Node           | Services | CPU_utilization | Mem_total | Mem_free | Swap_mem_used       | Active / Replica  | Version               |
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| 172.23.136.222 | kv       | 81.8483333333   | 5.99 GiB  | 3.78 GiB | 2.56 GiB / 7.62 GiB | 79833 / 237269    | 7.1.0-2335-enterprise |
| 172.23.136.225 | kv       | 100             | 5.99 GiB  | 3.89 GiB | 2.42 GiB / 7.62 GiB | 79064 / 223203    | 7.1.0-2335-enterprise |
| 172.23.136.226 | kv       | 95.73           | 5.99 GiB  | 3.90 GiB | 2.37 GiB / 7.62 GiB | 79829 / 236777    | 7.1.0-2335-enterprise |
| 172.23.136.224 | kv       | 87.603539941    | 5.99 GiB  | 3.90 GiB | 2.38 GiB / 7.62 GiB | 79487 / 237831    | 7.1.0-2335-enterprise |
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
2. Create buckets/scopes/collections/data
2022-02-19 20:37:06,631 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+---------+-----------+-----------------+----------+------------+-----+--------+------------+------------+------------+-----+
| Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items  | RAM Quota  | RAM Used   | Disk Used  | ARR |
+---------+-----------+-----------------+----------+------------+-----+--------+------------+------------+------------+-----+
| bucket1 | couchbase | couchstore      | 3        | none       | 0   | 3000   | 800.00 MiB | 189.14 MiB | 419.17 MiB | 100 |
| bucket2 | ephemeral | -               | 3        | none       | 0   | 3000   | 800.00 MiB | 386.45 MiB | 136.0 Byte | -   |
| default | couchbase | couchstore      | 3        | none       | 0   | 380000 | 5.86 GiB   | 491.36 MiB | 643.28 MiB | 100 |
+---------+-----------+-----------------+----------+------------+-----+--------+------------+------------+------------+-----+
3. Start CRUD on data + CRUD on collections.
4. Add 2 nodes (172.23.137.89 and 172.23.137.88), remove 2 nodes (172.23.136.225 and 172.23.136.222), and do a swap rebalance.
2022-02-19 20:38:00,403 | test | INFO | pool-3-thread-27 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+----------------+--------------+
| Nodes          | Services | Version               | CPU            | Status       |
+----------------+----------+-----------------------+----------------+--------------+
| 172.23.136.222 | kv       | 7.1.0-2335-enterprise | 58.5433333333  | --- OUT ---> |
| 172.23.137.89  | kv       | 7.1.0-2335-enterprise | 5.04440950236  | Cluster node |
| 172.23.136.225 | kv       | 7.1.0-2335-enterprise | 74.7391579719  | --- OUT ---> |
| 172.23.137.88  | kv       | 7.1.0-2335-enterprise | 0.285004750079 | Cluster node |
| 172.23.136.226 | kv       | 7.1.0-2335-enterprise | 44.79          | Cluster node |
| 172.23.136.224 | kv       | 7.1.0-2335-enterprise | 59.4266666667  | Cluster node |
+----------------+----------+-----------------------+----------------+--------------+
This rebalance fails as shown below.
172.23.136.226
2022-02-19 20:39:52,378 | test | ERROR | pool-3-thread-27 | [rest_client:print_UI_logs:2831] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.136.226', u'tstamp': 1645331982931L, u'shortText': u'message', u'serverTime': u'2022-02-19T20:39:42.931Z', u'text': u"Rebalance exited with reason {mover_crashed,\n {noproc,\n {gen_server,call,\n [{'janitor_agent-bucket2',\n 'ns_1@172.23.137.88'},\n {if_rebalance,<0.16893.4>,initiate_indexing},\n infinity]}}}.\nRebalance Operation Id = 87ba8167d2c432a8eff93e101353f8ec"}
2022-02-19 20:39:52,380 | test | ERROR | pool-3-thread-27 | [rest_client:print_UI_logs:2831] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.136.226', u'tstamp': 1645331982793L, u'shortText': u'message', u'serverTime': u'2022-02-19T20:39:42.793Z', u'text': u"Worker <0.25696.6> (for action {move,{573,\n ['ns_1@172.23.136.225',\n 'ns_1@172.23.136.222',\n 'ns_1@172.23.136.226',\n 'ns_1@172.23.136.224'],\n ['ns_1@172.23.137.88',\n 'ns_1@172.23.137.89',\n 'ns_1@172.23.136.226',\n 'ns_1@172.23.136.224'],\n []}}) exited with reason {noproc,\n {gen_server,\n call,\n [{'janitor_agent-bucket2',\n 'ns_1@172.23.137.88'},\n {if_rebalance,\n <0.16893.4>,\n initiate_indexing},\n infinity]}}"}
cbcollect_info attached.
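
When triaging failures like the two above, it can help to pull the target of the failed `gen_server` call out of the log text. In Erlang, a `noproc` exit from `gen_server:call` means the called process did not exist at the time of the call; in both messages the target is `janitor_agent-bucket2` on the newly added node 172.23.137.88. The following is a minimal sketch, assuming the message text has the same shape as in this ticket; `noproc_target` is a hypothetical helper, not part of testrunner or ns_server.

```python
import re

# Matches the first "[{'<process>', '<node>'}" pair, which in the messages
# above is the registered name and node that gen_server:call targeted.
TARGET_RE = re.compile(r"\[\{'([^']+)',\s*'([^']+)'\}")


def noproc_target(text):
    """Return (process, node) for a noproc failure, or None if absent."""
    if "noproc" not in text:
        return None
    match = TARGET_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


# Text of the ns_orchestrator message from this ticket, trimmed to the
# failure reason (whitespace between terms is not significant).
sample = (
    "Rebalance exited with reason {mover_crashed,\n {noproc,\n"
    " {gen_server,call,\n [{'janitor_agent-bucket2',\n"
    " 'ns_1@172.23.137.88'},\n"
    " {if_rebalance,<0.16893.4>,initiate_indexing},\n infinity]}}}."
)
target = noproc_target(sample)
```

For the sample text, `target` is `('janitor_agent-bucket2', 'ns_1@172.23.137.88')`.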