Details
Bug
Resolution: Fixed
Critical
7.1.0
7.1.0-2475
Untriaged
Centos 64-bit
1
Yes
KV March-22
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.54049.ini GROUP=rebalance_in_out_P0_set1,rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,get-cbcollect-info=True,infra_log_level=info,log_level=info,upgrade_version=7.1.0-2475 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in_out,nodes_init=5,nodes_in=2,nodes_out=1,update_replica=True,updated_num_replicas=3,bucket_spec=magma_dgm.5_percent_dgm.5_node_2_replica_magma_512,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,GROUP=rebalance_in_out_P0_set1'
Steps to Repro
1. Create a 5 node cluster
2022-03-12 06:05:05,697 | test | INFO | MainThread | [table_view:display:72] Cluster statistics
+----------------+----------+-----------------+-----------+-----------+---------------------+-------------------+-----------------------+
| Node           | Services | CPU_utilization | Mem_total | Mem_free  | Swap_mem_used       | Active / Replica  | Version               |
+----------------+----------+-----------------+-----------+-----------+---------------------+-------------------+-----------------------+
| 172.23.106.163 | kv       | 0.313008639038  | 11.45 GiB | 10.67 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.1.0-2475-enterprise |
| 172.23.105.36  | kv       | 1.01669386218   | 11.45 GiB | 10.72 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.1.0-2475-enterprise |
| 172.23.105.33  | kv       | 1.83624701295   | 11.45 GiB | 10.61 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.1.0-2475-enterprise |
| 172.23.107.164 | kv       | 0               | 0.0 Byte  | 0.0 Byte  | 0.0 Byte / 0.0 Byte | 0 / 0             | 7.1.0-2475-enterprise |
| 172.23.105.37  | kv       | 0.162886856284  | 11.45 GiB | 10.72 GiB | 0.0 Byte / 3.50 GiB | 0 / 0             | 7.1.0-2475-enterprise |
+----------------+----------+-----------------+-----------+-----------+---------------------+-------------------+-----------------------+
2. Create Bucket/scopes/collections/data
2022-03-12 06:13:31,703 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
| Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items    | RAM Quota | RAM Used   | Disk Used  | ARR           |
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
| bucket1 | couchbase | couchstore      | 2        | none       | 0   | 50000    | 9.77 GiB  | 237.54 MiB | 171.67 MiB | 100           |
| bucket2 | couchbase | magma           | 2        | none       | 0   | 50000    | 4.88 GiB  | 510.53 MiB | 317.55 MiB | 100           |
| default | couchbase | magma           | 2        | none       | 0   | 32575000 | 2.50 GiB  | 1.87 GiB   | 35.97 GiB  | 5.32952264006 |
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
3. Add 2 nodes (172.23.106.156, 172.23.106.159), remove 1 node (172.23.105.37), update the replica count of all buckets to 3, and rebalance. The rebalance completes without errors.
2022-03-12 06:13:43,536 | test | INFO | pool-7-thread-14 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+---------------+--------------+-----------------------+
| Nodes          | Services | Version               | CPU           | Status       | Membership / Recovery |
+----------------+----------+-----------------------+---------------+--------------+-----------------------+
| 172.23.106.163 | kv       | 7.1.0-2475-enterprise | 6.43863179074 | Cluster node | active / none         |
| 172.23.105.36  | kv       | 7.1.0-2475-enterprise | 7.10230856566 | Cluster node | active / none         |
| 172.23.105.33  | kv       | 7.1.0-2475-enterprise | 6.43982356648 | Cluster node | active / none         |
| 172.23.107.164 | kv       | 7.1.0-2475-enterprise | 5.58842039018 | Cluster node | active / none         |
| 172.23.106.156 | kv       | 7.1.0-2475-enterprise | 0             | Cluster node | inactiveAdded / none  |
| 172.23.106.159 | kv       | 7.1.0-2475-enterprise | 0             | Cluster node | inactiveAdded / none  |
| 172.23.105.37  | kv       | 7.1.0-2475-enterprise | 6.41590137124 | --- OUT ---> | active / none         |
+----------------+----------+-----------------------+---------------+--------------+-----------------------+
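For readers reproducing step 3 by hand rather than through the test framework, the node change can be sketched as the form body of Couchbase Server's `POST /controller/rebalance` REST call, which takes `knownNodes` and `ejectedNodes` as comma-separated node names. This is an assumption about what the framework does underneath, not something the report states. The helper names `otp_node` and `rebalance_form` are hypothetical, and the `ns_1@` prefix is taken from the stream names in the error log below. The sketch only builds the request body; adding the nodes and changing the replica count are separate calls that are not shown.

```python
from urllib.parse import urlencode

# Node addresses copied from the tables in this report.
CURRENT = ["172.23.106.163", "172.23.105.36", "172.23.105.33",
           "172.23.107.164", "172.23.105.37"]
NODES_IN = ["172.23.106.156", "172.23.106.159"]
NODES_OUT = ["172.23.105.37"]


def otp_node(ip, prefix="ns_1"):
    """Hypothetical helper: build a node name such as ns_1@172.23.106.163."""
    return f"{prefix}@{ip}"


def rebalance_form(current, nodes_in, nodes_out):
    """Hypothetical helper: form fields for POST /controller/rebalance.

    knownNodes lists every node the cluster knows about, including the
    ones being ejected; ejectedNodes lists only the nodes to remove.
    """
    known = list(current) + [ip for ip in nodes_in if ip not in current]
    return {
        "knownNodes": ",".join(otp_node(ip) for ip in known),
        "ejectedNodes": ",".join(otp_node(ip) for ip in nodes_out),
    }


form = rebalance_form(CURRENT, NODES_IN, NODES_OUT)
body = urlencode(form)  # URL-encoded string suitable as the POST body
print(form["ejectedNodes"])
```

The seven entries in `knownNodes` correspond to the seven rows of the Rebalance Overview table, and the single `ejectedNodes` entry to the row marked `--- OUT --->`.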
Once the rebalance completed, we noticed the following error messages on 172.23.106.163:
2022-03-12 06:25:10,500 | test | CRITICAL | MainThread | [basetestcase:check_coredump_exist:933] 172.23.106.163: Found ' ERROR ' logs - ['2022-03-12T06:17:18.604112-08:00 ERROR 5726: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.163->ns_1@172.23.107.164:default - ActiveStream::processItems checkpoint_end:37081 should not be in the current snapshot range s:37022->e:37090\n', '2022-03-12T06:17:19.381800-08:00 ERROR 5726: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.163->ns_1@172.23.107.164:default - ActiveStream::processItems checkpoint_end:42608 should not be in the current snapshot range s:42539->e:42612\n']
cbcollect_info attached. This issue was not seen on 7.1.0-2434.
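When triaging logs for this error, the sequence numbers can be pulled out of each message and compared mechanically. The sketch below is a minimal example, assuming the message format is exactly as quoted in the two ERROR lines above; the function name `parse_snapshot_errors` is hypothetical and not part of any Couchbase tool. For each match it reports whether the `checkpoint_end` seqno lies inside the quoted snapshot range `s..e`, which is the condition the message complains about.

```python
import re

# Matches the ActiveStream::processItems message quoted in this report.
PATTERN = re.compile(
    r"checkpoint_end:(?P<seqno>\d+) should not be in the current "
    r"snapshot range s:(?P<start>\d+)->e:(?P<end>\d+)"
)

# The two log lines from 172.23.106.163, copied from the report.
LOG_LINES = [
    "2022-03-12T06:17:18.604112-08:00 ERROR 5726: (default) DCP (Producer) "
    "eq_dcpq:replication:ns_1@172.23.106.163->ns_1@172.23.107.164:default - "
    "ActiveStream::processItems checkpoint_end:37081 should not be in the "
    "current snapshot range s:37022->e:37090",
    "2022-03-12T06:17:19.381800-08:00 ERROR 5726: (default) DCP (Producer) "
    "eq_dcpq:replication:ns_1@172.23.106.163->ns_1@172.23.107.164:default - "
    "ActiveStream::processItems checkpoint_end:42608 should not be in the "
    "current snapshot range s:42539->e:42612",
]


def parse_snapshot_errors(lines):
    """Return (seqno, start, end, inside_range) for each matching line."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not one of the processItems messages
        seqno = int(match.group("seqno"))
        start = int(match.group("start"))
        end = int(match.group("end"))
        results.append((seqno, start, end, start <= seqno <= end))
    return results


for seqno, start, end, inside in parse_snapshot_errors(LOG_LINES):
    print(f"checkpoint_end={seqno} range=[{start}, {end}] inside={inside}")
```

In both reported messages the `checkpoint_end` seqno falls inside the snapshot range (37022 <= 37081 <= 37090 and 42539 <= 42608 <= 42612).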
For Gerrit Dashboard: MB-51414
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
172214,2 | MB-51414: Simplify handling of snapshot ranges in processItems() | neo | kv_engine | MERGED | +2 | +1 |
172599,1 | Merge branch 'neo' | master | kv_engine | ABANDONED | 0 | -1 |
172641,1 | Merge branch 'neo' into master | master | kv_engine | MERGED | +2 | +1 |