Details
- Bug
- Status: Closed
- Critical
- Resolution: Duplicate
- 7.1.0
- None
- 7.1.0 build 2325
- Untriaged
- Windows 64-bit
- 1
- Yes
- KV 2022-Feb
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_magma_win.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,bucket_storage=magma,kv_quota_percent=30,services_init=kv:index:n1ql:fts:eventing:cbas-kv:index:n1ql:fts:eventing:cbas-kv:index:n1ql:fts:eventing:cbas-kv:index:n1ql:fts:eventing:cbas-kv:index:n1ql:fts:eventing:cbas-kv:index:n1ql:fts:eventing:cbas -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=4,nodes_in=2,bucket_spec=magma_dgm.10_percent_dgm.4_node_1_replica_magma_512,doc_size=512,randomize_value=True,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,skip_validations=False,GROUP=rebalance_in'
Was running a bunch of tests to validate MB-50941 when I noticed it.
Steps to Repro
1. Create a 4 node cluster.
2022-02-16 23:32:41,542 | test | INFO | MainThread | [table_view:display:72] Cluster statistics
+----------------+--------------------------------------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| Node | Services | CPU_utilization | Mem_total | Mem_free | Swap_mem_used | Active / Replica | Version |
+----------------+--------------------------------------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| 172.23.136.138 | cbas, eventing, fts, index, kv, n1ql | 5.33833333333 | 5.99 GiB | 3.36 GiB | 3.33 GiB / 7.62 GiB | 0 / 0 | 7.1.0-2325-enterprise |
| 172.23.136.142 | cbas, eventing, fts, index, kv, n1ql | 1.35497741704 | 5.99 GiB | 3.42 GiB | 3.27 GiB / 7.62 GiB | 0 / 0 | 7.1.0-2325-enterprise |
| 172.23.136.141 | cbas, eventing, fts, index, kv, n1ql | 4.97174952916 | 5.99 GiB | 3.27 GiB | 3.33 GiB / 7.62 GiB | 0 / 0 | 7.1.0-2325-enterprise |
| 172.23.136.137 | cbas, eventing, fts, index, kv, n1ql | 1.25 | 5.99 GiB | 3.34 GiB | 3.32 GiB / 7.62 GiB | 0 / 0 | 7.1.0-2325-enterprise |
+----------------+--------------------------------------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
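For anyone reproducing this outside testrunner, roughly the same cluster can be built directly against the ns_server REST API. The sketch below is an untested approximation; hostnames, credentials and memory quotas are placeholders, and it only covers initialising the first node and adding the rest.

# Hypothetical sketch of step 1 using the ns_server REST API directly instead
# of testrunner: initialise the first node with all six services, then add the
# other three. Hostnames, credentials and quotas are placeholders.
import requests

FIRST = "http://172.23.136.138:8091"
OTHERS = ["172.23.136.142", "172.23.136.141", "172.23.136.137"]
AUTH = ("Administrator", "password")
SERVICES = "kv,index,n1ql,fts,eventing,cbas"   # matches services_init in the repro

# Service set for the first node, then admin credentials (completes init).
requests.post(f"{FIRST}/node/controller/setupServices",
              data={"services": SERVICES}).raise_for_status()
requests.post(f"{FIRST}/settings/web",
              data={"username": AUTH[0], "password": AUTH[1],
                    "port": "SAME"}).raise_for_status()

# Illustrative per-service memory quotas (MB).
requests.post(f"{FIRST}/pools/default", auth=AUTH,
              data={"memoryQuota": 2048, "indexMemoryQuota": 512,
                    "ftsMemoryQuota": 512, "cbasMemoryQuota": 1024,
                    "eventingMemoryQuota": 256}).raise_for_status()

# Add the remaining nodes with the same service set; a rebalance (same call as
# in the step-4 sketch further down) then makes them active cluster members.
for host in OTHERS:
    requests.post(f"{FIRST}/controller/addNode", auth=AUTH,
                  data={"hostname": host, "user": AUTH[0],
                        "password": AUTH[1], "services": SERVICES}).raise_for_status()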
2. Create Bucket/scopes/collections/Data
2022-02-16 23:36:56,727 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+---------------+
| Bucket | Type | Storage Backend | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used | ARR |
+------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+---------------+
| UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000 | couchbase | magma | 2 | none | 0 | 3000000 | 2.97 GiB | 2.04 GiB | 4.25 GiB | 27.5666666667 |
+------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+---------------+
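For reference, the bucket and a (much smaller) scope/collection layout can be created directly via the REST API as sketched below; the bucket name, quota, replica count and collection counts are placeholders rather than what the bucket_spec actually generates.

# Hypothetical sketch: create a magma bucket plus a few scopes/collections via
# the ns_server REST API. Names, quota and replica count are placeholders; the
# real bucket_spec generates a far larger layout.
import requests

BASE = "http://172.23.136.138:8091"
AUTH = ("Administrator", "password")
BUCKET = "volume_bucket"     # placeholder name

requests.post(f"{BASE}/pools/default/buckets", auth=AUTH, data={
    "name": BUCKET,
    "bucketType": "couchbase",
    "storageBackend": "magma",
    "ramQuotaMB": 3072,
    "replicaNumber": 1,
}).raise_for_status()

for s in range(2):
    scope = f"scope_{s}"
    requests.post(f"{BASE}/pools/default/buckets/{BUCKET}/scopes",
                  auth=AUTH, data={"name": scope}).raise_for_status()
    for c in range(5):
        requests.post(
            f"{BASE}/pools/default/buckets/{BUCKET}/scopes/{scope}/collections",
            auth=AUTH, data={"name": f"collection_{c}"}).raise_for_status()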
3. Start CRUD on data + scopes/collections
4. Add 2 nodes(172.23.136.136 and 172.23.136.139) and rebalance in.
2022-02-16 23:37:15,099 | test | INFO | pool-3-thread-10 | [table_view:display:72] Rebalance Overview
+----------------+--------------------------------------+-----------------------+---------------+--------------+
| Nodes | Services | Version | CPU | Status |
+----------------+--------------------------------------+-----------------------+---------------+--------------+
| 172.23.136.138 | cbas, eventing, fts, index, kv, n1ql | 7.1.0-2325-enterprise | 15.5480741988 | Cluster node |
| 172.23.136.142 | cbas, eventing, fts, index, kv, n1ql | 7.1.0-2325-enterprise | 60.9883333333 | Cluster node |
| 172.23.136.141 | cbas, eventing, fts, index, kv, n1ql | 7.1.0-2325-enterprise | 11.0416666667 | Cluster node |
| 172.23.136.137 | cbas, eventing, fts, index, kv, n1ql | 7.1.0-2325-enterprise | 17.24 | Cluster node |
| 172.23.136.136 | None | | | <--- IN --- |
| 172.23.136.139 | None | | | <--- IN --- |
+----------------+--------------------------------------+-----------------------+---------------+--------------+
As can be seen, not a single vBucket movement / incoming doc / outgoing doc happens for the first 15 minutes. However, it does start after 15 minutes and finishes at the 18-minute mark.
In all fairness, this might not be a bug. But our tests/automation frameworks currently use timeouts anywhere between 5 and 35 minutes (depending on the amount of data) before a rebalance is considered hung and the testcase is failed. So I was wondering if something has changed recently that could cause this? If this issue occurs on a weekly run we could be looking at hundreds of testcases failing, as the default timeouts we have won't work anymore.
cbcollect_info attached.
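For context, the sketch below shows step 4 plus roughly the kind of hang detection our automation applies: rebalance-in the two new nodes, then poll progress and treat a long stretch without forward progress as a hang. The REST endpoints are the standard ns_server ones; hostnames, credentials and the timeout value are placeholders, and the detection logic is only an approximation of the framework's behaviour.

# Hypothetical sketch: rebalance-in two nodes and fail if progress stalls
# longer than a configurable timeout. Placeholder hosts/credentials/timeout.
import time
import requests

BASE = "http://172.23.136.138:8091"
AUTH = ("Administrator", "password")
SERVICES = "kv,index,n1ql,fts,eventing,cbas"
HANG_TIMEOUT_S = 15 * 60        # our tests use anywhere from 5 to 35 minutes

# Add the two incoming nodes (as in step 4) and start the rebalance.
for host in ("172.23.136.136", "172.23.136.139"):
    requests.post(f"{BASE}/controller/addNode", auth=AUTH,
                  data={"hostname": host, "user": AUTH[0],
                        "password": AUTH[1], "services": SERVICES}).raise_for_status()

otp_nodes = [n["otpNode"] for n in
             requests.get(f"{BASE}/pools/default", auth=AUTH).json()["nodes"]]
requests.post(f"{BASE}/controller/rebalance", auth=AUTH,
              data={"knownNodes": ",".join(otp_nodes),
                    "ejectedNodes": ""}).raise_for_status()

# Poll progress; if it does not move within HANG_TIMEOUT_S, declare a hang.
last_progress, last_change = 0.0, time.time()
while True:
    status = requests.get(f"{BASE}/pools/default/rebalanceProgress",
                          auth=AUTH).json()
    if status.get("status") == "none":
        break                                    # rebalance has finished
    progress = sum(v["progress"] for v in status.values()
                   if isinstance(v, dict) and "progress" in v)
    if progress > last_progress:
        last_progress, last_change = progress, time.time()
    elif time.time() - last_change > HANG_TIMEOUT_S:
        raise RuntimeError("rebalance made no progress within the timeout; "
                           "the test framework would mark this as a hang")
    time.sleep(10)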
Note that the workload running here is somewhat contributing to the problem: we see that 78 collections are dropped just before the rebalance starts:
$ rg -N 'drop collection' memcached.log
2022-02-16T23:37:00.472471-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x12, scope:0x0, manifest:0xd1
2022-02-16T23:37:00.628501-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x29, scope:0x8, manifest:0xd3
2022-02-16T23:37:00.628544-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x8, scope:0x0, manifest:0xd3
... cut ...
2022-02-16T23:37:14.341639-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x82, scope:0xd, manifest:0x104
2022-02-16T23:37:14.341645-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x80, scope:0xd, manifest:0x104
2022-02-16T23:37:14.341651-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) drop collection:id:0x7f, scope:0xd, manifest:0x104
If these had all occurred at the same time they would only trigger one compaction per vBucket; however, they are spread out over multiple collection manifest updates, across 48 unique manifest IDs:
$ rg -N 'drop collection' memcached.log | cut -d, -f 3|sort| uniq | wc -l
48
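The same breakdown (plus how many drops each manifest update carried) can be pulled out with a few lines of Python if rg isn't to hand; this assumes memcached.log uses the line format shown in the excerpt above.

# Quick helper to reproduce the count above and show how many drops each
# manifest update carried. Assumes memcached.log in the current directory with
# the line format shown in the excerpt.
import re
from collections import Counter

drops_per_manifest = Counter()
with open("memcached.log") as log:
    for line in log:
        m = re.search(r"drop collection:id:0x[0-9a-f]+.*manifest:(0x[0-9a-f]+)", line)
        if m:
            drops_per_manifest[m.group(1)] += 1

print(f"{sum(drops_per_manifest.values())} drops across "
      f"{len(drops_per_manifest)} manifest updates")
for manifest, count in sorted(drops_per_manifest.items()):
    print(f"{manifest}: {count}")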
As such we end up scheduling (and re-scheduling) compaction of each vBucket multiple times (to purge the items from the now-dropped collections). For example, for vb:792 we schedule 3 times:
$ rg 'Compaction of vb:792,' memcached.log
18120:2022-02-16T23:37:00.931734-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) Compaction of vb:792, task:2854, purge_before_ts:0, purge_before_seq:0, drop_deletes:false, internal:true created with delay:2147483647s (awaiting completion).
18852:2022-02-16T23:37:03.505150-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) Compaction of vb:792, task:2854, purge_before_ts:0, purge_before_seq:0, drop_deletes:false, internal:true rescheduled with delay:5s (awaiting completion).
21625:2022-02-16T23:38:16.418058-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) Compaction of vb:792, task:3885, purge_before_ts:0, purge_before_seq:0, drop_deletes:false, internal:true created with delay:2147483647s (awaiting completion).
We don't actually compact 3 times, just twice, as two of the schedules were within 5s of each other (we delay starting compaction for collection purging by 5s in case more collections which need to be purged are identified), so we "just" see 2 compactions:
$ rg 'Compaction of vb:792 done' memcached.log
21439:2022-02-16T23:37:54.906100-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) Compaction of vb:792 done (ok). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:154,deleted:0, collections_erased:11, size/items/tombstones/purge_seqno pre{3391488, 2997, 0, 0}, post{1261568, 2843, 0, 0}
26751:2022-02-16T23:55:17.075566-08:00 INFO (UoQtU7JR99ysgk0PaDwPg8rWZom%U0eqtDTErgiiYijKCHCZBHE62BiI0Dr--41-860000) Compaction of vb:792 done (ok). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:938,deleted:0, collections_erased:67, size/items/tombstones/purge_seqno pre{700416, 2695, 0, 0}, post{663552, 1757, 0, 0}
This isn't to downplay the underlying problem here (compaction is consuming all AuxIO threads), but just to highlight that the issue is amplified by this particular workload.
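For illustration only (this is not the actual kv_engine implementation), the coalescing behaviour described above can be sketched as a per-vBucket timer: the first drop schedules a compaction 5s out, drops landing inside that window ride along with the pending run, and drops landing after it trigger a fresh run.

# Conceptual sketch only - not the kv_engine code - of the behaviour described
# above: compaction for collection purging is delayed by 5s so that drops
# arriving close together are serviced by a single compaction run, while drops
# arriving later trigger another run.
import threading
import time

COMPACTION_DELAY_S = 5.0

class VBucketCompactor:
    def __init__(self, vbid):
        self.vbid = vbid
        self.lock = threading.Lock()
        self.pending = None     # timer for a not-yet-started compaction
        self.runs = 0

    def notify_collection_dropped(self):
        """Called once per collection dropped from this vBucket."""
        with self.lock:
            if self.pending is None:
                # First drop since the last run: schedule compaction 5s out.
                self.pending = threading.Timer(COMPACTION_DELAY_S, self._compact)
                self.pending.start()
            # Otherwise a compaction is already pending; this drop will be
            # purged by that run (coalesced - no extra run is scheduled).

    def _compact(self):
        with self.lock:
            self.pending = None
            self.runs += 1
            print(f"Compaction of vb:{self.vbid} run #{self.runs}")

vb = VBucketCompactor(792)
vb.notify_collection_dropped()   # t=0s: schedules a run for t=5s
time.sleep(2)
vb.notify_collection_dropped()   # t=2s: coalesced into the pending run
time.sleep(4)                    # run #1 fires at t=5s
vb.notify_collection_dropped()   # t=6s: schedules run #2 for t=11s
time.sleep(6)                    # run #2 fires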