Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Not a Bug
-
Cheshire-Cat
-
centos7 64 bit; CB EE 7.0.0-5127
-
Untriaged
-
-
1
-
Unknown
Description
Steps to Reproduce:
1. Create a init cluster with 8 kv nodes and 10 fts nodes (total of 18 node init cluster)
+----------------+----------+-----------------------+---------------+--------------+
|
| Nodes | Services | Version | CPU | Status |
|
+----------------+----------+-----------------------+---------------+--------------+
|
| 172.23.105.175 | kv | 7.0.0-5127-enterprise | 1.11375297209 | Cluster node |
|
| 172.23.106.233 | ['kv'] | | | <--- IN --- |
|
| 172.23.106.236 | ['kv'] | | | <--- IN --- |
|
| 172.23.106.238 | ['kv'] | | | <--- IN --- |
|
| 172.23.106.250 | ['kv'] | | | <--- IN --- |
|
| 172.23.106.251 | ['kv'] | | | <--- IN --- |
|
| 172.23.121.74 | ['kv'] | | | <--- IN --- |
|
| 172.23.121.78 | ['kv'] | | | <--- IN --- |
|
| 172.23.107.43 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.58 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.44 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.45 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.54 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.47 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.78 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.84 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.85 | ['fts'] | | | <--- IN --- |
|
| 172.23.107.88 | ['fts'] | | | <--- IN --- |
|
+----------------+----------+-----------------------+---------------+--------------+
|
2. Create 15 Couchbase buckets, each bucket has 66 scopes, each scope has 1 collections. Each collection has 100 items
+-------------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+
|
| Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
|
+-------------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+
|
| 0SOXi1m-59-524000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140282736 | 94626452 |
|
| 3UIaV1M59btKMZAuKqa-59-531000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140497120 | 96884102 |
|
| 3_wu-59-516000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140269200 | 92234044 |
|
| 4bbj_BagLC0c-59-435000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140523024 | 98157575 |
|
| 8RqcdNTBrSlthPcW-59-404000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140438736 | 94876811 |
|
| Euje6l8r-59-472000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140507056 | 99141383 |
|
| LVPOp_6A-59-491000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140478208 | 99772034 |
|
| N6-59-462000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140304688 | 91587435 |
|
| OIKAR18m1TE4vnkl-59-450000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140315248 | 95236594 |
|
| SKx8L-59-421000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140357600 | 97739345 |
|
| g2YWv-59-500000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140354816 | 92845165 |
|
| m36Ir0JPz2yTE5XN--59-384000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140372880 | 95261388 |
|
| msUJk-59-538000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140431504 | 99952196 |
|
| r-Jwfz_CSeo5MYe%B-59-481000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 137281232 | 98446038 |
|
| x3is_DkH8-59-508000 | couchbase | 1 | none | 0 | 6600 | 2516582400 | 140351760 | 93582118 |
|
+-------------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+
|
3. Create 100 fts indexes each with partitons=1 on 100 collections using rest api
2021-05-09 22:53:36,375 | test | INFO | MainThread | [Collections:create_fts_indexes:335] Creating 100 fts indexes
|
the last Rest api calls return at
2021-05-09 22:56:45,744 | test | INFO | MainThread | [Collections:test_volume_taf:643]
4. Do some crud operations on documents
5. Rebalance-in a fts node (172.23.107.107) (so almost 4 minutes from the last rest call to create fts index)
2021-05-09 22:57:04,427 | test | INFO | pool-2-thread-2 | [table_view:display:72] Rebalance Overview
|
This rebalance got hung at 42% for a long time .
Hardware and RAM quota
RAM quota for FTS: 22 GiB
Each node CPUs: 8
Each node RAM: ~24GiB
CPU, RAM usage, FTS indexes building progress (is at 100%) - all look healthy
So not sure of why the rebalance got hung
Screenshots and logs attached
Attachments
Issue Links
- is triggering
-
MB-46239 FTS - Investigate partition assignments in FTS
-
- Open
-
From the logs, it roughly takes around 2 min to process an index movement (even when no partition movement is involved) and
that extrapolates to 2 X 100 = 200 min and 3 hrs looks legit here. (partition movement also won't take much time as this is only a ~100 docs here per index)
Where that 2 min time goes into even when no partition movement is involved for many indexes?
=> Rebalance orchestrator still needs to evaluate the potential partition movement of every index.
=> Spin up a single supply and `N ` mover goroutines for all the nodes present in the cluster. (12 of them in this case.)
=> Ceremoniously close all these mover/supply routines upon finding no/after all partition movements.
So, it looks like there is no scope for further rebalance speed up optimizations here, and would expect the same time to be taken even in older builds too.