Details
Bug
Resolution: Not a Bug
Major
7.6.0
Untriaged
0
Unknown
Description
The following test was run on Capella with 21 million documents, 1 million of which were vector documents.
AWS cluster with AMI: couchbase-cloud-server-7.6.0-2056-x86_64-v1.0.27
Initial service group configuration: all 6 services colocated on 3 nodes with 16 cores and 32 GB RAM each.
Attempted to scale to: 5 nodes with 8 cores and 32 GB RAM each, with all services colocated.
The cluster went into what appears to be an infinite rebalance, and scaling has been stuck at the FTS stage for 13 hours.
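For context, the aggregate capacity of the two service-group configurations above can be compared with simple arithmetic. The sketch below assumes that the core and RAM figures are per node and that all services are colocated on every node. The `ServiceGroup` class is a hypothetical helper for illustration only and is not part of any Couchbase or Capella API.

```python
# Hypothetical helper (not a Couchbase/Capella API): compares the aggregate
# capacity of the initial and target service-group configurations, assuming
# the core and RAM figures in the ticket are per node.
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceGroup:
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


initial = ServiceGroup(nodes=3, cores_per_node=16, ram_gb_per_node=32)
target = ServiceGroup(nodes=5, cores_per_node=8, ram_gb_per_node=32)

for name, sg in (("initial", initial), ("target", target)):
    print(f"{name}: {sg.total_cores} cores, {sg.total_ram_gb} GB RAM")
# initial: 48 cores, 96 GB RAM
# target: 40 cores, 160 GB RAM
```

Under these assumptions, the scale-out raises total RAM but lowers total CPU (48 to 40 cores) and halves the cores available on each node.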
Workload -
Bucket: Magma bucket with 10 GB of available RAM, holding 21 million documents (1 million of them vector documents) with a total size of 26 GB.
Service-wise workload -
110 FTS indexes (10 of them vector indexes), 109 GSI indexes and 31 dataverses.
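A few rough figures follow from the workload numbers above. This is a back-of-the-envelope sketch only. It assumes decimal units (1 GB = 10**9 bytes) and ignores compression, metadata and index overhead. The constant names are hypothetical and chosen for illustration.

```python
# Back-of-the-envelope workload figures derived from the numbers in this ticket.
# Assumptions: decimal units (1 GB = 10**9 bytes); compression, metadata and
# index overhead are ignored, so every figure is approximate.
TOTAL_DOCS = 21_000_000
VECTOR_DOCS = 1_000_000
DATA_SIZE_BYTES = 26 * 10**9
BUCKET_RAM_BYTES = 10 * 10**9

avg_doc_bytes = DATA_SIZE_BYTES / TOTAL_DOCS          # mean document size
vector_doc_share = VECTOR_DOCS / TOTAL_DOCS           # fraction of vector docs
ram_to_data_ratio = BUCKET_RAM_BYTES / DATA_SIZE_BYTES  # bucket RAM vs. data size

print(f"average document size: {avg_doc_bytes:.0f} bytes")
print(f"vector documents: {vector_doc_share:.1%} of all documents")
print(f"bucket RAM / data size: {ram_to_data_ratio:.1%}")
# average document size: 1238 bytes
# vector documents: 4.8% of all documents
# bucket RAM / data size: 38.5%
```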
DD logs -
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-01-27T071305-ns_1%40svc-dqisea-node-007.zmhlrqrvzgek2dd.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-01-27T071305-ns_1%40svc-dqisea-node-008.zmhlrqrvzgek2dd.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-01-27T071305-ns_1%40svc-dqisea-node-009.zmhlrqrvzgek2dd.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-01-27T071305-ns_1%40svc-dqisea-node-010.zmhlrqrvzgek2dd.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-01-27T071305-ns_1%40svc-dqisea-node-011.zmhlrqrvzgek2dd.sandbox.nonprod-project-avengers.com.zip
The cluster is live and can be found here -
https://ui.qe-9.sandbox.nonprod-project-avengers.com/database/settings/activity?oid=4f91031a-7d04-4965-aa06-2f9afc837093&pid=466e9a5b-fa4d-41fe-9dbe-efc61940fba2&dbid=a13b3eb1-9ec9-4869-a245-1c23a616097b
NOTE - Unlike some of the previous vector search bugs, CPU utilisation never crossed the 80% threshold on any of the nodes.
However, RAM used by Search stayed high (~25 GiB) for most of the test.