Details
- Bug
- Resolution: Fixed
- Critical
- 7.6.0
- Untriaged
- 0
- No
Description
Test environment:
- 5 dedicated FTS nodes, each with 16 CPUs and 64 GiB RAM
- 60 million documents in KV
- All 60 million documents have vectors of dimension 1536
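For context on the memory numbers, here is a rough sizing sketch for the raw vector data against total cluster RAM. It assumes 4-byte float32 vector components, which the report does not state; the constant and function names are illustrative only. It says nothing about how much of the index is actually kept resident by FTS.

```python
# Back-of-envelope sizing of the raw vector data versus cluster RAM.
NUM_DOCS = 60_000_000        # from the test environment
DIMS = 1536                  # from the test environment
BYTES_PER_COMPONENT = 4      # ASSUMPTION: float32 components (not stated in the report)
NODES = 5                    # dedicated FTS nodes
RAM_PER_NODE_GIB = 64

GIB = 1024 ** 3


def raw_vector_gib(num_docs: int, dims: int, bytes_per_component: int) -> float:
    """Size of the raw vector payload in GiB, ignoring all index overhead."""
    return num_docs * dims * bytes_per_component / GIB


raw_gib = raw_vector_gib(NUM_DOCS, DIMS, BYTES_PER_COMPONENT)
cluster_ram_gib = NODES * RAM_PER_NODE_GIB

print(f"raw vectors: {raw_gib:.1f} GiB, total FTS RAM: {cluster_ram_gib} GiB")
```

Under that assumption the raw vectors alone are comparable in size to the combined RAM of the five FTS nodes, so high memory usage during the build is plausible; the open question in this ticket is why usage does not drop afterwards.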
Test Steps:
- Loaded 60 million documents into KV
- Created an index with 18 partitions per node
- There is only one index, and it uses l2_norm
- Indexing took 1 hour 17 minutes for 60 million documents without any OOM, but memory usage went up to almost 99% on a few nodes.
- Left the cluster idle for 10+ hours to see whether memory usage would come down, but it never did.
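The figures in the steps above can be turned into per-partition and throughput numbers. This is a minimal sketch that assumes "18 partitions per node" means 18 on each of the 5 FTS nodes and that documents are spread evenly across partitions; neither is confirmed by the report.

```python
# Derived figures from the test steps (variable names are illustrative).
NUM_DOCS = 60_000_000
NODES = 5
PARTITIONS_PER_NODE = 18
INDEXING_SECONDS = (1 * 60 + 17) * 60   # 1 hour 17 minutes

# ASSUMPTION: 18 partitions on each of the 5 nodes.
total_partitions = NODES * PARTITIONS_PER_NODE

# ASSUMPTION: documents are distributed evenly across partitions.
docs_per_partition = NUM_DOCS / total_partitions

# Average indexing rate over the whole build.
docs_per_second = NUM_DOCS / INDEXING_SECONDS

print(f"partitions: {total_partitions}")
print(f"docs per partition: {docs_per_partition:,.0f}")
print(f"average indexing rate: {docs_per_second:,.0f} docs/s")
```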
The logs below will help in understanding the behaviour using promtimer. Images confirming the behaviour are also attached below.
- https://cb-engineering.s3.amazonaws.com/logs-after-8hours-idle-system/collectinfo-2024-02-19T010906-ns_1%40svc-s-node-007.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/logs-after-8hours-idle-system/collectinfo-2024-02-19T010906-ns_1%40svc-s-node-008.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/logs-after-8hours-idle-system/collectinfo-2024-02-19T010906-ns_1%40svc-s-node-009.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/logs-after-8hours-idle-system/collectinfo-2024-02-19T010906-ns_1%40svc-s-node-010.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/logs-after-8hours-idle-system/collectinfo-2024-02-19T010906-ns_1%40svc-s-node-011.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip