Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Version: 7.6.0
- Triage: Untriaged
- 0
- No
Description
Test environment:
- 5 dedicated FTS nodes, each with 16 CPUs and 64 GiB RAM
- 60 million documents in KV
- All 60 million documents contain vectors of dimension 1536
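As a rough sizing check, the sketch below computes the raw size of the vectors alone. It assumes that each of the 1536 dimensions is stored as an uncompressed 32-bit float and that data is spread evenly across the 5 nodes. Both are assumptions, since the report does not say how the index encodes vectors. Document bodies, index structures and replicas are ignored.

```python
# Rough sizing sketch: raw vector payload only.
# Assumptions (not stated in the report): float32 storage, no compression,
# even distribution across nodes, no replicas.
GIB = 1024 ** 3

NUM_DOCS = 60_000_000
DIMS = 1536
BYTES_PER_DIM = 4          # assumed float32
NUM_NODES = 5
RAM_PER_NODE_GIB = 64

def raw_vector_bytes(num_docs: int, dims: int, bytes_per_dim: int) -> int:
    """Total bytes needed to hold every vector uncompressed."""
    return num_docs * dims * bytes_per_dim

total = raw_vector_bytes(NUM_DOCS, DIMS, BYTES_PER_DIM)
per_node_gib = total / NUM_NODES / GIB

print(f"total:    {total / GIB:.1f} GiB")
print(f"per node: {per_node_gib:.1f} GiB vs {RAM_PER_NODE_GIB} GiB RAM")
```

Under these assumptions the raw vectors come to roughly 343 GiB in total, or about 68.7 GiB per node. That is more than the 64 GiB of RAM on each node. The real in-memory footprint depends on how the index stores vectors and what it keeps on disk.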
Test Steps:
- Loaded 60 million documents into KV.
- Created an index with 18 partitions per node, i.e. 90 partitions overall.
- There is only one index, using the l2_norm similarity metric.
- Indexing all 60 million documents took 1 hour 17 minutes without any OOM, but memory usage reached almost 99% on a few nodes. One observation: the first 40 million documents were indexed in only about 25 minutes, while the remaining 20 million took
- Left the cluster idle for 10+ hours to see whether memory usage would come down, but it never dropped below about 37%. Throughout those 10 hours, each node had 22+ GiB in use.
- Then ran queries sequentially with k=1, then k=100, then k=200. After about 8 queries, almost all nodes showed 95+% CPU usage, and one node (Node-009) crashed with an OOM.
- Successful queries took between roughly 25 and 60 seconds.
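The indexing timings above can be turned into rough throughput figures. The sketch below assumes that the 1h17m total covers both phases, so the remaining 20 million documents took about 77 − 25 = 52 minutes. The report's own figure for that phase is cut off, so this number is an inference, not a measurement.

```python
# Back-of-the-envelope indexing throughput from the reported timings.
# Assumption: the 1h17m total covers both phases, so the last 20M docs
# took roughly TOTAL_MIN - FIRST_MIN minutes.
TOTAL_DOCS = 60_000_000
TOTAL_MIN = 77           # 1 hour 17 minutes
FIRST_DOCS = 40_000_000
FIRST_MIN = 25           # "25 mins or so"

def docs_per_sec(docs: int, minutes: float) -> float:
    """Average ingest rate in documents per second."""
    return docs / (minutes * 60)

rest_docs = TOTAL_DOCS - FIRST_DOCS
rest_min = TOTAL_MIN - FIRST_MIN

first_rate = docs_per_sec(FIRST_DOCS, FIRST_MIN)
rest_rate = docs_per_sec(rest_docs, rest_min)
overall_rate = docs_per_sec(TOTAL_DOCS, TOTAL_MIN)

print(f"first 40M: {first_rate:,.0f} docs/s")
print(f"last 20M:  {rest_rate:,.0f} docs/s (~{rest_min} min)")
print(f"overall:   {overall_rate:,.0f} docs/s")
print(f"slowdown:  {first_rate / rest_rate:.1f}x")
```

If that assumption holds, the ingest rate dropped roughly fourfold after the first 40 million documents: about 26,700 docs/s versus about 6,400 docs/s.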
The logs below, viewed with promtimer, help explain this behaviour. Images confirming the behaviour are also attached.
Logs:
- https://cb-engineering.s3.amazonaws.com/after-queries-run-mem-usage-high-18partitions/collectinfo-2024-02-19T012736-ns_1%40svc-s-node-007.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/after-queries-run-mem-usage-high-18partitions/collectinfo-2024-02-19T012736-ns_1%40svc-s-node-008.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/after-queries-run-mem-usage-high-18partitions/collectinfo-2024-02-19T012736-ns_1%40svc-s-node-009.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/after-queries-run-mem-usage-high-18partitions/collectinfo-2024-02-19T012736-ns_1%40svc-s-node-010.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip
- https://cb-engineering.s3.amazonaws.com/after-queries-run-mem-usage-high-18partitions/collectinfo-2024-02-19T012736-ns_1%40svc-s-node-011.b0epkc7rtqc2dmsa.sandbox.nonprod-project-avengers.com.zip