Description
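The fts service exited with status 137 while running the queries. Status 137 is 128 + 9, meaning the process was terminated by SIGKILL, which is consistent with it being killed for running out of memory.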
Service 'fts' exited with status 137. Restarting. Messages:
2024-05-16T16:53:12.956-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 48, runningQueryUsed: 9000, memUsed: 25646663600
2024-05-16T16:53:12.956-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 56, runningQueryUsed: 9000, memUsed: 25646663608
2024-05-16T16:53:12.971-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 88, runningQueryUsed: 9000, memUsed: 25646663640
2024-05-16T16:53:12.972-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 24, runningQueryUsed: 9000, memUsed: 25646663576
2024-05-16T16:53:12.972-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 16, runningQueryUsed: 9000, memUsed: 25646663568
2024-05-16T16:53:12.973-07:00 [INFO] app_herder: querying over queryQuota: 13804339200, estimated size: 9000, runningQueryUsed: 9000, memUsed: 25646672552
2024-05-16T16:53:12.973-07:00 [INFO] app_herder: indexing over indexQuota: 11216025600, memUsed: 26107934072, preIndexingMemory: 461270520, indexes: 212, waiting: 123
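To make the gap between memUsed and the quotas in the messages above easier to read, here is a minimal Python sketch that extracts the quota and memUsed values from an app_herder line and reports the ratio. The function name `quota_overage` and the regular expression are illustrative assumptions based only on the message format shown in this ticket; they are not part of the fts service or any Couchbase tooling.

```python
import re

# Assumed message shape, taken from the log lines in this ticket:
#   "... over queryQuota: <bytes>, ..., memUsed: <bytes>"
#   "... over indexQuota: <bytes>, memUsed: <bytes>, ..."
HERDER_RE = re.compile(r"over (queryQuota|indexQuota): (\d+),.*?memUsed: (\d+)")


def quota_overage(line):
    """Return (quota_name, quota_bytes, mem_used_bytes, ratio), or None if no match."""
    match = HERDER_RE.search(line)
    if match is None:
        return None
    name = match.group(1)
    quota = int(match.group(2))
    used = int(match.group(3))
    return name, quota, used, used / quota


# Two of the messages from the ticket, copied verbatim.
sample_lines = [
    "2024-05-16T16:53:12.956-07:00 [INFO] app_herder: querying over queryQuota: "
    "13804339200, estimated size: 48, runningQueryUsed: 9000, memUsed: 25646663600",
    "2024-05-16T16:53:12.973-07:00 [INFO] app_herder: indexing over indexQuota: "
    "11216025600, memUsed: 26107934072, preIndexingMemory: 461270520, "
    "indexes: 212, waiting: 123",
]

for line in sample_lines:
    parsed = quota_overage(line)
    if parsed is not None:
        name, quota, used, ratio = parsed
        print(f"{name}: memUsed is {ratio:.2f}x the quota ({used - quota} bytes over)")
```

Run against the two sample lines, this shows memUsed at somewhat under twice the query quota and a little over twice the index quota at the time of the restart.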
Steps followed to run the system test:
- Created an on-prem cluster with 12 nodes, 5 of which are fts nodes.
- Each node has at least 12 GB of memory.
- Created one scope and one collection under that scope.
- Loaded 5 million documents for normal vectors and 10k documents with xattrs, using the sift dataset.
- Created two indexes with 90 partitions across the 5 fts nodes. One index covers the normal vector data and the other covers the xattrs vectors.
- Ran knn queries.
- Mutated the documents and ran the knn queries again.
- Rebalanced in an FTS node.
- Performed mutations and ran the knn queries again.
- Rebalanced out an FTS node.
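For a rough sense of scale of the setup described in the steps above, the following Python sketch estimates the raw vector payload and the partition count per fts node. It rests on several assumptions that the ticket does not state: SIFT vectors have 128 dimensions, values are stored as 4-byte floats, each of the two indexes has 90 partitions, there are no replicas, and partitions are spread evenly. It ignores all index overhead, so it is a lower bound for orientation only and not a measurement from this cluster.

```python
def raw_vector_bytes(num_docs, dims, bytes_per_value=4):
    """Bytes needed to hold num_docs vectors of `dims` values each (no overhead)."""
    return num_docs * dims * bytes_per_value


def partitions_per_node(num_indexes, partitions_per_index, num_nodes):
    """Maximum partitions on any node, assuming an even spread and no replicas."""
    total = num_indexes * partitions_per_index
    return -(-total // num_nodes)  # ceiling division


# Values from the ticket.
NORMAL_DOCS = 5_000_000
XATTR_DOCS = 10_000
FTS_NODES = 5

# Assumptions, not stated in the ticket.
SIFT_DIMS = 128          # assumed dimensionality of the sift vectors
NUM_INDEXES = 2          # one per vector type, at the start of the loop
PARTITIONS_EACH = 90     # assumes "90 partitions" means per index

normal_bytes = raw_vector_bytes(NORMAL_DOCS, SIFT_DIMS)
xattr_bytes = raw_vector_bytes(XATTR_DOCS, SIFT_DIMS)

print(f"normal vectors: {normal_bytes / 1e9:.2f} GB raw")
print(f"xattrs vectors: {xattr_bytes / 1e6:.2f} MB raw")
print(f"partitions per node on {FTS_NODES} nodes: "
      f"{partitions_per_node(NUM_INDEXES, PARTITIONS_EACH, FTS_NODES)}")
print(f"partitions per node on {FTS_NODES + 1} nodes: "
      f"{partitions_per_node(NUM_INDEXES, PARTITIONS_EACH, FTS_NODES + 1)}")
```

Because the index count grows on every iteration of the loop, the per-node partition figure only describes the starting state.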
The above steps run in a loop, and the index count increases with each iteration. The test ran for roughly 8-10 hours before it failed with an OOM error. Logs for the cluster are attached:
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.105.122.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.106.176.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.106.30.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.96.198.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.96.230.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.96.245.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.97.100.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.97.108.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.97.109.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.97.66.zip
- https://cb-engineering.s3.amazonaws.com/system-test-koushal-cluster/collectinfo-2024-05-17T041448-ns_1%40172.23.97.67.zip