Details
- Technical task
- Resolution: Unresolved
- Major
- Cypher
- None
- 0
Description
Currently, the number of centroids is determined from the items_count of the keyspace. This can be inaccurate for a partial index when many documents do not satisfy the WHERE clause predicate.
During sampling, the indexer already has to evaluate the WHERE clause predicate, so it can calculate the percentage of qualifying documents and determine the number of centroids from that.
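The adjustment described above can be sketched roughly as follows. This is a minimal illustration in Python, not the indexer's code: the function names, the `rule` parameter, and its default of one centroid per 1000 items are hypothetical placeholders, and the final clamp to the number of qualifying sampled documents is an added assumption motivated by the 'nx >= k' training check.

```python
def estimate_qualifying_items(items_count, sampled, qualified):
    """Scale the keyspace item count by the fraction of sampled
    documents that satisfied the WHERE clause predicate."""
    if sampled <= 0:
        # No sample available: fall back to the raw keyspace count.
        return items_count
    # Integer ceiling division avoids floating-point rounding.
    return (items_count * qualified + sampled - 1) // sampled


def centroids_from_sample(items_count, sampled, qualified,
                          rule=lambda n: max(1, n // 1000)):
    """Pick a centroid count from the estimated number of qualifying items.

    `rule` maps an item count to a centroid count. The default used here
    is a hypothetical placeholder, not the indexer's actual rule.
    """
    estimated = estimate_qualifying_items(items_count, sampled, qualified)
    k = rule(estimated)
    # Assumption: the qualifying sampled documents are the training points,
    # and training needs at least as many points as clusters.
    return min(k, qualified)


if __name__ == "__main__":
    # 1,000,000 items in the keyspace, 10,000 sampled, 500 passed the predicate.
    print(centroids_from_sample(1_000_000, 10_000, 500))
```

With these illustrative numbers, the placeholder rule applied to the raw items_count would ask for 1000 centroids, more than the 500 qualifying training points, while the adjusted estimate asks for 50.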
Issue Links
- relates to: MB-63187 [Sift 1B Dataset]: Indexer::initiateTraining error - Error: 'nx >= k' failed: Number of training points (6627) should be at least as large as number of clusters (15263) (Open)