Details
-
Improvement
-
Resolution: Unresolved
-
Major
-
5.5.3, 6.0.1, 6.5.0
-
None
Description
Currently, all TF-IDF scoring is done on a per-pindex basis (each pindex is an individual bleve index).
This is a well-understood problem that also affects other search engines, such as Elasticsearch: because term statistics are computed per partition, scores are inconsistent across documents held in different partitions.
Generally, though, this doesn't matter: each partition holds a large number of docs, so the discrepancy between scores should be minimal.
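To make the discrepancy concrete, here is a minimal sketch in Python. The partition names and counts are made up for illustration, and the IDF formula is an assumption (a common smoothed form), not a statement of exactly what bleve computes.

```python
import math


def idf(doc_count: int, doc_freq: int) -> float:
    """Smoothed inverse document frequency (assumed formula, for illustration)."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical pindexes: (total docs, docs containing the query term).
partitions = {
    "pindex_a": (1000, 10),   # term is rare in this partition
    "pindex_b": (1000, 400),  # term is common in this partition
}

# Each partition scores with its own local statistics.
local_idf = {name: idf(n, df) for name, (n, df) in partitions.items()}

# A single unpartitioned index would use the combined statistics instead.
total_docs = sum(n for n, _ in partitions.values())
total_freq = sum(df for _, df in partitions.values())
global_idf = idf(total_docs, total_freq)

for name, value in local_idf.items():
    print(f"{name}: local idf={value:.3f}, global idf={global_idf:.3f}")
```

With skewed term distributions like these, two otherwise identical documents get noticeably different scores depending on which pindex holds them. When the term is spread evenly across large partitions, the local values converge on the global one.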
Elasticsearch added a search type, 'dfs_query_then_fetch', described in https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch, which makes scoring more accurate across partitions at the cost of extra round trips to the partitions.
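The idea behind this approach can be sketched as a two-phase search: first gather term statistics from every partition, then run the query with the merged statistics. The Python below is only an illustration under stated assumptions. The `Partition` class, `dfs_search` function, and the scoring formula (square root of term frequency times a smoothed IDF) are hypothetical and are not the Elasticsearch, bleve, or Couchbase FTS API.

```python
import math


def idf(doc_count, doc_freq):
    # Assumed smoothed IDF, for illustration only.
    return 1.0 + math.log(doc_count / (doc_freq + 1))


class Partition:
    """Hypothetical stand-in for one pindex: doc_id -> list of terms."""

    def __init__(self, docs):
        self.docs = docs

    def stats(self, term):
        # Phase 1: report (total docs, docs containing the term).
        doc_freq = sum(1 for terms in self.docs.values() if term in terms)
        return len(self.docs), doc_freq

    def search(self, term, doc_count=None, doc_freq=None):
        # Without supplied statistics, fall back to local ones (current behaviour).
        if doc_count is None or doc_freq is None:
            doc_count, doc_freq = self.stats(term)
        weight = idf(doc_count, doc_freq)
        return {
            doc_id: math.sqrt(terms.count(term)) * weight
            for doc_id, terms in self.docs.items()
            if term in terms
        }


def dfs_search(partitions, term):
    # Extra round trip: collect and merge statistics from every partition.
    stats = [p.stats(term) for p in partitions]
    doc_count = sum(n for n, _ in stats)
    doc_freq = sum(df for _, df in stats)
    # Phase 2: every partition scores with the same global statistics.
    hits = {}
    for p in partitions:
        hits.update(p.search(term, doc_count, doc_freq))
    return hits


p1 = Partition({"a1": ["fts"], "a2": ["x"], "a3": ["y"], "a4": ["z"]})
p2 = Partition({"b1": ["fts"], "b2": ["fts"], "b3": ["fts"], "b4": ["fts"]})

local_hits = {**p1.search("fts"), **p2.search("fts")}
global_hits = dfs_search([p1, p2], "fts")
```

Here `a1` and `b1` contain the term the same number of times, yet `local_hits` scores them differently because the term is rare in one partition and common in the other. In `global_hits` they score the same, at the price of one additional statistics request per partition.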
It would be good if Couchbase Server FTS offered a similar mechanism, so that users can trade some performance for more accurate scores when they need them.