TF-IDF scores a term using its term frequency and its inverse document frequency: it rewards terms that occur often within a document and penalizes terms that occur in many documents.
BM25 goes beyond this by saturating the contribution of term frequency and normalizing for document length: https://en.wikipedia.org/wiki/Okapi_BM25
- Weigh the pros and cons of TF-IDF vs BM25 for Couchbase
- BM25 scores are expected to work better alongside kNN distance, since BM25 is a global score, unlike TF-IDF
- This will be an index-creation-time setting; the user will be able to choose between TF-IDF and BM25
- This will entail a file format change to accommodate the new score metrics.
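
To make the difference between the two scoring schemes concrete, here is a minimal sketch of both formulas for a single term. It uses a textbook TF-IDF variant and the BM25 formula from the Wikipedia article linked above, with the commonly cited defaults `k1 = 1.2` and `b = 0.75`. The function names and parameters are illustrative assumptions, not Couchbase's actual implementation, and the exact TF-IDF variant used in the product may differ.

```python
import math


def tf_idf(tf: int, df: int, num_docs: int) -> float:
    """Textbook TF-IDF for one term: raw term frequency times log IDF.

    Assumption: this is a generic variant, not the one used by any
    particular search engine.
    """
    return tf * math.log(num_docs / df)


def bm25(tf: int, df: int, num_docs: int, doc_len: int,
         avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 for one term, following the Wikipedia formulation.

    k1 controls how quickly term frequency saturates; b controls how
    strongly the score is normalized by document length.
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Hypothetical corpus statistics, chosen only for illustration.
    num_docs, df, avg_len = 1000, 5, 100.0
    for tf in (1, 2, 5, 20, 100):
        print(tf,
              round(tf_idf(tf, df, num_docs), 3),
              round(bm25(tf, df, num_docs, 100, avg_len), 3))
```

In this sketch the TF-IDF score grows linearly with term frequency, while the BM25 score flattens out as term frequency increases and drops for documents longer than the average. Note that BM25 needs the document length and the average document length in addition to the term and document frequencies.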