  1. Couchbase Server
  2. MB-58901

Introduce support for BM25 scoring of search results

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • feature-backlog
    • 7.0.0
    • fts
    • None
    • 0

    Description

      TF-IDF scores a term from its term frequency and its inverse document frequency: it rewards terms that occur often within a document and penalizes terms that occur in many documents.

      BM25 goes beyond this by saturating the term-frequency contribution and normalizing for document length - https://en.wikipedia.org/wiki/Okapi_BM25

      Also a good read - https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
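To make the difference concrete, here is a minimal sketch comparing a textbook TF-IDF term weight with the Okapi BM25 term weight from the Wikipedia article linked above. It is an illustration only, not the scoring code used by Couchbase FTS: the function names are hypothetical, the TF-IDF variant (tf * ln(N / df)) is one of several in common use, and k1 = 1.2 and b = 0.75 are commonly cited defaults assumed here.

```python
import math


def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """Textbook TF-IDF weight for one term in one document (assumed variant)."""
    if tf == 0 or df == 0:
        return 0.0
    # Grows linearly with term frequency; shrinks as more documents contain the term.
    return tf * math.log(n_docs / df)


def bm25(tf: int, df: int, n_docs: int, doc_len: int, avg_doc_len: float,
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 weight for one term in one document (k1 and b are assumed defaults)."""
    if tf == 0:
        return 0.0
    # Smoothed IDF that stays non-negative even for very common terms.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: documents longer than average are penalized.
    norm = 1 - b + b * doc_len / avg_doc_len
    # The term-frequency part saturates: it approaches (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, term present in 10, average length 100.
    for tf in (1, 2, 5, 20, 100):
        print(tf, round(tf_idf(tf, 10, 1000), 3), round(bm25(tf, 10, 1000, 100, 100), 3))
```

Running the loop shows the TF-IDF weight growing without bound as the term repeats, while the BM25 weight flattens out, which is the saturation behaviour BM25 adds on top of TF-IDF.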

      From elastic:

      Todo:

      • Weigh the pros and cons of TF-IDF vs. BM25 for Couchbase
      • Make the BM25 score work better alongside kNN distance (BM25 is a global score, unlike TF-IDF)

      More details:

      • This will be an index-creation-time setting; the user will be able to choose between TF-IDF and BM25
      • This will entail a file format change to accommodate the new score metrics.

            People

              abhinav Abhi Dangeti
              abhinav Abhi Dangeti