  1. Couchbase Server
  2. MB-61640

Rank exact/full hits higher than fuzzy/tokenized hits

Details

    • Improvement
    • Resolution: Unresolved
    • Critical
    • 7.6.3
    • 7.6.0, 7.2.0
    • fts
    • None
    • 0

    Description

      The search engine currently scores exact matches and fuzzy matches (those within the specified edit distance) alike. We should score these differently: the closer the hit, the higher the score.
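
      To make the request concrete, here is a minimal sketch of distance-aware scoring. It is not the engine's actual scoring code: `edit_distance` is a plain Levenshtein implementation, and `fuzzy_boost` with its `1 / (1 + d)` falloff is a hypothetical multiplier chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep on a match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_boost(query: str, term: str, max_distance: int) -> float:
    """Hypothetical multiplier (an assumption, not an engine API).

    Returns 1.0 for an exact match, a smaller value as the edit distance
    grows, and 0.0 when the term is outside the allowed fuzziness.
    """
    d = edit_distance(query, term)
    if d > max_distance:
        return 0.0
    return 1.0 / (1 + d)
```

      With a fuzziness of 2, `fuzzy_boost("abhi", "abhi", 2)` is larger than `fuzzy_boost("abhi", "abhy", 2)`, so the exact hit would rank first if the base score were multiplied by this factor.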

      The same should apply when tokenization is involved: a match on the longest search token should score higher.

      For example:

      • If we tokenize "abhi" with an edge_ngram of (2,10), we get these tokens: ["ab", "abh", "abhi"].
      • Now, if I search for "abhi", I want a document that has "abhi" as an indexed token to score higher than a document that only has "abh".
      • Two documents that both have "abhi" indexed would score the same. So doc1 with "abhi" and doc2 with "abhinav" can score the same under the ngram definition, because it is impossible to know what else follows the longest matched search token.
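
      The example above can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: `edge_ngrams` mimics an edge_ngram filter with the (2,10) bounds from the example, and `longest_match_score` is a hypothetical scoring rule that rewards the longest query token found among a document's indexed tokens.

```python
def edge_ngrams(text: str, min_len: int = 2, max_len: int = 10) -> list[str]:
    """Prefixes of text whose lengths fall between min_len and max_len."""
    upper = min(len(text), max_len)
    return [text[:n] for n in range(min_len, upper + 1)]


def longest_match_score(query: str, indexed_tokens: list[str],
                        min_len: int = 2, max_len: int = 10) -> float:
    """Hypothetical score (an assumption, not an engine API).

    Fraction of the query covered by the longest query edge n-gram
    that is present among the document's indexed tokens.
    """
    if not query:
        return 0.0
    indexed = set(indexed_tokens)
    best = 0
    for token in edge_ngrams(query, min_len, max_len):
        if token in indexed:
            best = max(best, len(token))
    return best / len(query)


# Indexed tokens for three hypothetical documents.
doc1 = edge_ngrams("abhi")      # ["ab", "abh", "abhi"]
doc2 = edge_ngrams("abhinav")   # also contains "abhi"
doc3 = edge_ngrams("abh")       # longest token is "abh"

scores = {name: longest_match_score("abhi", tokens)
          for name, tokens in [("doc1", doc1), ("doc2", doc2), ("doc3", doc3)]}
```

      Here doc1 and doc2 tie, since both index "abhi", while doc3 scores lower because its longest matching token is "abh".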

          People

            abhinav Abhi Dangeti
            abhinav Abhi Dangeti