Couchbase Server / MB-20467

[FTS] Issues when using edge_ngram token_filter


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • 6.5.0
    • 5.0.0
    • fts
    • Untriaged
    • No

    Description

      Build : 4.7.0-971

      Test : (not yet merged)
      ./testrunner -i resources/jenkins/fts.ini -t fts.stable_topology_fts.StableTopFTS.index_query_custom_mapping,items=1000,custom_map=True,num_custom_analyzers=1,multiple_filters=true,cm_id=0,num_queries=100,compare_es=True,GROUP=P0,cluster=D+F

      Description : In the above test, 3 queries return different hits in FTS than in ES when the custom analyzer uses an edge_ngram token filter with back=true, min=3 and max=5. They fall into two cases:

      1) 2 queries using match_phrase. FTS does not seem to apply edge_ngram correctly to match_phrase queries. This looks to me like a bug in FTS.
      2) 1 query that combines should, must_not and must Boolean clauses. I think FTS is correct here in not returning a few of the documents that ES returns.

      Case 1 : Match Phrase with back_edge_ngram (min 3, max 5)

      Query :

      {"field": "name", "match_phrase": "Sidonia Gerónimo"}

      Doc not returned by FTS, but returned by ES :

      {
        "salary": 60402.95,
        "name": "Antonia Gerónimo",
        "dept": "Info-tech",
        "is_manager": false,
        "mutated": 0,
        "join_date": "1971-02-11T06:57:00",
        "languages_known": [
          "Hindi",
          "Italian",
          "Spanish"
        ],
        "emp_id": "10000318",
        "type": "emp",
        "email": "antonia@mcdiabetes.com"
      }
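
      To make the mismatch easier to reason about, here is a minimal Python sketch of what a back edge_ngram filter with min=3 and max=5 would emit for the query and for the missing document. It assumes whitespace tokenization and lowercasing only; the analyzer generated by the test may apply other filters, and the helper names are illustrative, not part of FTS or ES.

```python
def back_edge_ngrams(token, min_len=3, max_len=5):
    """Suffixes of `token` with min_len..max_len characters (code points)."""
    return [token[-n:] for n in range(min_len, max_len + 1) if n <= len(token)]


def analyze(text):
    # Assumption: whitespace tokenizer + lowercase, then the back edge n-gram filter.
    # Returns one list of grams per token position.
    return [back_edge_ngrams(tok.lower()) for tok in text.split()]


query_terms = analyze("Sidonia Gerónimo")
doc_terms = analyze("Antonia Gerónimo")

# Grams that the query and the document have in common at each position.
shared = [sorted(set(q) & set(d)) for q, d in zip(query_terms, doc_terms)]
print(shared)
```

      Under these assumptions, "sidonia" and "antonia" share the grams "nia" and "onia" at the first position, and both second tokens yield identical grams. If a phrase query treats the grams at one position as alternatives, the document qualifies as a phrase match, which would be consistent with ES returning it.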
      

      Case 2: Use of should, must_not and must Boolean queries.

      Query :
      {
        "should": {"disjuncts": [
          {"field": "name", "match": "Delores Johnson"},
          {"field": "name", "match": "Riona Adams"}
        ]},
        "must_not": {"disjuncts": [
          {"field": "name", "match": "Riona"},
          {"field": "name", "match": "Juan José"}
        ]},
        "must": {"conjuncts": [
          {"field": "manages.reports", "match": "Antonia Mia"}
        ]}
      }

      Sample doc not returned by FTS, but returned by ES :

      {
        "salary": 91984.77,
        "name": "Keelia Carter",
        "mutated": 0,
        "is_manager": true,
        "dept": "Support",
        "join_date": "1952-01-24T10:20:00",
        "manages": {
          "team_size": 6,
          "reports": [
            "Ciara Ann",
            "Kala Moore",
            "Severin Hall",
            "Deandra Aarón",
            "Mia Parker X",
            "Hedda Jr."
          ]
        },
        "languages_known": [
          "Arabic",
          "Vietnamese",
          "Portuguese"
        ],
        "emp_id": "10000120",
        "type": "emp",
        "email": "keelia@mcdiabetes.com"
      }
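
      As a debugging aid, the following Python sketch checks which of the three clauses share any back edge n-gram (min=3, max=5) with the sample document. It assumes whitespace tokenization, lowercasing, the same analyzer on both fields, and OR semantics between the grams of a match query. None of this is confirmed by the index definition, and the helper names are hypothetical.

```python
def back_edge_ngrams(token, min_len=3, max_len=5):
    """Suffixes of `token` with min_len..max_len characters (code points)."""
    return {token[-n:] for n in range(min_len, max_len + 1) if n <= len(token)}


def grams(text):
    # Assumption: whitespace tokenizer + lowercase before the n-gram filter.
    out = set()
    for tok in text.lower().split():
        out |= back_edge_ngrams(tok)
    return out


def match(query_text, field_values):
    # Assumed OR semantics: a single shared gram in any value is a match.
    q = grams(query_text)
    return any(q & grams(value) for value in field_values)


doc = {
    "name": ["Keelia Carter"],
    "manages.reports": ["Ciara Ann", "Kala Moore", "Severin Hall",
                        "Deandra Aarón", "Mia Parker X", "Hedda Jr."],
}

clauses = {
    # disjuncts: any sub-query may match
    "should": any(match(t, doc["name"]) for t in ("Delores Johnson", "Riona Adams")),
    "must_not": any(match(t, doc["name"]) for t in ("Riona", "Juan José")),
    # conjuncts: every sub-query has to match
    "must": all(match(t, doc["manages.reports"]) for t in ("Antonia Mia",)),
}
print(clauses)
```

      With these assumptions only the must clause matches the sample document (through the gram "mia"). Whether the document counts as a hit then depends on whether the should clause is treated as required when a must clause is present, which this sketch does not decide.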
      

      Attaching the testrunner console output, which contains the index definition as well as the query results.


            People

              girish.benakappa Girish Benakappa
              mihir.kamdar Mihir Kamdar (Inactive)