Couchbase Server / MB-20566

[FTS] Queries on fields analyzed with shingle token filter yield incorrect results

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 5.0.0
    • 5.0.0
    • fts
    • Untriaged
    • No

    Description

      Build : 4.7.0-990

      Testcase :
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F,GROUP=ALL -t fts.stable_topology_fts.StableTopFTS.index_query_custom_mapping,items=1000,custom_map=True,num_custom_analyzers=1,cm_id=22,num_queries=100,compare_es=True,GROUP=P0

      Description:
      If a field is analyzed with a custom analyzer that includes a shingle token filter, match, prefix, wildcard, and inclusion/exclusion queries on that field do not return the expected hits when compared to Elasticsearch (ES).
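
For readers unfamiliar with shingle filters, the following is a minimal sketch of what such a filter emits: word n-grams built from adjacent tokens. The function name, the `min_size`, `max_size` and `output_original` parameters, and the lowercased input tokens are all assumptions made for illustration. The actual analyzer definition used by `cm_id=22` is in the attached console output and is not reproduced here.

```python
def shingles(tokens, min_size=2, max_size=2, output_original=True, sep=" "):
    """Return the terms a simple shingle filter would emit for `tokens`.

    Illustrative only: parameter names and defaults are assumptions,
    not the settings of the analyzer used in this test.
    """
    out = []
    for i, tok in enumerate(tokens):
        if output_original:
            # Keep the single-word token alongside the shingles.
            out.append(tok)
        for n in range(min_size, max_size + 1):
            # Emit an n-word shingle only if enough tokens remain.
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles(["treva", "gerónimo"]))
# ['treva', 'treva gerónimo', 'gerónimo']
print(shingles(["finance"], output_original=False))
# []
```

Because the indexed terms depend on these settings, two engines configured even slightly differently can index different term sets for the same field value. Whether that is the cause of the mismatches reported here is not established by this sketch.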

      Sample Query of Prefix query type:

      [2016-08-15 12:40:07,398] - [task:1112] INFO - ------------------------------------------------------------------ Query # 4 -----------------------------------------------------------------
      [2016-08-15 12:40:07,417] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "dept", "prefix": "Pr"}, "size": 10000000} on node: 172.23.106.72:
      [2016-08-15 12:40:07,453] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
      [2016-08-15 12:40:07,454] - [task:1140] INFO - FTS hits for query: {"field": "dept", "prefix": "Pr"} is 99 (took 5.217532ms)
      [2016-08-15 12:40:07,457] - [task:1150] INFO - ES hits for query: {"prefix": {"dept": "Pr"}} on es_index is 0 (took 1ms)
      [2016-08-15 12:40:07,458] - [task:1155] ERROR - FAIL: FTS hits: 99, while ES hits: 0
      [2016-08-15 12:40:07,458] - [task:1170] ERROR - FAIL: Following 99 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000732', u'emp10000124', u'emp10000231', u'emp10000230', u'emp10000438', u'emp10000877', u'emp10000871', u'emp10000703', u'emp10000331', u'emp10000426', u'emp10000548', u'emp10000420', u'emp10000542', u'emp10000541', u'emp10000038', u'emp10000584', u'emp10000984', u'emp10000630', u'emp10000903', u'emp10000653', u'emp10000305', u'emp10000655', u'emp10000145', u'emp10000143', u'emp10000414', u'emp10000416', u'emp10000021', u'emp10000791', u'emp10000025', u'emp10000251', u'emp10000250', u'emp10000798', u'emp10000994', u'emp10000625', u'emp10000289', u'emp10000640', u'emp10000728', u'emp10000648', u'emp10000405', u'emp10000017', u'emp10000016', u'emp10000568', u'emp10000399', u'emp10000565', u'emp10000561', u'emp10000847', u'emp10000719', u'emp10000496', u'emp10000499', u'emp10000710']
      

      Sample document not returned by ES, but returned by FTS:

      { "salary": 143829.85, "name": "Treva Gerónimo", "dept": "Finance", "is_manager": false, "mutated": 0, "join_date": "1996-09-18T08:46:00", "languages_known": [ "Arabic", "Vietnamese", "Romanian" ], "emp_id": "10000732", "type": "emp", "email": "treva@mcdiabetes.com" }
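
The FAIL lines above report the documents returned by one engine but not the other. A rough sketch of that comparison follows. It is not the testrunner's actual code: `compare_hits` and its return keys are hypothetical names, and the `limit=50` default only mirrors the "printing 50" wording in the log.

```python
def compare_hits(fts_ids, es_ids, limit=50):
    """Compare two lists of document IDs and report the differences.

    Hypothetical helper for illustration; not part of testrunner.
    """
    fts_only = sorted(set(fts_ids) - set(es_ids))
    es_only = sorted(set(es_ids) - set(fts_ids))
    return {
        "match": not fts_only and not es_only,
        "fts_only": fts_only[:limit],  # returned by FTS but not by ES
        "es_only": es_only[:limit],    # returned by ES but not by FTS
    }


# Two of the IDs from the prefix-query failure above, with an empty ES result.
result = compare_hits(["emp10000732", "emp10000124"], [])
print(result["match"])     # False
print(result["fts_only"])  # ['emp10000124', 'emp10000732']
```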

      Sample Query of Wildcard query type:

      [2016-08-15 12:40:07,582] - [task:1112] INFO - ------------------------------------------------------------------ Query # 7 -----------------------------------------------------------------
      [2016-08-15 12:40:07,600] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "name", "wildcard": "Richardson*"}, "size": 10000000} on node: 172.23.106.72:
      [2016-08-15 12:40:07,629] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
      [2016-08-15 12:40:07,629] - [task:1140] INFO - FTS hits for query: {"field": "name", "wildcard": "Richardson*"} is 20 (took 4.505438ms)
      [2016-08-15 12:40:07,634] - [task:1150] INFO - ES hits for query: {"wildcard": {"name": "Richardson*"}} on es_index is 11 (took 1ms)
      [2016-08-15 12:40:07,635] - [task:1155] ERROR - FAIL: FTS hits: 20, while ES hits: 11
      [2016-08-15 12:40:07,635] - [task:1170] ERROR - FAIL: Following 9 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000083', u'emp10000673', u'emp10000725', u'emp10000147', u'emp10000133', u'emp10000623', u'emp10000776', u'emp10000203', u'emp10000375']
      

      Sample document not returned by ES, but returned by FTS:
      {
        "salary": 77199.42,
        "name": "Balandria Campbell",
        "mutated": 0,
        "is_manager": true,
        "dept": "Engineering",
        "join_date": "1975-05-11T19:11:00",
        "manages": { "team_size": 5, "reports": [ "Solita Simón", "Kerry Baker III", "Basha Sr.", "Araceli Turner", "Treva Palmer" ] },
        "languages_known": [ "Malay", "Dutch", "Africans" ],
        "emp_id": "10000083",
        "type": "emp",
        "email": "balandria@mcdiabetes.com"
      }

      Sample Query of Match query type:

      [2016-08-15 12:40:08,106] - [task:1112] INFO - ------------------------------------------------------------------ Query # 11 -----------------------------------------------------------------
      [2016-08-15 12:40:08,124] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"disjuncts": [{"field": "dept", "match": "Pre-sales"}, {"field": "dept", "match": "Finance"}, {"field": "dept", "match": "Support"}]}, "size": 10000000} on node: 172.23.106.72:
      [2016-08-15 12:40:08,153] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
      [2016-08-15 12:40:08,153] - [task:1140] INFO - FTS hits for query: {"disjuncts": [{"field": "dept", "match": "Pre-sales"}, {"field": "dept", "match": "Finance"}, {"field": "dept", "match": "Support"}]} is 9 (took 6.172896ms)
      [2016-08-15 12:40:08,157] - [task:1150] INFO - ES hits for query: {"bool": {"should": [{"match": {"dept": "Pre-sales"}}, {"match": {"dept": "Finance"}}, {"match": {"dept": "Support"}}]}} on es_index is 0 (took 1ms)
      [2016-08-15 12:40:08,157] - [task:1155] ERROR - FAIL: FTS hits: 9, while ES hits: 0
      [2016-08-15 12:40:08,157] - [task:1170] ERROR - FAIL: Following 9 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000894', u'emp10000981', u'emp10000679', u'emp10000141', u'emp10000526', u'emp10000022', u'emp10000374', u'emp10000027', u'emp10000606']
      

      Sample document not returned by ES, but returned by FTS:
      {
        "salary": 59826.13,
        "name": "Safiya Jones",
        "mutated": 0,
        "is_manager": true,
        "dept": "Finance",
        "join_date": "1983-05-20T08:50:00",
        "manages": { "team_size": 9, "reports": [ "Duvessa Lee", "Treva White", "Chatha Morgan", "Jerica King Jr.", "Caryssa Carter", "Mia Williams", "Callia Stewart", "Kerry Lewis", "Mia Moore" ] },
        "languages_known": [ "Vietnamese", "Sinhalese", "Malay" ],
        "emp_id": "10000894",
        "type": "emp",
        "email": "safiya@mcdiabetes.com"
      }

      Sample Query of Inclusion query type:

      [2016-08-15 12:40:07,635] - [task:1112] INFO - ------------------------------------------------------------------ Query # 8 -----------------------------------------------------------------
      [2016-08-15 12:40:07,654] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""}, "size": 10000000} on node: 172.23.106.72:
      [2016-08-15 12:40:07,789] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
      [2016-08-15 12:40:07,789] - [task:1140] INFO - FTS hits for query: {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""} is 0 (took 110.217102ms)
      [2016-08-15 12:40:07,928] - [task:1150] INFO - ES hits for query: {"query_string": {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""}} on es_index is 1000 (took 17ms)
      [2016-08-15 12:40:07,929] - [task:1155] ERROR - FAIL: FTS hits: 0, while ES hits: 1000
      [2016-08-15 12:40:07,930] - [task:1170] ERROR - FAIL: Following 1000 docs were not returned by FTS, but ES, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000125', u'emp10000436', u'emp10000435', u'emp10000126', u'emp10000433', u'emp10000120', u'emp10000431', u'emp10000430', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000509', u'emp10000508', u'emp10000192', u'emp10000471', u'emp10000721', u'emp10000417', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000221', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981', u'emp10000980', u'emp10000987', u'emp10000986']
      

      Sample document not returned by FTS, but returned by ES:
      {
        "salary": 100356.34,
        "name": "Quella Green",
        "mutated": 0,
        "is_manager": true,
        "dept": "HR",
        "join_date": "2014-03-23T11:12:00",
        "manages": { "team_size": 6, "reports": [ "Trista Lee", "Casondrah Scott", "Ambika Lee", "Desdomna Campbell", "Hedda Moore", "Antonia Richardson IX" ] },
        "languages_known": [ "Vietnamese", "English", "Portuguese" ],
        "emp_id": "10000538",
        "type": "emp",
        "email": "quella@mcdiabetes.com"
      }

      Sample Query of Exclusion query type:

      [2016-08-15 12:40:13,809] - [task:1112] INFO - ------------------------------------------------------------------ Query # 64 -----------------------------------------------------------------
      [2016-08-15 12:40:13,829] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "-dept:\"Sales\""}, "size": 10000000} on node: 172.23.106.72:
      [2016-08-15 12:40:13,931] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
      [2016-08-15 12:40:13,932] - [task:1140] INFO - FTS hits for query: {"query": "-dept:\"Sales\""} is 1000 (took 60.117285ms)
      [2016-08-15 12:40:13,936] - [task:1150] INFO - ES hits for query: {"query_string": {"query": "-dept:\"Sales\""}} on es_index is 0 (took 1ms)
      [2016-08-15 12:40:13,936] - [task:1155] ERROR - FAIL: FTS hits: 1000, while ES hits: 0
      [2016-08-15 12:40:13,937] - [task:1170] ERROR - FAIL: Following 1000 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000437', u'emp10000124', u'emp10000435', u'emp10000434', u'emp10000121', u'emp10000120', u'emp10000431', u'emp10000122', u'emp10000719', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000168', u'emp10000472', u'emp10000441', u'emp10000466', u'emp10000467', u'emp10000471', u'emp10000721', u'emp10000417', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000548', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981']
      

      Sample document returned by FTS, but not returned by ES:
      {
        "salary": 100356.34,
        "name": "Quella Green",
        "mutated": 0,
        "is_manager": true,
        "dept": "HR",
        "join_date": "2014-03-23T11:12:00",
        "manages": { "team_size": 6, "reports": [ "Trista Lee", "Casondrah Scott", "Ambika Lee", "Desdomna Campbell", "Hedda Moore", "Antonia Richardson IX" ] },
        "languages_known": [ "Vietnamese", "English", "Portuguese" ],
        "emp_id": "10000538",
        "type": "emp",
        "email": "quella@mcdiabetes.com"
      }

      Attaching the testrunner console output for this test. It contains the index definition and other queries that failed for this test.
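
The FTS/ES query pairs in the logs above follow a regular mapping, which can be sketched as below for anyone reproducing the comparison by hand. The function `fts_to_es` is a hypothetical helper inferred only from the logged pairs (prefix, wildcard, match, disjuncts, and query string). It is not the testrunner's implementation and covers no other query types.

```python
def fts_to_es(q):
    """Translate an FTS query dict into the ES form seen in the logs.

    Hypothetical helper inferred from the logged query pairs only.
    """
    if "disjuncts" in q:
        # FTS disjunction -> ES bool/should over the translated children.
        return {"bool": {"should": [fts_to_es(d) for d in q["disjuncts"]]}}
    if "query" in q:
        # FTS query-string query -> ES query_string.
        return {"query_string": {"query": q["query"]}}
    for kind in ("prefix", "wildcard", "match"):
        if kind in q:
            # {"field": F, kind: V} -> {kind: {F: V}}
            return {kind: {q["field"]: q[kind]}}
    raise ValueError("unsupported query shape: %r" % (q,))


print(fts_to_es({"field": "dept", "prefix": "Pr"}))
# {'prefix': {'dept': 'Pr'}}
```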

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)