Description
Originally part of MB-20566, but broken out into a separate issue so that MB-20566 can be resolved and this one deferred.
To reproduce:
./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F,GROUP=ALL -t fts.stable_topology_fts.StableTopFTS.index_query_custom_mapping,items=1000,custom_map=True,num_custom_analyzers=1,compare_es=True,cm_id=56,num_queries=100,GROUP=P0
The remaining issue in this test is:
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 57 -----------------------------------------------------------------
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "-languages_known:\"German\""}, "size": 10000000} on node: 127.0.0.1:9201
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {u'successful': 32, u'failed': 0, u'total': 32}
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"query": "-languages_known:\"German\""} is 1000 (took 26.677828ms)
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query_string": {"query": "-languages_known:\"German\""}} on es_index is 0 (took 1ms)
2016-09-15 09:05:03 | ERROR | MainProcess | Cluster_Thread | [task.execute] FAIL: FTS hits: 1000, while ES hits: 0
2016-09-15 09:05:03 | ERROR | MainProcess | Cluster_Thread | [task.execute] FAIL: Following 1000 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000125', u'emp10000436', u'emp10000127', u'emp10000126', u'emp10000121', u'emp10000120', u'emp10000431', u'emp10000122', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000472', u'emp10000509', u'emp10000508', u'emp10000620', u'emp10000471', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000221', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981', u'emp10000980', u'emp10000987', u'emp10000986', u'emp10000985']
So FTS returns all docs and ES returns none. The difference comes from what happens when analyzing the search text yields 0 tokens: ES drops the clause entirely when parsing the query string, whereas FTS keeps the clause when building an equivalent BooleanQuery. A kept must-not clause with no tokens excludes nothing, so FTS returns every doc; with the clause dropped, the ES query has nothing left to match.
Why can't we just change the way the BooleanQuery behaves? Because it would then behave differently from ES. Note that ES does return ALL docs for this query:
{
  "query": {
    "bool": {
      "must_not": {
        "match": {
          "message": {
            "query": "the",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
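The two behaviors can be contrasted with a small toy model. This is only a sketch and not bleve, FTS, or Elasticsearch code: the stop-word list, the sample docs, and every function name here are assumptions made for illustration. It assumes an analyzer that removes "german", so the negated clause analyzes to 0 tokens.

```python
# Toy model only: none of these names come from bleve, FTS, or Elasticsearch.
STOP_WORDS = {"the", "german"}  # assumption: the analyzer removes these terms

DOCS = {
    "emp1": "English German",
    "emp2": "French",
    "emp3": "the German",
}


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_index(docs):
    """Map each doc id to the set of terms left after analysis."""
    return {doc_id: set(analyze(text)) for doc_id, text in docs.items()}


def match(index, text):
    """Doc ids sharing at least one analyzed token; 0 tokens match nothing."""
    tokens = set(analyze(text))
    return {doc_id for doc_id, terms in index.items() if terms & tokens}


def must_not_keep_clause(index, text):
    """Keep the negated clause even when it has 0 tokens (FTS-like, and like
    the explicit bool/must_not query): nothing is excluded."""
    return set(index) - match(index, text)


def must_not_drop_empty_clause(index, text):
    """Drop a clause that analyzes to 0 tokens (query-string-like): the
    query is left with no clauses, so it matches nothing."""
    if not analyze(text):
        return set()
    return set(index) - match(index, text)


index = build_index(DOCS)
kept = must_not_keep_clause(index, "German")           # every doc id
dropped = must_not_drop_empty_clause(index, "German")  # empty set
```

When the negated text survives analysis (for example "French" in this model), both functions return the same result, so the two approaches only diverge in the 0-token case described above.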
Attachments
For Gerrit Dashboard: MB-20992
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
74183,2 | bump bleve SHA for MB-22946 MB-20992 MB-20515 | master | manifest | MERGED | +2 | +1 |