[FTS] Queries on fields analyzed with shingle token filter yield incorrect results
Attachments
- 15 Sep 2016, 06:57 AM
- 16 Aug 2016, 06:10 PM
Activity

Mihir Kamdar September 26, 2016 at 6:59 AM
Closing, as this issue is no longer seen in build 4.7.0-1142.

MartyM September 21, 2016 at 8:58 PM
The bleve SHA bump has happened; subsequent builds should contain the fix.

MartyM September 19, 2016 at 6:55 PM
I have broken out the second issue identified here into a separate issue: https://couchbasecloud.atlassian.net/browse/MB-20992#icft=MB-20992
This one can now be marked resolved once the bleve SHA is bumped.

MartyM September 15, 2016 at 2:16 PM
The remaining issue is actually quite different, and my preference would be to break it out as a separate issue for documentation purposes.
The query is:
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 57 -----------------------------------------------------------------
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "-languages_known:\"German\""}, "size": 10000000} on node: 127.0.0.1:9201
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {u'successful': 32, u'failed': 0, u'total': 32}
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"query": "-languages_known:\"German\""} is 1000 (took 26.677828ms)
2016-09-15 09:05:03 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query_string": {"query": "-languages_known:\"German\""}} on es_index is 0 (took 1ms)
2016-09-15 09:05:03 | ERROR | MainProcess | Cluster_Thread | [task.execute] FAIL: FTS hits: 1000, while ES hits: 0
2016-09-15 09:05:03 | ERROR | MainProcess | Cluster_Thread | [task.execute] FAIL: Following 1000 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000125', u'emp10000436', u'emp10000127', u'emp10000126', u'emp10000121', u'emp10000120', u'emp10000431', u'emp10000122', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000472', u'emp10000509', u'emp10000508', u'emp10000620', u'emp10000471', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000221', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981', u'emp10000980', u'emp10000987', u'emp10000986', u'emp10000985']
The issue here is that a single term like 'German', when analyzed with this particular analyzer (shingle min length 2), produces 0 tokens. So no documents have any indexed values at all for the "languages_known" field. At search time, analysis of the search term 'German' also produces 0 terms, so what do we do? Currently, if analysis produces 0 tokens, we convert the query to a MatchNoneQuery. Here the MatchNoneQuery sits inside the MustNot clause of a boolean query, and thus we return everything.
Obviously our behavior differs from ES in this case. There are 2 places where the difference could be, and I'm still tracking that down.
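To make the mechanism above concrete, here is a minimal Python sketch. It is not bleve's code: `shingle`, `analyze`, `build_index`, `match`, and `must_not` are hypothetical helpers, and the analyzer (lowercase, whitespace split, shingle size 2 with no single-token output) is an assumption standing in for the custom analyzer used by the test. The two documents reuse ids and "languages_known" values from the sample documents in this ticket.

```python
from typing import Dict, List, Set


def shingle(tokens: List[str], min_size: int = 2, max_size: int = 2) -> List[str]:
    """Join adjacent tokens into shingles; single tokens are not emitted."""
    out: List[str] = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def analyze(text: str) -> List[str]:
    """Hypothetical analyzer: lowercase, split on whitespace, then shingle (min 2)."""
    return shingle(text.lower().split())


def build_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each indexed term to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, values in docs.items():
        for value in values:
            for term in analyze(value):
                index.setdefault(term, set()).add(doc_id)
    return index


def match(index: Dict[str, Set[str]], text: str) -> Set[str]:
    """If analysis yields zero terms, this behaves like a match-none query."""
    hits: Set[str] = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


def must_not(all_ids: Set[str], excluded: Set[str]) -> Set[str]:
    """A boolean query with only a must-not clause: everything minus the excluded set."""
    return all_ids - excluded


docs = {
    "emp10000732": ["Arabic", "Vietnamese", "Romanian"],
    "emp10000538": ["Vietnamese", "English", "Portuguese"],
}
index = build_index(docs)
result = must_not(set(docs), match(index, "German"))
```

Under these assumptions every single-word value analyzes to zero tokens, so `index` stays empty, `match` returns nothing, and the must-not query returns every document. This mirrors the 1000 FTS hits in the log above.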

MartyM September 15, 2016 at 1:07 PM
I have pushed one fix to bleve here: https://github.com/blevesearch/bleve/commit/c5159251a90a0f2b3372ecb4a4604f8ac93d04db
However, another query is still failing.

Build: 4.7.0-990
Testcase:
./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F,GROUP=ALL -t fts.stable_topology_fts.StableTopFTS.index_query_custom_mapping,items=1000,custom_map=True,num_custom_analyzers=1,cm_id=22,num_queries=100,compare_es=True,GROUP=P0
Description:
If a field is analyzed using a custom analyzer that includes a shingle token filter, queries of type match, prefix, wildcard, and inclusion/exclusion do not return the expected hits when compared to ES.
Sample Query of Prefix query type:
[2016-08-15 12:40:07,398] - [task:1112] INFO - ------------------------------------------------------------------ Query # 4 -----------------------------------------------------------------
[2016-08-15 12:40:07,417] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "dept", "prefix": "Pr"}, "size": 10000000} on node: 172.23.106.72:
[2016-08-15 12:40:07,453] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
[2016-08-15 12:40:07,454] - [task:1140] INFO - FTS hits for query: {"field": "dept", "prefix": "Pr"} is 99 (took 5.217532ms)
[2016-08-15 12:40:07,457] - [task:1150] INFO - ES hits for query: {"prefix": {"dept": "Pr"}} on es_index is 0 (took 1ms)
[2016-08-15 12:40:07,458] - [task:1155] ERROR - FAIL: FTS hits: 99, while ES hits: 0
[2016-08-15 12:40:07,458] - [task:1170] ERROR - FAIL: Following 99 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000732', u'emp10000124', u'emp10000231', u'emp10000230', u'emp10000438', u'emp10000877', u'emp10000871', u'emp10000703', u'emp10000331', u'emp10000426', u'emp10000548', u'emp10000420', u'emp10000542', u'emp10000541', u'emp10000038', u'emp10000584', u'emp10000984', u'emp10000630', u'emp10000903', u'emp10000653', u'emp10000305', u'emp10000655', u'emp10000145', u'emp10000143', u'emp10000414', u'emp10000416', u'emp10000021', u'emp10000791', u'emp10000025', u'emp10000251', u'emp10000250', u'emp10000798', u'emp10000994', u'emp10000625', u'emp10000289', u'emp10000640', u'emp10000728', u'emp10000648', u'emp10000405', u'emp10000017', u'emp10000016', u'emp10000568', u'emp10000399', u'emp10000565', u'emp10000561', u'emp10000847', u'emp10000719', u'emp10000496', u'emp10000499', u'emp10000710']
Sample document not returned by ES, but returned by FTS:
{
"salary": 143829.85,
"name": "Treva Gerónimo",
"dept": "Finance",
"is_manager": false,
"mutated": 0,
"join_date": "1996-09-18T08:46:00",
"languages_known": [
"Arabic",
"Vietnamese",
"Romanian"
],
"emp_id": "10000732",
"type": "emp",
"email": "treva@mcdiabetes.com"
}
Sample Query of Wildcard query type:
[2016-08-15 12:40:07,582] - [task:1112] INFO - ------------------------------------------------------------------ Query # 7 -----------------------------------------------------------------
[2016-08-15 12:40:07,600] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "name", "wildcard": "Richardson*"}, "size": 10000000} on node: 172.23.106.72:
[2016-08-15 12:40:07,629] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
[2016-08-15 12:40:07,629] - [task:1140] INFO - FTS hits for query: {"field": "name", "wildcard": "Richardson*"} is 20 (took 4.505438ms)
[2016-08-15 12:40:07,634] - [task:1150] INFO - ES hits for query: {"wildcard": {"name": "Richardson*"}} on es_index is 11 (took 1ms)
[2016-08-15 12:40:07,635] - [task:1155] ERROR - FAIL: FTS hits: 20, while ES hits: 11
[2016-08-15 12:40:07,635] - [task:1170] ERROR - FAIL: Following 9 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000083', u'emp10000673', u'emp10000725', u'emp10000147', u'emp10000133', u'emp10000623', u'emp10000776', u'emp10000203', u'emp10000375']
Sample document not returned by ES, but returned by FTS:
{
"salary": 77199.42,
"name": "Balandria Campbell",
"mutated": 0,
"is_manager": true,
"dept": "Engineering",
"join_date": "1975-05-11T19:11:00",
"manages": {
"team_size": 5,
"reports": [
"Solita Simón",
"Kerry Baker III",
"Basha Sr.",
"Araceli Turner",
"Treva Palmer"
]
},
"languages_known": [
"Malay",
"Dutch",
"Africans"
],
"emp_id": "10000083",
"type": "emp",
"email": "balandria@mcdiabetes.com"
}
Sample Query of Match query type:
[2016-08-15 12:40:08,106] - [task:1112] INFO - ------------------------------------------------------------------ Query # 11 -----------------------------------------------------------------
[2016-08-15 12:40:08,124] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"disjuncts": [{"field": "dept", "match": "Pre-sales"}, {"field": "dept", "match": "Finance"}, {"field": "dept", "match": "Support"}]}, "size": 10000000} on node: 172.23.106.72:
[2016-08-15 12:40:08,153] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
[2016-08-15 12:40:08,153] - [task:1140] INFO - FTS hits for query: {"disjuncts": [{"field": "dept", "match": "Pre-sales"}, {"field": "dept", "match": "Finance"}, {"field": "dept", "match": "Support"}]} is 9 (took 6.172896ms)
[2016-08-15 12:40:08,157] - [task:1150] INFO - ES hits for query: {"bool": {"should": [{"match": {"dept": "Pre-sales"}}, {"match": {"dept": "Finance"}}, {"match": {"dept": "Support"}}]}} on es_index is 0 (took 1ms)
[2016-08-15 12:40:08,157] - [task:1155] ERROR - FAIL: FTS hits: 9, while ES hits: 0
[2016-08-15 12:40:08,157] - [task:1170] ERROR - FAIL: Following 9 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000894', u'emp10000981', u'emp10000679', u'emp10000141', u'emp10000526', u'emp10000022', u'emp10000374', u'emp10000027', u'emp10000606']
Sample document not returned by ES, but returned by FTS:
{
"salary": 59826.13,
"name": "Safiya Jones",
"mutated": 0,
"is_manager": true,
"dept": "Finance",
"join_date": "1983-05-20T08:50:00",
"manages": {
"team_size": 9,
"reports": [
"Duvessa Lee",
"Treva White",
"Chatha Morgan",
"Jerica King Jr.",
"Caryssa Carter",
"Mia Williams",
"Callia Stewart",
"Kerry Lewis",
"Mia Moore"
]
},
"languages_known": [
"Vietnamese",
"Sinhalese",
"Malay"
],
"emp_id": "10000894",
"type": "emp",
"email": "safiya@mcdiabetes.com"
}
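The logs above pair each FTS query with the ES query it is compared against (prefix, wildcard, match inside disjuncts, and query strings). The Python sketch below reproduces that pairing for exactly the shapes seen in this ticket. It is an illustration only: `fts_to_es` is a hypothetical helper, not the testrunner's actual translation code, and it handles no query types beyond those shown.

```python
from typing import Any, Dict


def fts_to_es(query: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an FTS query dict into the ES form seen in the logs (hypothetical helper)."""
    if "disjuncts" in query:
        # FTS disjuncts correspond to an ES bool query with "should" clauses.
        return {"bool": {"should": [fts_to_es(q) for q in query["disjuncts"]]}}
    if "query" in query:
        # FTS query-string queries map to ES query_string with the same text.
        return {"query_string": {"query": query["query"]}}
    for kind in ("match", "prefix", "wildcard"):
        if kind in query:
            # {"field": F, kind: V} becomes {kind: {F: V}}.
            return {kind: {query["field"]: query[kind]}}
    raise ValueError("unsupported query shape: %r" % (query,))


prefix_es = fts_to_es({"field": "dept", "prefix": "Pr"})
wildcard_es = fts_to_es({"field": "name", "wildcard": "Richardson*"})
exclusion_es = fts_to_es({"query": "-dept:\"Sales\""})
```

Because the two query trees are structurally equivalent, differences in hit counts point at analysis or query execution rather than at the translation step.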
Sample Query of Inclusion query type:
[2016-08-15 12:40:07,635] - [task:1112] INFO - ------------------------------------------------------------------ Query # 8 -----------------------------------------------------------------
[2016-08-15 12:40:07,654] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""}, "size": 10000000} on node: 172.23.106.72:
[2016-08-15 12:40:07,789] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
[2016-08-15 12:40:07,789] - [task:1140] INFO - FTS hits for query: {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""} is 0 (took 110.217102ms)
[2016-08-15 12:40:07,928] - [task:1150] INFO - ES hits for query: {"query_string": {"query": "mutated:>4 +mutated:<=2444 salary:<147000.0 +dept:\"Finance\""}} on es_index is 1000 (took 17ms)
[2016-08-15 12:40:07,929] - [task:1155] ERROR - FAIL: FTS hits: 0, while ES hits: 1000
[2016-08-15 12:40:07,930] - [task:1170] ERROR - FAIL: Following 1000 docs were not returned by FTS, but ES, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000125', u'emp10000436', u'emp10000435', u'emp10000126', u'emp10000433', u'emp10000120', u'emp10000431', u'emp10000430', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000509', u'emp10000508', u'emp10000192', u'emp10000471', u'emp10000721', u'emp10000417', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000221', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981', u'emp10000980', u'emp10000987', u'emp10000986']
Sample document not returned by FTS, but returned by ES:
{
"salary": 100356.34,
"name": "Quella Green",
"mutated": 0,
"is_manager": true,
"dept": "HR",
"join_date": "2014-03-23T11:12:00",
"manages": {
"team_size": 6,
"reports": [
"Trista Lee",
"Casondrah Scott",
"Ambika Lee",
"Desdomna Campbell",
"Hedda Moore",
"Antonia Richardson IX"
]
},
"languages_known": [
"Vietnamese",
"English",
"Portuguese"
],
"emp_id": "10000538",
"type": "emp",
"email": "quella@mcdiabetes.com"
}
Sample Query of Exclusion query type:
[2016-08-15 12:40:13,809] - [task:1112] INFO - ------------------------------------------------------------------ Query # 64 -----------------------------------------------------------------
[2016-08-15 12:40:13,829] - [fts_base:1173] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "-dept:\"Sales\""}, "size": 10000000} on node: 172.23.106.72:
[2016-08-15 12:40:13,931] - [task:1116] INFO - Status: {u'successful': 32, u'failed': 0, u'total': 32}
[2016-08-15 12:40:13,932] - [task:1140] INFO - FTS hits for query: {"query": "-dept:\"Sales\""} is 1000 (took 60.117285ms)
[2016-08-15 12:40:13,936] - [task:1150] INFO - ES hits for query: {"query_string": {"query": "-dept:\"Sales\""}} on es_index is 0 (took 1ms)
[2016-08-15 12:40:13,936] - [task:1155] ERROR - FAIL: FTS hits: 1000, while ES hits: 0
[2016-08-15 12:40:13,937] - [task:1170] ERROR - FAIL: Following 1000 doc(s) were not returned by ES,but FTS, printing 50: [u'emp10000538', u'emp10000539', u'emp10000536', u'emp10000537', u'emp10000534', u'emp10000535', u'emp10000532', u'emp10000533', u'emp10000530', u'emp10000531', u'emp10000437', u'emp10000124', u'emp10000435', u'emp10000434', u'emp10000121', u'emp10000120', u'emp10000431', u'emp10000122', u'emp10000719', u'emp10000129', u'emp10000128', u'emp10000439', u'emp10000438', u'emp10000168', u'emp10000472', u'emp10000441', u'emp10000466', u'emp10000467', u'emp10000471', u'emp10000721', u'emp10000417', u'emp10000222', u'emp10000223', u'emp10000549', u'emp10000548', u'emp10000226', u'emp10000227', u'emp10000224', u'emp10000225', u'emp10000543', u'emp10000542', u'emp10000228', u'emp10000229', u'emp10000547', u'emp10000546', u'emp10000545', u'emp10000544', u'emp10000983', u'emp10000982', u'emp10000981']
Sample document returned by FTS, but not returned by ES:
{
"salary": 100356.34,
"name": "Quella Green",
"mutated": 0,
"is_manager": true,
"dept": "HR",
"join_date": "2014-03-23T11:12:00",
"manages": {
"team_size": 6,
"reports": [
"Trista Lee",
"Casondrah Scott",
"Ambika Lee",
"Desdomna Campbell",
"Hedda Moore",
"Antonia Richardson IX"
]
},
"languages_known": [
"Vietnamese",
"English",
"Portuguese"
],
"emp_id": "10000538",
"type": "emp",
"email": "quella@mcdiabetes.com"
}
Attaching the testrunner console output for this test. It contains the index definition and other queries that failed for this test.
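For readers who want to redo the FTS-versus-ES comparison outside the testrunner, the FAIL lines in the logs amount to a set difference over document ids. The Python sketch below shows that check under that assumption; `compare_hits` is a hypothetical helper and the ids in the example are made up, not taken from the test data.

```python
from typing import Any, Dict, Iterable


def compare_hits(fts_ids: Iterable[str], es_ids: Iterable[str], show: int = 50) -> Dict[str, Any]:
    """Compare two hit lists and report ids present on only one side (hypothetical helper)."""
    fts, es = set(fts_ids), set(es_ids)
    return {
        "fts_hits": len(fts),
        "es_hits": len(es),
        # Cap the printed ids, as the logs do with "printing 50".
        "only_fts": sorted(fts - es)[:show],
        "only_es": sorted(es - fts)[:show],
        "passed": fts == es,
    }


# Made-up ids for illustration only.
report = compare_hits(["doc1", "doc2", "doc3"], ["doc1"])
```

Comparing sets rather than counts alone also catches the case where both engines return the same number of hits but different documents.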