Description
Build
4.5.0-1740
Testcase
test_27 in http://qa.sc.couchbase.com/view/FTS/job/cen006-p0-fts-vset00-00-custom-map-rqg/
./testrunner -i INI_FILE.ini get-cbcollect-info=False,get-logs=False,stop-on-failure=False,cluster=D+F,GROUP=ALL -t fts.stable_topology_fts.StableTopFTS.index_query_custom_mapping,items=1000,custom_map=True,cm_id=6,num_queries=100,compare_es=True,dataset=wiki,GROUP=P0
[2016-03-09 14:42:52,900] - [fts_base:1085] INFO - Running query {"from": 0, "indexName": "custom_index", "fields": [], "explain": false, "ctl": {"timeout": 0, "consistency": {"vectors": {}, "level": ""}}, "query": {"disjuncts": [{"field": "title", "match": "George II of Great Britain"}, {"field": "title", "match": "AOC"}]}, "size": 10000000} on node: 172.23.106.72
[2016-03-09 14:42:52,927] - [task:1071] INFO - FTS hits for query: {"disjuncts": [{"field": "title", "match": "George II of Great Britain"}, {"field": "title", "match": "AOC"}]} is 2 (took 9.423861ms)
[2016-03-09 14:42:52,951] - [task:1081] INFO - ES hits for query: {"bool": {"should": [{"match": {"title": "George II of Great Britain"}}, {"match": {"title": "AOC"}}]}} on es_index is 53 (took 5ms)
[2016-03-09 14:42:52,951] - [task:1086] ERROR - FAIL: FTS hits: 2, while ES hits: 53
[2016-03-09 14:42:52,951] - [task:1101] ERROR - FAIL: Following 51 docs were not returned by FTS, but ES, printing 50: [u'wiki10000289', u'wiki10000288', u'wiki10000117', u'wiki10000507', u'wiki10000527', u'wiki10000285', u'wiki10000690', u'wiki10000287', u'wiki10000951', u'wiki10000661', u'wiki10000682', u'wiki10000666', u'wiki10000664', u'wiki10000705', u'wiki10000732', u'wiki10000725', u'wiki10000416', u'wiki10000695', u'wiki10000642', u'wiki10000318', u'wiki10000313', u'wiki10000899', u'wiki10000024', u'wiki10000218', u'wiki10000372', u'wiki10000276', u'wiki10000277', u'wiki10000007', u'wiki10000658', u'wiki10000659', u'wiki10000762', u'wiki10000654', u'wiki10000121', u'wiki10000650', u'wiki10000753', u'wiki10000392', u'wiki10000672', u'wiki10000673', u'wiki10000674', u'wiki10000675', u'wiki10000718', u'wiki10000758', u'wiki10000098', u'wiki10000903', u'wiki10000671', u'wiki10000355', u'wiki10000689', u'wiki10000685', u'wiki10000219', u'wiki10000290']
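The log shows the test translating the FTS `disjuncts` query into an ES `bool`/`should` query, then reporting the docs that ES returned but FTS did not. Below is a minimal Python sketch of those two steps. It handles only the `match`/`disjuncts` shapes seen in this log. The function names `fts_to_es` and `missing_from_fts` are hypothetical and are not the testrunner's actual code.

```python
from typing import Any, Dict, Iterable, List


def fts_to_es(fts_query: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the FTS query shapes seen in this log into ES query DSL.

    Handles only {"field": ..., "match": ...} leaves and {"disjuncts": [...]};
    any other shape raises instead of guessing at a translation.
    """
    if "disjuncts" in fts_query:
        # FTS disjunction of sub-queries -> ES bool query with 'should' clauses
        return {"bool": {"should": [fts_to_es(q) for q in fts_query["disjuncts"]]}}
    if "match" in fts_query and "field" in fts_query:
        # FTS match on a field -> ES match query on the same field
        return {"match": {fts_query["field"]: fts_query["match"]}}
    raise ValueError("unsupported FTS query: %r" % (fts_query,))


def missing_from_fts(fts_ids: Iterable[str], es_ids: Iterable[str],
                     limit: int = 50) -> List[str]:
    """Doc IDs returned by ES but not by FTS, sorted, at most `limit` of them."""
    return sorted(set(es_ids) - set(fts_ids))[:limit]


fts_query = {"disjuncts": [{"field": "title", "match": "George II of Great Britain"},
                           {"field": "title", "match": "AOC"}]}
print(fts_to_es(fts_query))
```

The `limit` of 50 mirrors the "printing 50" in the log. Sorting is only there to make the output deterministic, so the order differs from the log.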
Marty had noted earlier that ES's default analyzer is 'standard' with stop-word removal disabled. It turns out that this is simply the default: ES's standard analyzer does no stop-word removal unless configured to. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html says the stopwords list is empty by default.
Using Marty's ES REST endpoint for analysis:
curl -XGET '172.23.106.53:9200/_analyze?analyzer=standard' -d 'George II of Great Britain'
{
  "tokens": [
    {"token": "george",  "start_offset": 0,  "end_offset": 6,  "type": "<ALPHANUM>", "position": 1},
    {"token": "ii",      "start_offset": 7,  "end_offset": 9,  "type": "<ALPHANUM>", "position": 2},
    {"token": "of",      "start_offset": 10, "end_offset": 12, "type": "<ALPHANUM>", "position": 3},
    {"token": "great",   "start_offset": 13, "end_offset": 18, "type": "<ALPHANUM>", "position": 4},
    {"token": "britain", "start_offset": 19, "end_offset": 26, "type": "<ALPHANUM>", "position": 5}
  ]
}
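The comparison can be made concrete with a minimal Python sketch of a 'standard'-style analyzer. It lowercases, splits on runs of non-alphanumeric characters, and optionally drops stop words while keeping each surviving token's original position. This is a simplification that assumes plain ASCII input. The real ES standard tokenizer follows Unicode text segmentation, which is not reproduced here, and `analyze` is a hypothetical name.

With an empty stop-word set, the sketch yields the same five tokens, offsets and positions (starting at 1) as the ES output above. Passing `{"of"}` drops the third token and leaves a gap at position 3.

```python
import re
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    token: str
    start_offset: int
    end_offset: int
    position: int


def analyze(text: str, stopwords: Iterable[str] = ()) -> List[Token]:
    """Toy 'standard'-style analyzer (ASCII-only simplification).

    Splits on runs of non-alphanumeric characters and lowercases each token.
    Stop words are dropped *after* positions are assigned, so a removed word
    leaves a gap in the position sequence.
    """
    stop = {w.lower() for w in stopwords}
    tokens = []
    # start=1: positions begin at 1, as in the ES output above
    for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text), start=1):
        term = m.group(0).lower()
        if term in stop:
            continue  # stop word removed; its position number is skipped
        tokens.append(Token(term, m.start(), m.end(), pos))
    return tokens


if __name__ == "__main__":
    text = "George II of Great Britain"
    for t in analyze(text):                     # empty stop list, like ES 'standard'
        print(t)
    print("---")
    for t in analyze(text, stopwords={"of"}):   # 'of' removed
        print(t)
```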
By contrast, analysis.blevesearch.com returns only 4 terms (the stop word 'of' is removed):
Text: George II of Great Britain

Position  Term     Start  End
1         george   0      6
2         ii       7      9
4         great    13     18
5         britain  19     26
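One plausible reading of the 2-vs-53 gap, which the output above does not confirm: ES's `match` query ORs its analyzed terms by default. If "of" is kept as a term, any title containing "of" matches the first disjunct. If FTS drops "of" at index and query time, those titles no longer match.

The sketch below illustrates this on a hypothetical three-title corpus (the titles are made up, not taken from the wiki dataset). It rests on three assumptions:
- Both engines OR the terms within each match clause.
- The same analyzer is applied to documents and query text.
- The `title` field in custom map 6 actually uses the stop-word-removing analyzer, which this report does not show.

It also ignores scoring and field mappings. `terms` and `disjunction_of_matches` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical titles, for illustration only.
CORPUS: Dict[str, str] = {
    "doc1": "George II of Great Britain",
    "doc2": "History of France",
    "doc3": "AOC",
}


def terms(text: str, stopwords: Set[str]) -> List[str]:
    """Lowercase, split on non-alphanumerics, drop stop words."""
    words = (w.lower() for w in re.findall(r"[A-Za-z0-9]+", text))
    return [w for w in words if w not in stopwords]


def disjunction_of_matches(clauses: Iterable[str], corpus: Dict[str, str],
                           stopwords: Set[str]) -> Set[str]:
    """IDs of docs whose title shares at least one term with any clause.

    Each match clause ORs its analyzed terms, and the clauses are ORed
    together (FTS 'disjuncts' / ES bool 'should'), so the whole query is
    one big OR over the union of query terms.
    """
    query_terms = {t for c in clauses for t in terms(c, stopwords)}
    return {doc_id for doc_id, title in corpus.items()
            if query_terms & set(terms(title, stopwords))}


clauses = ["George II of Great Britain", "AOC"]
print(sorted(disjunction_of_matches(clauses, CORPUS, stopwords=set())))   # ['doc1', 'doc2', 'doc3']
print(sorted(disjunction_of_matches(clauses, CORPUS, stopwords={"of"})))  # ['doc1', 'doc3']
```

Under these assumptions, "History of France" matches only because "of" survives analysis. On the wiki dataset, that would be the kind of extra hit ES reports.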