Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-39686

FTS:N1FTY: Prefix queries not sargable for search func in the queries with default index

    XMLWordPrintable

    Details

    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Unknown

      Description

      Build: 7.0.0-2208

      • Load Emp dataset export_def_emp.json
      • Create default index with below:

        {
          "type": "fulltext-index",
          "name": "default_index",
          "uuid": "1c972f76b93a3ad8",
          "sourceType": "gocbcore",
          "sourceName": "default",
          "sourceUUID": "4904e97c103d6be84a05aedffaedae0a",
          "planParams": {
            "maxPartitionsPerPIndex": 171
          },
          "params": {
            "doc_config": {
              "docid_prefix_delim": "",
              "docid_regexp": "",
              "mode": "type_field",
              "type_field": "type"
            },
            "mapping": {
              "analysis": {},
              "default_analyzer": "standard",
              "default_datetime_parser": "dateTimeOptional",
              "default_field": "_all",
              "default_mapping": {
                "dynamic": true,
                "enabled": true
              },
              "default_type": "_default",
              "docvalues_dynamic": true,
              "index_dynamic": true,
              "store_dynamic": false,
              "type_field": "_type"
            },
            "store": {
              "indexType": "scorch",
              "mossStoreOptions": {}
            }
          },
          "sourceParams": {}
        }
        

      • FTS index, not sargable for below queries with prefix:

        ----------------------------------------------------- Query # 3 ----------------------------------------------------
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"indexName": "default_index", "size": 10000000, "from": 0, "explain": false, "query": {"prefix": "fa", "field": "email"}, "fields": [], "ctl": {"consistency": {"level": "", "vectors": {}}, "timeout": 60000}} on node: 172.23.121.66:
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {'total': 6, 'failed': 0, 'successful': 6}
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"prefix": "fa", "field": "email"} is 19 (took 1.903491ms)
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [es_base.search] ES query '{'query': {'prefix': {'email': 'fa'}}}' 
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query": {"prefix": {"email": "fa"}}} on es_index is 19 (took 22ms)
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [task.execute] SUCCESS: Docs returned by FTS = docs returned by ES, doc_ids verified
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [task.execute] Running N1QL query: select meta().id from default where type='emp' and search(default, {"prefix": "fa", "field": "email"})
        2020-06-01 15:55:08 | INFO | MainProcess | Cluster_Thread | [rest_client.query_tool] query params : statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22fa%22%2C+%22field%22%3A+%22email%22%7D%29
        2020-06-01 15:55:08 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] POST http://172.23.121.65:8093/query?statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22fa%22%2C+%22field%22%3A+%22email%22%7D%29 body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 404 reason: unknown b'{\n"requestID": "abb892b4-59bc-41a0-b954-a8428495ac23",\n"errors": [{"code":4000,"msg":"No index available on keyspace default that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online."}],\n"status": "fatal",\n"metrics": {"elapsedTime": "34.227663ms","executionTime": "34.134476ms","resultCount": 0,"resultSize": 0,"errorCount": 1}\n}\n' auth: Administrator:password
         
         
         
        2020-06-01 16:01:19 | INFO | MainProcess | Cluster_Thread | [task.execute] ----------------------------------------------------- Query # 4 ----------------------------------------------------
        2020-06-01 16:01:19 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"indexName": "default_index", "size": 10000000, "from": 0, "explain": false, "query": {"prefix": "emp", "field": "type"}, "fields": [], "ctl": {"consistency": {"level": "", "vectors": {}}, "timeout": 60000}} on node: 172.23.121.66:
        2020-06-01 16:01:19 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {'total': 6, 'failed': 0, 'successful': 6}
        2020-06-01 16:01:19 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"prefix": "emp", "field": "type"} is 1000 (took 3.320501ms)
        2020-06-01 16:01:19 | INFO | MainProcess | Cluster_Thread | [es_base.search] ES query '{'query': {'prefix': {'type': 'emp'}}}' 
        2020-06-01 16:01:20 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query": {"prefix": {"type": "emp"}}} on es_index is 1000 (took 107ms)
        2020-06-01 16:01:20 | INFO | MainProcess | Cluster_Thread | [task.execute] SUCCESS: Docs returned by FTS = docs returned by ES, doc_ids verified
        2020-06-01 16:01:20 | INFO | MainProcess | Cluster_Thread | [task.execute] Running N1QL query: select meta().id from default where type='emp' and search(default, {"prefix": "emp", "field": "type"})
        2020-06-01 16:01:20 | INFO | MainProcess | Cluster_Thread | [rest_client.query_tool] query params : statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22emp%22%2C+%22field%22%3A+%22type%22%7D%29
        2020-06-01 16:01:20 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] POST http://172.23.121.65:8093/query?statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22emp%22%2C+%22field%22%3A+%22type%22%7D%29 body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 404 reason: unknown b'{\n"requestID": "b1a5c1ce-07c6-4559-bf5e-5a973375a402",\n"errors": [{"code":4000,"msg":"No index available on keyspace default that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online."}],\n"status": "fatal",\n"metrics": {"elapsedTime": "1.100337ms","executionTime": "1.031255ms","resultCount": 0,"resultSize": 0,"errorCount": 1}\n}\n' auth: Administrator:password
         
         
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] ----------------------------------------------------- Query # 6 ----------------------------------------------------
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"indexName": "default_index", "size": 10000000, "from": 0, "explain": false, "query": {"prefix": "a", "field": "email"}, "fields": [], "ctl": {"consistency": {"level": "", "vectors": {}}, "timeout": 60000}} on node: 172.23.121.66:
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {'total': 6, 'failed': 0, 'successful': 6}
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"prefix": "a", "field": "email"} is 116 (took 12.550056ms)
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [es_base.search] ES query '{'query': {'prefix': {'email': 'a'}}}' 
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query": {"prefix": {"email": "a"}}} on es_index is 116 (took 15ms)
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] SUCCESS: Docs returned by FTS = docs returned by ES, doc_ids verified
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [task.execute] Running N1QL query: select meta().id from default where type='emp' and search(default, {"prefix": "a", "field": "email"})
        2020-06-01 16:01:56 | INFO | MainProcess | Cluster_Thread | [rest_client.query_tool] query params : statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22a%22%2C+%22field%22%3A+%22email%22%7D%29
        2020-06-01 16:01:56 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] POST http://172.23.121.65:8093/query?statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22a%22%2C+%22field%22%3A+%22email%22%7D%29 body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 404 reason: unknown b'{\n"requestID": "40abcf73-d22f-46e2-b402-9fa88761b5ad",\n"errors": [{"code":4000,"msg":"No index available on keyspace default that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online."}],\n"status": "fatal",\n"metrics": {"elapsedTime": "1.234045ms","executionTime": "1.170431ms","resultCount": 0,"resultSize": 0,"errorCount": 1}\n}\n' auth: Administrator:password
         
         
         
        0-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] ----------------------------------------------------- Query # 7 ----------------------------------------------------
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"indexName": "default_index", "size": 10000000, "from": 0, "explain": false, "query": {"prefix": "sid", "field": "email"}, "fields": [], "ctl": {"consistency": {"level": "", "vectors": {}}, "timeout": 60000}} on node: 172.23.121.66:
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {'total': 6, 'failed': 0, 'successful': 6}
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"prefix": "sid", "field": "email"} is 12 (took 1.439661ms)
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [es_base.search] ES query '{'query': {'prefix': {'email': 'sid'}}}' 
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query": {"prefix": {"email": "sid"}}} on es_index is 12 (took 3ms)
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] SUCCESS: Docs returned by FTS = docs returned by ES, doc_ids verified
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [task.execute] Running N1QL query: select meta().id from default where type='emp' and search(default, {"prefix": "sid", "field": "email"})
        2020-06-01 16:02:09 | INFO | MainProcess | Cluster_Thread | [rest_client.query_tool] query params : statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22sid%22%2C+%22field%22%3A+%22email%22%7D%29
        2020-06-01 16:02:09 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] POST http://172.23.121.65:8093/query?statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22sid%22%2C+%22field%22%3A+%22email%22%7D%29 body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 404 reason: unknown b'{\n"requestID": "6f9d63af-210f-450f-880d-3fda8dbb1a29",\n"errors": [{"code":4000,"msg":"No index available on keyspace default that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online."}],\n"status": "fatal",\n"metrics": {"elapsedTime": "1.146858ms","executionTime": "1.083205ms","resultCount": 0,"resultSize": 0,"errorCount": 1}\n}\n' auth: Administrator:password
         
         
         
        your expected index is online.
        2020-06-01 16:02:26 | INFO | MainProcess | Cluster_Thread | [task.execute] ----------------------------------------------------- Query # 8 ----------------------------------------------------
        2020-06-01 16:02:26 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"indexName": "default_index", "size": 10000000, "from": 0, "explain": false, "query": {"prefix": "em", "field": "type"}, "fields": [], "ctl": {"consistency": {"level": "", "vectors": {}}, "timeout": 60000}} on node: 172.23.121.65:
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [task.execute] Status: {'total': 6, 'failed': 0, 'successful': 6}
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"prefix": "em", "field": "type"} is 1000 (took 3.316901ms)
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [es_base.search] ES query '{'query': {'prefix': {'type': 'em'}}}' 
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [task.execute] ES hits for query: {"query": {"prefix": {"type": "em"}}} on es_index is 1000 (took 23ms)
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [task.execute] SUCCESS: Docs returned by FTS = docs returned by ES, doc_ids verified
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [task.execute] Running N1QL query: select meta().id from default where type='emp' and search(default, {"prefix": "em", "field": "type"})
        2020-06-01 16:02:27 | INFO | MainProcess | Cluster_Thread | [rest_client.query_tool] query params : statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22em%22%2C+%22field%22%3A+%22type%22%7D%29
        2020-06-01 16:02:27 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] POST http://172.23.121.65:8093/query?statement=select+meta%28%29.id+from+default+where+type%3D%27emp%27+and+search%28default%2C+%7B%22prefix%22%3A+%22em%22%2C+%22field%22%3A+%22type%22%7D%29 body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 404 reason: unknown b'{\n"requestID": "395e8580-ca06-4ebc-a962-0c704276ad14",\n"errors": [{"code":4000,"msg":"No index available on keyspace default that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online."}],\n"status": "fatal",\n"metrics": {"elapsedTime": "1.174311ms","executionTime": "1.079508ms","resultCount": 0,"resultSize": 0,"errorCount": 1}\n}\n' auth: Administrator:password
        

        Attachments

          Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

            Activity

            Hide
            girish.benakappa Girish Benakappa added a comment -

            Below wildcard queries are also not sargable:

            select meta().id from default where type='emp' and search(default, {"field": "email", "wildcard": "br*"})
            select meta().id from default where type='emp' and search(default, {"field": "languages_known", "wildcard": "*"})
            select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "em*"})
            select meta().id from default where type='emp' and search(default, {"field": "email", "wildcard": "tempest*"})
            select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "?mp"})
            select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "e?p"})
            

            Show
            girish.benakappa Girish Benakappa added a comment - Below wildcard queries are also not sargable: select meta().id from default where type='emp' and search(default, {"field": "email", "wildcard": "br*"}) select meta().id from default where type='emp' and search(default, {"field": "languages_known", "wildcard": "*"}) select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "em*"}) select meta().id from default where type='emp' and search(default, {"field": "email", "wildcard": "tempest*"}) select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "?mp"}) select meta().id from default where type='emp' and search(default, {"field": "type", "wildcard": "e?p"})
            Hide
            girish.benakappa Girish Benakappa added a comment -

            Same behavior has been seen with wildcard/prefix and range in the query and with typed index:

            select meta().id from default where type='emp' and search(default, {"field": "name", "wildcard": "?ells"})
            select meta().id from default where type='emp' and search(default, {"field": "dept", "wildcard": "*"})
            select meta().id from default where type='emp' and search(default, {"prefix": "da", "field": "email"})
            select meta().id from default where type='emp' and search(default, {"field": "email", "min": "kala@mcdiabetes.com", "max": "kerry-ann@mcdiabetes.com", "inclusive_min": true, "inclusive_max": true})
            select meta().id from default where type='emp' and search(default, {"conjuncts": [{"field": "email", "match": "casondrah@mcdiabetes.com"}, {"prefix": "ca", "field": "email"}, {"field": "join_date", "start": "2006-06-25T06:29:00", "end": "2020-05-31T08:53:16.189983", "inclusive_start": false, "inclusive_end": false}, {"field": "mutated", "min": 0, "max": 733, "inclusive_min": true, "inclusive_max": true}]})
            
            

            Show
            girish.benakappa Girish Benakappa added a comment - Same behavior has been seen with wildcard/prefix and range in the query and with typed index: select meta().id from default where type='emp' and search(default, {"field": "name", "wildcard": "?ells"}) select meta().id from default where type='emp' and search(default, {"field": "dept", "wildcard": "*"}) select meta().id from default where type='emp' and search(default, {"prefix": "da", "field": "email"}) select meta().id from default where type='emp' and search(default, {"field": "email", "min": "kala@mcdiabetes.com", "max": "kerry-ann@mcdiabetes.com", "inclusive_min": true, "inclusive_max": true}) select meta().id from default where type='emp' and search(default, {"conjuncts": [{"field": "email", "match": "casondrah@mcdiabetes.com"}, {"prefix": "ca", "field": "email"}, {"field": "join_date", "start": "2006-06-25T06:29:00", "end": "2020-05-31T08:53:16.189983", "inclusive_start": false, "inclusive_end": false}, {"field": "mutated", "min": 0, "max": 733, "inclusive_min": true, "inclusive_max": true}]})
            Hide
            abhinav Abhinav Dangeti added a comment -

            Hey Girish Benakappa, this is change in behavior as mandated by MB-39592. The "keyword" analyzer needs to be used for fields to run these queries:

            • TermQuery
            • PhraseQuery
            • MultiPhraseQuery
            • FuzzyQuery
            • PrefixQuery
            • RegexpQuery
            • WildcardQuery
            Show
            abhinav Abhinav Dangeti added a comment - Hey Girish Benakappa , this is change in behavior as mandated by MB-39592 . The "keyword" analyzer needs to be used for fields to run these queries: TermQuery PhraseQuery MultiPhraseQuery FuzzyQuery PrefixQuery RegexpQuery WildcardQuery
            Hide
            abhinav Abhinav Dangeti added a comment -

            Marking this as resolved with "won't fix".

            There's more details on MB-39592.

            Please reach out to me if you've further concerns.

            Show
            abhinav Abhinav Dangeti added a comment - Marking this as resolved with "won't fix". There's more details on MB-39592 . Please reach out to me if you've further concerns.
            Hide
            girish.benakappa Girish Benakappa added a comment - - edited

            Abhinav Dangeti

            So is it safe to assume that for all text based queries, now it needs to have keyword analyzer? Just trying to understand what(if there are) text based queries are supported with other analyzers too.

            Show
            girish.benakappa Girish Benakappa added a comment - - edited Abhinav Dangeti So is it safe to assume that for all text based queries, now it needs to have keyword analyzer? Just trying to understand what(if there are) text based queries are supported with other analyzers too.
            Hide
            abhinav Abhinav Dangeti added a comment -

            No, not all text based queries. Just the ones I've mentioned earlier. Match and MatchPhrase queries will continue to work well with any analyzer, because they have an analyzer setting which defaults to the one from index mapping if not specified.

            The behavior for numeric, datetime, boolean and geopoint queries will remain unaffected by this.

            Show
            abhinav Abhinav Dangeti added a comment - No, not all text based queries. Just the ones I've mentioned earlier. Match and MatchPhrase queries will continue to work well with any analyzer, because they have an analyzer setting which defaults to the one from index mapping if not specified. The behavior for numeric, datetime, boolean and geopoint queries will remain unaffected by this.
            Hide
            girish.benakappa Girish Benakappa added a comment - - edited

            Summary of the discussion with Abhinav Dangeti on the changes from this issue:

            • For these queries, indexed fields must use the keyword analyzer OR query should contain index_name or analyzer :

              TermQuery
              PhraseQuery
              MultiPhraseQuery
              FuzzyQuery
              PrefixQuery
              RegexpQuery
              WildcardQuery
              

            • For Match and match_phrase queries, index with any analyzer is good except for NEG cases (“-” or “must_not”).
            • In NEG cases for match and match_phrase queries if the index uses a non-standard analyzer, then query should contain index_name or analyzer (for non-covering queries such as SELECT * etc.). This will be addressed with MB-39887 for CC.
            Show
            girish.benakappa Girish Benakappa added a comment - - edited Summary of the discussion with Abhinav Dangeti on the changes from this issue: For these queries, indexed fields must use the keyword analyzer OR query should contain index_name or analyzer : TermQuery PhraseQuery MultiPhraseQuery FuzzyQuery PrefixQuery RegexpQuery WildcardQuery For Match and match_phrase queries, index with any analyzer is good except for NEG cases (“-” or “must_not”). In NEG cases for match and match_phrase queries if the index uses a non-standard analyzer, then query should contain index_name or analyzer (for non-covering queries such as SELECT * etc.). This will be addressed with MB-39887 for CC.
            Hide
            raju Raju Suravarjjala added a comment -

            Bulk closing invalid, duplicate and won't fix bugs

            Show
            raju Raju Suravarjjala added a comment - Bulk closing invalid, duplicate and won't fix bugs

              People

              Assignee:
              abhinav Abhinav Dangeti
              Reporter:
              girish.benakappa Girish Benakappa
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes

                    PagerDuty