  1. Couchbase Server
  2. MB-33521

N1QL-FTS Integration phase 2: incorrect search phrase query handling.

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not a Bug
    • 6.5.0
    • None
    • fts, query
    • Untriaged
    • Centos 64-bit
    • Unknown

    Description

      Build 6.5.0 build 2715 

      Run a query-string search intended as a phrase match:

      select count(*) from `travel-sample` where SEARCH(`travel-sample`, "'United Kingdom'")

      against the standard `travel-sample` bucket. It produces 6801 results.

      The same number of results is produced for a query string that matches the two words separately:

      select count(*) from `travel-sample` where SEARCH(`travel-sample`, "United Kingdom")

      gives 6801 results.

       

      Execution plan is:

      {
          "#operator": "Sequence",
          "~children": [
              {
                  "#operator": "IndexFtsSearch",
                  "covers": [
                      "cover (search(`travel-sample`, \"'United Kingdom'\"))",
                      "cover ((meta(`travel-sample`).`id`))",
                      "cover (search_score((`travel-sample`.`out`)))",
                      "cover (search_meta((`travel-sample`.`out`)))"
                  ],
                  "index": "idx_travel_sample_fts",
                  "index_id": "10770c11a52fde37",
                  "keyspace": "travel-sample",
                  "namespace": "default",
                  "search_info": {
                      "field": "\"\"",
                      "outname": "out",
                      "query": "\"'United Kingdom'\""
                  },
                  "using": "fts"
              },
              {
                  "#operator": "Parallel",
                  "~child": {
                      "#operator": "Sequence",
                      "~children": [
                          {
                              "#operator": "Filter",
                              "condition": "cover (search(`travel-sample`, \"'United Kingdom'\"))"
                          },
                          {
                              "#operator": "InitialGroup",
                              "aggregates": [
                                  "count(*)"
                              ],
                              "group_keys": []
                          }
                      ]
                  }
              },
              {
                  "#operator": "IntermediateGroup",
                  "aggregates": [
                      "count(*)"
                  ],
                  "group_keys": []
              },
              {
                  "#operator": "FinalGroup",
                  "aggregates": [
                      "count(*)"
                  ],
                  "group_keys": []
              },
              {
                  "#operator": "Parallel",
                  "~child": {
                      "#operator": "Sequence",
                      "~children": [
                          {
                              "#operator": "InitialProject",
                              "result_terms": [
                                  {
                                      "expr": "count(*)"
                                  }
                              ]
                          },
                          {
                              "#operator": "FinalProject"
                          }
                      ]
                  }
              }
          ]
      }
      

      FTS index definition is:

      {
       "name": "idx_travel_sample_fts",
       "type": "fulltext-index",
       "params": {
        "doc_config": {
         "docid_prefix_delim": "",
         "docid_regexp": "",
         "mode": "type_field",
         "type_field": "type"
        },
        "mapping": {
         "default_analyzer": "standard",
         "default_datetime_parser": "dateTimeOptional",
         "default_field": "_all",
         "default_mapping": {
          "dynamic": true,
          "enabled": true
         },
         "default_type": "_default",
         "docvalues_dynamic": true,
         "index_dynamic": true,
         "store_dynamic": false,
         "type_field": "_type"
        },
        "store": {
         "indexType": "scorch",
         "kvStoreName": "mossStore",
         "mossStoreOptions": {}
        }
       },
       "sourceType": "couchbase",
       "sourceName": "travel-sample",
       "sourceUUID": "2562b8e629d87aa97fc0f5271451d8f5",
       "sourceParams": null,
       "planParams": {
        "maxPartitionsPerPIndex": 171,
        "numReplicas": 0
       },
       "uuid": "10770c11a52fde37"
      }

        Activity

          Steve Yen added a comment:

          Hi Evgeny Makarenko, if I understand what you're trying to do here, I think this comes down to how quoting is handled (where clearer documentation is likely needed!), i.e., what you're seeing is expected behavior.

          In the parser for the query-string format, only the double-quote (") characters are examined to demarcate the start and end of a phrase. (see: https://github.com/blevesearch/bleve/blob/master/search/query/query_string_lex.go#L109 )
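
          To illustrate the point about phrase delimiters, here is a minimal Python sketch. It is a simplified stand-in, not the bleve lexer: the function name `split_query_string` is hypothetical, and it ignores the rest of the query-string syntax (field scoping, boosts, `+`/`-` prefixes). It only shows that a double quote opens or closes a phrase, while a single quote is kept as an ordinary character inside a term.

```python
from typing import List, Tuple


def split_query_string(qs: str) -> List[Tuple[str, str]]:
    """Split a query string into ("phrase", text) and ("term", text) clauses.

    Simplified illustration only: the double quote is the sole phrase
    delimiter; every other character, including the single quote, is
    treated as part of a term.
    """
    clauses: List[Tuple[str, str]] = []
    buf: List[str] = []
    in_phrase = False
    for ch in qs:
        if ch == '"':
            if in_phrase:
                # Closing double quote: emit the collected phrase.
                clauses.append(("phrase", "".join(buf)))
                buf = []
                in_phrase = False
            else:
                # Opening double quote: flush any pending bare term first.
                if buf:
                    clauses.append(("term", "".join(buf)))
                    buf = []
                in_phrase = True
        elif ch.isspace() and not in_phrase:
            # Whitespace outside a phrase separates bare terms.
            if buf:
                clauses.append(("term", "".join(buf)))
                buf = []
        else:
            buf.append(ch)
    if buf:
        # Trailing term, or an unterminated phrase.
        clauses.append(("phrase" if in_phrase else "term", "".join(buf)))
    return clauses


print(split_query_string("'United Kingdom'"))
print(split_query_string('"United Kingdom"'))
```

          In the reported statement, the outer double quotes belong to the N1QL string literal, so the query string that reaches the search service is `'United Kingdom'` (as the plan's `search_info` shows). Under this sketch that input yields two separate terms, whereas a query string containing double quotes yields a single phrase clause.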

          Other punctuation chars that are within the double-quotes are stripped away by the standard analyzer. So, strings like "United. KINGDOM!!!." and "United@Kingdom" and " 'United' 'Kingdom' " will be processed by the standard text analyzer to produce the same two output tokens: "united", "kingdom".

          That is, the single-quote (') chars in your query-string search are treated like regular old punctuation chars (like how '.' and '!' are treated).
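
          The effect of the analyzer on these inputs can be approximated with a short Python sketch. This is an assumption-laden simplification, not the standard analyzer itself: the hypothetical `simple_analyze` function keeps only runs of ASCII letters and digits and lowercases them, and it omits Unicode word segmentation and stop-word filtering.

```python
import re
from typing import List


def simple_analyze(text: str) -> List[str]:
    """Rough approximation of analysis for the examples in this comment.

    Keeps runs of ASCII letters/digits and lowercases them, so all
    punctuation (including single quotes) is dropped. A real analyzer
    handles Unicode and stop words; this sketch does not.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


samples = [
    "United. KINGDOM!!!.",
    "United@Kingdom",
    " 'United' 'Kingdom' ",
    "'United Kingdom'",
    "United Kingdom",
]
for s in samples:
    print(repr(s), "->", simple_analyze(s))
```

          Every sample reduces to the same two tokens under this sketch, which is consistent with both reported queries returning the same count.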

          People

            keshav Keshav Murthy
            evgeny.makarenko Evgeny Makarenko (Inactive)