  1. Couchbase Server
  2. MB-18219

[FTS] Highlighting of results and snippet selection is wrong


Details

    • Bug
    • Resolution: Fixed
    • Major
    • 4.5.0
    • 4.5.0
    • cbft
    • Security Level: Public
    • Version 4.5.0-1307 on Mac
    • Untriaged
    • No

    Description

      Search results with highlighting and snippets are incorrect for some documents. The searched-for words do not appear in the snippet, and the highlighting lands on other words.

      Documents that show the wrong highlighting for a search for "subway/rapid": 1217245.json and 2514575.json

      Documents that show the wrong highlighting for a search for "stole": 609868.json and 109479.json

      I suspected that they were not UTF-8 encoded, but "file -I" reports them as us-ascii, which is a subset of UTF-8.

      Wills-MacBook-Pro:fts will$ file -I ~/datasets/json/1217245.json
      /Users/will/datasets/json/1217245.json: text/plain; charset=us-ascii
      Wills-MacBook-Pro:fts will$ file -I ~/datasets/json/2514575.json
      /Users/will/datasets/json/2514575.json: text/plain; charset=us-ascii
      Wills-MacBook-Pro:fts will$ file -I ~/datasets/json/609868.json 
      /Users/will/datasets/json/609868.json: text/plain; charset=us-ascii
      Wills-MacBook-Pro:fts will$ file -I ~/datasets/json/109479.json
      /Users/will/datasets/json/109479.json: text/plain; charset=us-ascii
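
      As a cross-check on the "file -I" output, the same question can be asked byte by byte. The following is only a sketch: the function names are made up for illustration, and it assumes the files are small enough to read into memory.

```python
from pathlib import Path


def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as 'us-ascii', 'utf-8', or 'unknown'.

    Hypothetical helper, not part of cbft or bleve.
    """
    # Every byte below 0x80 means the content is plain ASCII,
    # which is also valid UTF-8.
    if all(b < 0x80 for b in data):
        return "us-ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


def describe_file(path: str) -> str:
    """Read a file in binary mode and classify its encoding."""
    return describe_encoding(Path(path).read_bytes())
```

      If describe_file() returns "us-ascii" for all four documents, every character occupies exactly one byte, so a mismatch between byte offsets and character offsets in the source files would seem an unlikely explanation for the misplaced highlights.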
      
      

      Index definition:

      {
        "type": "fulltext-index",
        "name": "reviews",
        "uuid": "7f584d644ac8a7c7",
        "sourceType": "couchbase",
        "sourceName": "reviews",
        "sourceUUID": "7b00f8b84c36de0ba10a44ff5ce5301b",
        "planParams": {
          "maxPartitionsPerPIndex": 32,
          "numReplicas": 0,
          "hierarchyRules": null,
          "nodePlanParams": null,
          "pindexWeights": null,
          "planFrozen": false
        },
        "params": {
          "mapping": {
            "analysis": {
              "analyzers": {},
              "char_filters": {},
              "token_filters": {},
              "token_maps": {},
              "tokenizers": {}
            },
            "byte_array_converter": "json",
            "default_analyzer": "standard",
            "default_datetime_parser": "dateTimeOptional",
            "default_field": "_all",
            "default_mapping": {
              "display_order": "0",
              "dynamic": true,
              "enabled": true,
              "fields": [],
              "properties": {
                "Reviews": {
                  "display_order": "0",
                  "dynamic": false,
                  "enabled": true,
                  "fields": [],
                  "properties": {
                    "Content": {
                      "dynamic": false,
                      "enabled": true,
                      "fields": [
                        {
                          "analyzer": "",
                          "date_format": null,
                          "display_order": "0",
                          "include_in_all": true,
                          "include_term_vectors": true,
                          "index": true,
                          "name": "Content",
                          "store": true,
                          "type": "text"
                        }
                      ],
                      "properties": {}
                    }
                  }
                }
              }
            },
            "default_type": "_default",
            "type_field": "type",
            "types": {}
          },
          "store": {
            "kvStoreName": "forestdb"
          }
        },
        "sourceParams": {
          "authPassword": "",
          "authSaslPassword": "",
          "authSaslUser": "",
          "authUser": "reviews",
          "clusterManagerBackoffFactor": 0,
          "clusterManagerSleepInitMS": 0,
          "clusterManagerSleepMaxMS": 2000,
          "dataManagerBackoffFactor": 0,
          "dataManagerSleepInitMS": 0,
          "dataManagerSleepMaxMS": 2000,
          "feedBufferAckThreshold": 0,
          "feedBufferSizeBytes": 0
        }
      }
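
      To check the symptom without the UI, something like the sketch below could build a highlighting request for this index and test each returned fragment. Assumptions: the request body follows the bleve search request JSON shape ("query", "size", "highlight"), the indexed field is addressed as "Reviews.Content" based on the mapping above, and the HTML highlighter wraps hits in <mark> tags. The helper names are hypothetical, and no endpoint or port is assumed here.

```python
import re
from typing import List


def build_highlight_request(term: str,
                            field: str = "Reviews.Content",
                            size: int = 10) -> dict:
    """Build a query-string search body that asks for HTML highlighting.

    The field name is an assumption derived from the index mapping.
    """
    return {
        "query": {"query": term},
        "size": size,
        "highlight": {"style": "html", "fields": [field]},
    }


def highlighted_terms(fragment: str) -> List[str]:
    """Return the lowercased text of every <mark>...</mark> span."""
    spans = re.findall(r"<mark>(.*?)</mark>", fragment, flags=re.DOTALL)
    return [s.lower() for s in spans]


def fragment_is_consistent(fragment: str, term: str) -> bool:
    """True if the searched term is among the highlighted spans.

    Assumes the analyzer only lowercases tokens (no stemming), so the
    highlighted text should equal the search term, ignoring case.
    """
    return term.lower() in highlighted_terms(fragment)
```

      For the documents listed in this report, fragment_is_consistent() would be expected to return False on the affected fragments, since the highlight lands on words other than the search term.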
      

      Attachments

        1. 109479.json
          4 kB
          Will Gardella
        2. 1217245.json
          10 kB
          Will Gardella
        3. 2514575.json
          10 kB
          Will Gardella
        4. 609868.json
          3 kB
          Will Gardella
        5. bad-highlight-dump.tar.gz
          67 kB
          Marty Schoch
        6. nohighlight.png
          381 kB
          Marty Schoch
        7. Screen Shot 2016-02-17 at 11.20.53 AM.png
          240 kB
          Will Gardella
        8. Screen Shot 2016-02-17 at 11.21.13 AM.png
          228 kB
          Will Gardella

            People

              will.gardella Will Gardella (Inactive)