Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-27706

Text Search (Token) N1QL query doesn't use tokens field in Index

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 5.5.0
    • 5.5.0
    • query
    • Cluster: iris
      OS: CentOS 7
      CPU: Data: E5-2630 v2 (24 vCPU), Query & Index: E5-2680 v3 (48 vCPU) Memory: Data & Query: 64GB, Index: 512GB
      Disk: Samsung Pro 850

    Description

      Since 5.5.0-1705 the Token-based query doesn't utilize the Index field with tokenized values, resulting full scan and major performance degradation.

      Index def: 
      CREATE INDEX tokens_idx ON `bucket-1`(address.zip, DISTINCT ARRAY s FOR s IN TOKENS(address.street) END);
      Query: 
      SELECT first_name, last_name, address FROM `bucket-1` WHERE address.zip = $1 AND ANY s IN TOKENS(address.street) SATISFIES s = $2 END
      Query plan expected (on builds prior 5.5.0-1705):

      ...

      "spans": [
      {
      "exact": true,
      "range": [

      { "high": "\"71053\"", "inclusion": 3, "low": "\"71053\"" }

      ,

      { "high": "\"446010\"", "inclusion": 3, "low": "\"446010\"" }

      ]
      }
      ],

      ...

      Query plan since 5.5.0-1705:
      ...
      "spans": [
      {
      "exact": true,
      "range": [

      { "high": "\"71053\"", "inclusion": 3, "low": "\"71053\"" }

      ]
      }
      ],
      ...
      The second field (tokens) is missing from the plan

       

       

      Changes in 5.5.0-1705:

      goproj/src/github.com/couchbase/query changed from 22bec496c5f6b03ac6cdcd3f2274f50e43038096 to master [+] 6296b0d MB-26740 Allow index keys or variables inside index keys to use same identifier as keyspace alias

      https://github.com/couchbase/query/commit/6296b0dee66cb2f9a7ee7133061b03d6f9b199a5

       

       

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            This works fine on build: 1674.

            keshav Keshav Murthy added a comment - This works fine on build: 1674.
            bingjie.miao Bingjie Miao added a comment -

            Using build 1705 on centos6 I am getting the correct query plan with both index fields being used. I'm using this query:

            EXPLAIN SELECT first_name, last_name, address FROM `bucket-1` WHERE address.zip = "71053" AND ANY s IN TOKENS(address.street) SATISFIES s = "446010" END;

            Note that I substitute the actual values from the query plan, instead of using $1 and $2.

            Alex Gyryk can you paste here the exact command you used to see the incorrect query plan?

            bingjie.miao Bingjie Miao added a comment - Using build 1705 on centos6 I am getting the correct query plan with both index fields being used. I'm using this query: EXPLAIN SELECT first_name, last_name, address FROM `bucket-1` WHERE address.zip = "71053" AND ANY s IN TOKENS(address.street) SATISFIES s = "446010" END; Note that I substitute the actual values from the query plan, instead of using $1 and $2. Alex Gyryk can you paste here the exact command you used to see the incorrect query plan?

            From build 1719:

            {
              "plan": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "DistinctScan",
                    "scan": {
                      "#operator": "IndexScan3",
                      "index": "tokens_idx",
                      "index_id": "80d8a334bf57e3a7",
                      "index_projection": {
                        "primary_key": true
                      },
                      "keyspace": "bucket-1",
                      "namespace": "default",
                      "spans": [
                        {
                          "exact": true,
                          "range": [
                            {
                              "high": "$1",
                              "inclusion": 3,
                              "low": "$1"
                            }
                          ]
                        }
                      ],
                      "using": "gsi"
                    }
                  },
                  {
                    "#operator": "Fetch",
                    "keyspace": "bucket-1",
                    "namespace": "default"
                  },
                  {
                    "#operator": "Parallel",
                    "~child": {
                      "#operator": "Sequence",
                      "~children": [
                        {
                          "#operator": "Filter",
                          "condition": "((((`bucket-1`.`address`).`zip`) = $1) and any `s` in tokens(((`bucket-1`.`address`).`street`)) satisfies (`s` = $2) end)"
                        },
                        {
                          "#operator": "InitialProject",
                          "result_terms": [
                            {
                              "expr": "(`bucket-1`.`first_name`)"
                            },
                            {
                              "expr": "(`bucket-1`.`last_name`)"
                            },
                            {
                              "expr": "(`bucket-1`.`address`)"
                            }
                          ]
                        },
                        {
                          "#operator": "FinalProject"
                        }
                      ]
                    }
                  }
                ]
              },
              "text": "SELECT first_name, last_name, address\nFROM `bucket-1`\nWHERE address.zip = $1 AND ANY s IN TOKENS(address.street) SATISFIES s = $2 END;"
            }
            

            pavelpaulau Pavel Paulau (Inactive) added a comment - From build 1719: { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "DistinctScan", "scan": { "#operator": "IndexScan3", "index": "tokens_idx", "index_id": "80d8a334bf57e3a7", "index_projection": { "primary_key": true }, "keyspace": "bucket-1", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "$1", "inclusion": 3, "low": "$1" } ] } ], "using": "gsi" } }, { "#operator": "Fetch", "keyspace": "bucket-1", "namespace": "default" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((((`bucket-1`.`address`).`zip`) = $1) and any `s` in tokens(((`bucket-1`.`address`).`street`)) satisfies (`s` = $2) end)" }, { "#operator": "InitialProject", "result_terms": [ { "expr": "(`bucket-1`.`first_name`)" }, { "expr": "(`bucket-1`.`last_name`)" }, { "expr": "(`bucket-1`.`address`)" } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "SELECT first_name, last_name, address\nFROM `bucket-1`\nWHERE address.zip = $1 AND ANY s IN TOKENS(address.street) SATISFIES s = $2 END;" }
            pavelpaulau Pavel Paulau (Inactive) added a comment - - edited

            This is actually weird. I get the following plan when I hit EXPLAIN the first time:

            {
              "plan": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "DistinctScan",
                    "scan": {
                      "#operator": "IndexScan3",
                      "index": "tokens_idx",
                      "index_id": "43a17fefdbe15cb8",
                      "index_projection": {
                        "primary_key": true
                      },
                      "keyspace": "bucket-1",
                      "namespace": "default",
                      "spans": [
                        {
                          "exact": true,
                          "range": [
                            {
                              "high": "$1",
                              "inclusion": 3,
                              "low": "$1"
                            },
                            {
                              "high": "$2",
                              "inclusion": 3,
                              "low": "$2"
                            }
                          ]
                        }
                      ],
                      "using": "gsi"
                    }
                  },
                  {
                    "#operator": "Fetch",
                    "keyspace": "bucket-1",
                    "namespace": "default"
                  },
                  {
                    "#operator": "Parallel",
                    "~child": {
                      "#operator": "Sequence",
                      "~children": [
                        {
                          "#operator": "Filter",
                          "condition": "((((`bucket-1`.`address`).`zip`) = $1) and any `s` in tokens(((`bucket-1`.`address`).`street`)) satisfies (`s` = $2) end)"
                        },
                        {
                          "#operator": "InitialProject",
                          "result_terms": [
                            {
                              "expr": "(`bucket-1`.`first_name`)"
                            },
                            {
                              "expr": "(`bucket-1`.`last_name`)"
                            },
                            {
                              "expr": "(`bucket-1`.`address`)"
                            }
                          ]
                        },
                        {
                          "#operator": "FinalProject"
                        }
                      ]
                    }
                  }
                ]
              },
              "text": "SELECT first_name, last_name, address\nFROM `bucket-1`\nWHERE address.zip = $1 AND ANY s IN TOKENS(address.street) SATISFIES s = $2 END;"
            }
            

            But then it changes to what I reported in my previous comment.

            pavelpaulau Pavel Paulau (Inactive) added a comment - - edited This is actually weird. I get the following plan when I hit EXPLAIN the first time: { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "DistinctScan", "scan": { "#operator": "IndexScan3", "index": "tokens_idx", "index_id": "43a17fefdbe15cb8", "index_projection": { "primary_key": true }, "keyspace": "bucket-1", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "$1", "inclusion": 3, "low": "$1" }, { "high": "$2", "inclusion": 3, "low": "$2" } ] } ], "using": "gsi" } }, { "#operator": "Fetch", "keyspace": "bucket-1", "namespace": "default" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((((`bucket-1`.`address`).`zip`) = $1) and any `s` in tokens(((`bucket-1`.`address`).`street`)) satisfies (`s` = $2) end)" }, { "#operator": "InitialProject", "result_terms": [ { "expr": "(`bucket-1`.`first_name`)" }, { "expr": "(`bucket-1`.`last_name`)" }, { "expr": "(`bucket-1`.`address`)" } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "SELECT first_name, last_name, address\nFROM `bucket-1`\nWHERE address.zip = $1 AND ANY s IN TOKENS(address.street) SATISFIES s = $2 END;" } But then it changes to what I reported in my previous comment.
            bingjie.miao Bingjie Miao added a comment -

            OK I now see the issue on second execution as well. Will investigate.

            bingjie.miao Bingjie Miao added a comment - OK I now see the issue on second execution as well. Will investigate.
            bingjie.miao Bingjie Miao added a comment -

            Build couchbase-server-5.5.0-1757 contains query commit d72a02d389a46dae20afea0e207fb9e869b99cff with commit message:
            MB_27706 Token index fields may not be formalized properly when query executed multiple times
            https://github.com/couchbase/query/commit/d72a02d389a46dae20afea0e207fb9e869b99cff

            bingjie.miao Bingjie Miao added a comment - Build couchbase-server-5.5.0-1757 contains query commit d72a02d389a46dae20afea0e207fb9e869b99cff with commit message: MB_27706 Token index fields may not be formalized properly when query executed multiple times https://github.com/couchbase/query/commit/d72a02d389a46dae20afea0e207fb9e869b99cff
            bingjie.miao Bingjie Miao added a comment -

            I misspelled the defect number (using underscore instead of dash) so the build message did not get properly added to this defect. I've manually copied it over above. 

            bingjie.miao Bingjie Miao added a comment - I misspelled the defect number (using underscore instead of dash) so the build message did not get properly added to this defect. I've manually copied it over above. 

            Build couchbase-server-5.5.0-2094 contains query commit efd797b75fbfecc7b728cf50298b44f9efdf055c with commit message:
            MB-28533 Properly formalize keyspace reference inside Meta() functions
            https://github.com/couchbase/query/commit/efd797b75fbfecc7b728cf50298b44f9efdf055c

            build-team Couchbase Build Team added a comment - Build couchbase-server-5.5.0-2094 contains query commit efd797b75fbfecc7b728cf50298b44f9efdf055c with commit message: MB-28533 Properly formalize keyspace reference inside Meta() functions https://github.com/couchbase/query/commit/efd797b75fbfecc7b728cf50298b44f9efdf055c

            People

              oleksandr.gyryk Alex Gyryk (Inactive)
              oleksandr.gyryk Alex Gyryk (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty