Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-51015

[CBM] Rift SQLite index is using a sub-optimal query

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.1.0
    • tools
    • Untriaged
    • 1
    • Yes

    Description

      What's the issue?
      We're currently using a sub-optimal query for our "full scan" query.

      sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
      QUERY PLAN
      |--SEARCH vbucket_0 USING INDEX idx_cid_key (collection_id=?)
      `--USE TEMP B-TREE FOR ORDER BY
      

      As we can see, we're using the index 'idx_cid_key' in this case. For a relatively small dataset, this can still have fairly significant consequences.

      Runtime

              Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
              User time (seconds): 2.67
              System time (seconds): 24.47
              Percent of CPU this job got: 99%
              Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.22
              Average shared text size (kbytes): 0
              Average unshared data size (kbytes): 0
              Average stack size (kbytes): 0
              Average total size (kbytes): 0
              Maximum resident set size (kbytes): 6716
              Average resident set size (kbytes): 0
              Major (requiring I/O) page faults: 0
              Minor (reclaiming a frame) page faults: 892
              Voluntary context switches: 120
              Involuntary context switches: 774
              Swaps: 0
              File system inputs: 0
              File system outputs: 0
              Socket messages sent: 0
              Socket messages received: 0
              Signals delivered: 0
              Page size (bytes): 4096
              Exit status: 0
      

      If we run 'analyze' on our table, we see that the query planner chooses a different query.

      Optimal

      sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
      QUERY PLAN
      |--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
      |--USE TEMP B-TREE FOR GROUP BY
      `--USE TEMP B-TREE FOR ORDER BY
      

      If we run the same query (with caches flushed) we see:

      Runtime

          Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
              User time (seconds): 2.62
              System time (seconds): 0.07
              Percent of CPU this job got: 98%
              Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.74
              Average shared text size (kbytes): 0
              Average unshared data size (kbytes): 0
              Average stack size (kbytes): 0
              Average total size (kbytes): 0
              Maximum resident set size (kbytes): 23412
              Average resident set size (kbytes): 0
              Major (requiring I/O) page faults: 26
              Minor (reclaiming a frame) page faults: 2995
              Voluntary context switches: 162
              Involuntary context switches: 101
              Swaps: 0
              File system inputs: 3792
              File system outputs: 172408
              Socket messages sent: 0
              Socket messages received: 0
              Signals delivered: 0
              Page size (bytes): 4096
              Exit status: 0
      {nofromat}
       
      Now, ideally we don't want to run '{{analyze}}' because it performs a full index scan by default; this could be costly, however, we can achieve the same behavior using the '{{not indexed}}' clause provided by SQLite.
       
      {noformat:title=Not Indexed}
      sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 not indexed where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
      QUERY PLAN
      |--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
      |--USE TEMP B-TREE FOR GROUP BY
      `--USE TEMP B-TREE FOR ORDER BY
      

      What's the fix?
      We should change our query to be not indexed which will enforce that we use the better query plan.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            james.lee James Lee
            james.lee James Lee
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty