Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.1.0
-
Untriaged
-
1
-
Yes
Description
What's the issue?
We're currently using a sub-optimal query for our "full scan" query.
sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
|
QUERY PLAN
|
|--SEARCH vbucket_0 USING INDEX idx_cid_key (collection_id=?)
|
`--USE TEMP B-TREE FOR ORDER BY
|
As we can see, we're using the index 'idx_cid_key' in this case. For a relatively small dataset, this can still have fairly significant consequences.
Runtime |
Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
|
User time (seconds): 2.67
|
System time (seconds): 24.47
|
Percent of CPU this job got: 99%
|
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.22
|
Average shared text size (kbytes): 0
|
Average unshared data size (kbytes): 0
|
Average stack size (kbytes): 0
|
Average total size (kbytes): 0
|
Maximum resident set size (kbytes): 6716
|
Average resident set size (kbytes): 0
|
Major (requiring I/O) page faults: 0
|
Minor (reclaiming a frame) page faults: 892
|
Voluntary context switches: 120
|
Involuntary context switches: 774
|
Swaps: 0
|
File system inputs: 0
|
File system outputs: 0
|
Socket messages sent: 0
|
Socket messages received: 0
|
Signals delivered: 0
|
Page size (bytes): 4096
|
Exit status: 0
|
If we run 'analyze' on our table, we see that the query planner chooses a different query.
Optimal |
sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
|
QUERY PLAN
|
|--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
|
|--USE TEMP B-TREE FOR GROUP BY
|
`--USE TEMP B-TREE FOR ORDER BY
|
If we run the same query (with caches flushed) we see:
Runtime |
Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
|
User time (seconds): 2.62
|
System time (seconds): 0.07
|
Percent of CPU this job got: 98%
|
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.74
|
Average shared text size (kbytes): 0
|
Average unshared data size (kbytes): 0
|
Average stack size (kbytes): 0
|
Average total size (kbytes): 0
|
Maximum resident set size (kbytes): 23412
|
Average resident set size (kbytes): 0
|
Major (requiring I/O) page faults: 26
|
Minor (reclaiming a frame) page faults: 2995
|
Voluntary context switches: 162
|
Involuntary context switches: 101
|
Swaps: 0
|
File system inputs: 3792
|
File system outputs: 172408
|
Socket messages sent: 0
|
Socket messages received: 0
|
Signals delivered: 0
|
Page size (bytes): 4096
|
Exit status: 0
|
{nofromat}
|
|
Now, ideally we don't want to run '{{analyze}}' because it performs a full index scan by default; this could be costly, however, we can achieve the same behavior using the '{{not indexed}}' clause provided by SQLite.
|
|
{noformat:title=Not Indexed}
|
sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 not indexed where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
|
QUERY PLAN
|
|--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
|
|--USE TEMP B-TREE FOR GROUP BY
|
`--USE TEMP B-TREE FOR ORDER BY
|
What's the fix?
We should change our query to be not indexed which will enforce that we use the better query plan.