Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.1.0
Component/s: tools
Labels:
- request-dev-verify

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Yes

Description

What's the issue?
We're currently using a sub-optimal query for our "full scan" query.

sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;

QUERY PLAN

|--SEARCH vbucket_0 USING INDEX idx_cid_key (collection_id=?)

`--USE TEMP B-TREE FOR ORDER BY

As we can see, we're using the index 'idx_cid_key' in this case. For a relatively small dataset, this can still have fairly significant consequences.

Runtime
Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
User time (seconds): 2.67
System time (seconds): 24.47
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6716
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 892
Voluntary context switches: 120
Involuntary context switches: 774
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

If we run 'analyze' on our table, we see that the query planner chooses a different query.

Optimal
sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
QUERY PLAN
\|--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
\|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

If we run the same query (with caches flushed) we see:

Runtime
Command being timed: "sqlite3 index_0.sqlite.0 select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;"
User time (seconds): 2.62
System time (seconds): 0.07
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.74
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 23412
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 26
Minor (reclaiming a frame) page faults: 2995
Voluntary context switches: 162
Involuntary context switches: 101
Swaps: 0
File system inputs: 3792
File system outputs: 172408
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
{nofromat}

Now, ideally we don't want to run '{{analyze}}' because it performs a full index scan by default; this could be costly, however, we can achieve the same behavior using the '{{not indexed}}' clause provided by SQLite.

{noformat:title=Not Indexed}
sqlite> explain query plan select max(seqno), collection_id, key, deleted, offset, size from vbucket_0 not indexed where seqno >= 0 and seqno <= 500000000 and collection_id in (0) group by collection_id,key order by seqno;
QUERY PLAN
\|--SEARCH vbucket_0 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
\|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

What's the fix?
We should change our query to be not indexed which will enforce that we use the better query plan.

Attachments

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

For Gerrit Dashboard: MB-51015
#	Subject	Branch	Project	Status	CR	V
170904,3	MB-51015 Use a more efficient query for the Rift full scan	neo	backup	Status: MERGED	+2	+1
170942,3	MB-51015 Use a more efficient query for the Rift stat collection	neo	backup	Status: MERGED	+2	+1
170944,1	Merge branch 'neo' into master	master	backup	Status: MERGED	+2	+1

Activity

People

Assignee:: James Lee

Reporter:: James Lee

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 16/Feb/22 10:17 AM

Updated:: 24/Feb/22 10:39 AM

Resolved:: 17/Feb/22 2:11 AM

Gerrit Reviews

There are no open Gerrit changes

Show There are 3 closed Gerrit changes

Hide There are 3 closed Gerrit changes

MB-51015 Use a more efficient query for the Rift full scan: Gerrit Review:

MB-51015 Use a more efficient query for the Rift stat collection: Gerrit Review:

Merge branch 'neo' into master: Gerrit Review:

[CBM] Rift SQLite index is using a sub-optimal query

Details

Description

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty