Details
Bug
Resolution: Fixed
Major
7.1.0
Untriaged
1
No
Description
With CBO, an intersect scan can be chosen by cost. Compared with a single index scan, the added cost is that of the additional index scans. The benefit is a potential reduction in the number of documents that need to be fetched: only index keys returned by all of the index scans need to be kept, and the rest can be discarded before the fetch.
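The trade-off above can be sketched as a toy cost comparison. This is not the actual CBO cost model. The names `IndexScanEstimate`, `FETCH_COST_PER_DOC` and the independence assumption used to estimate the intersection size are all hypothetical and chosen only for illustration. The point is that the intersect scan pays for every index scan but is charged for fetching only the estimated intersection.

```python
import math
from dataclasses import dataclass

# Hypothetical per-document fetch cost; the real CBO uses its own cost units.
FETCH_COST_PER_DOC = 1.0


@dataclass
class IndexScanEstimate:
    name: str
    scan_cost: float  # hypothetical estimated cost of running this index scan
    keys: int         # hypothetical estimated number of document keys returned


def single_scan_cost(scan: IndexScanEstimate) -> float:
    """One index scan, then fetch every document key it returns."""
    return scan.scan_cost + scan.keys * FETCH_COST_PER_DOC


def estimate_intersection(scans: list, total_docs: int) -> float:
    """Intersection size, ASSUMING the index predicates are independent."""
    selectivity = math.prod(s.keys / total_docs for s in scans)
    return total_docs * selectivity


def intersect_scan_cost(scans: list, total_docs: int) -> float:
    """Pay for every index scan, but fetch only keys present in all of them."""
    scan_total = sum(s.scan_cost for s in scans)
    return scan_total + estimate_intersection(scans, total_docs) * FETCH_COST_PER_DOC


if __name__ == "__main__":
    a = IndexScanEstimate("idx_a", scan_cost=50.0, keys=1_000)
    b = IndexScanEstimate("idx_b", scan_cost=80.0, keys=2_000)
    print(single_scan_cost(a))                  # 1050.0
    print(intersect_scan_cost([a, b], 10_000))  # approximately 330
```

With these made-up numbers the intersect scan looks about three times cheaper. That estimate only holds if the fetch really sees the reduced key set, which requires every index scan to run to completion.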
However, at execution time an intersect scan currently always performs "early termination": as soon as one index scan finishes, the other index scans are stopped and the result is based on whichever index scan(s) have finished. This deviates from the costing. In particular, the number of documents that need to be fetched may not benefit from the reduction in index keys (if not all index scans finished), so in many cases the intersect scan runs much slower than its cost indicates.
We need a mechanism to indicate that an intersect scan should not perform early termination when it has been chosen by cost via CBO.
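The two execution behaviours can be illustrated with a simplified, single-threaded model. Round-robin iteration stands in for the concurrent index scans. The function `intersect_scan`, its `early_termination` parameter and the `should_terminate_early` policy helper are hypothetical names. They are not the query engine's actual operators or flags. With early termination, the keys of the first finished scan (a superset of the true intersection) go to the fetch. Without it, every scan is drained and only keys common to all scans survive.

```python
from typing import Iterable, Iterator, List, Set


def should_terminate_early(chosen_by_cbo: bool) -> bool:
    # Hypothetical policy matching this ticket: keep early termination
    # only when the intersect scan was NOT chosen by cost.
    return not chosen_by_cbo


def intersect_scan(scans: List[Iterable[str]], early_termination: bool) -> Set[str]:
    """Toy intersect scan over document-key streams (round-robin, not concurrent)."""
    if not scans:
        return set()
    iters: List[Iterator[str]] = [iter(s) for s in scans]
    seen: List[Set[str]] = [set() for _ in scans]
    active = set(range(len(iters)))
    while active:
        for i in sorted(active):
            try:
                seen[i].add(next(iters[i]))
            except StopIteration:
                active.discard(i)
                if early_termination:
                    # Stop the other scans; the fetch must filter the extra keys.
                    return seen[i]
    # Every scan finished: keep only keys produced by all of them.
    return set.intersection(*seen)


if __name__ == "__main__":
    keys_a = ["k1", "k2", "k3"]
    keys_b = ["k2", "k5", "k6", "k7", "k8", "k9"]
    early = should_terminate_early(chosen_by_cbo=False)
    print(sorted(intersect_scan([keys_a, keys_b], early)))  # ['k1', 'k2', 'k3']
    early = should_terminate_early(chosen_by_cbo=True)
    print(sorted(intersect_scan([keys_a, keys_b], early)))  # ['k2']
```

In this toy run, early termination hands three keys to the fetch instead of one. That is the gap between the cost estimate and the actual run time described in this ticket.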
This issue was identified with the join enumeration focus suite, specifically by comparing query 199 with query 319 in RSTNLJoinsV2.
query 199:
select count(1) from R join S USE HASH (PROBE) on R.u256 = S.u256 join T USE NL on S.u1K = T.u1K where R.rand <= 1024 and S.rand <= 512 and T.rand <= 512
query 319:
select count(1) from R join S USE NL on R.u256 = S.u256 join T USE HASH (BUILD) on S.u1K = T.u1K where R.rand <= 1024 and S.rand <= 512 and T.rand <= 512
Query 319 uses an intersect scan on S under the nested-loop join; query 199 has no intersect scan. The cost of query 319 is lower, but it runs longer because the intersect scan terminates early, so the fetch still has to retrieve a large number of documents.