Details
- Improvement
- Resolution: Unresolved
- Minor
- 6.5.0
Description
Currently, during the sampling phase of the parallel sort, the whole dataset is scanned, sampled, and materialized. Then, after computing the splitting vector, the materialized dataset is read to start redistributing the tuples based on the splitting vector and continue the rest of the sort process.
A possible improvement is to avoid scanning and materializing the whole dataset during the sampling phase, and instead read and materialize only a portion of the dataset large enough to compute the splitting vector.
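The proposal can be illustrated with a minimal Python sketch. This is not the engine's actual implementation. The function names, the `sample_limit` cutoff, and the choice to sample a prefix of the input are all assumptions made for illustration. Only the first `sample_limit` tuples are materialized; the splitting vector is computed from them, and the remaining tuples are streamed straight into redistribution.

```python
from bisect import bisect_right
from itertools import chain, islice


def compute_splitting_vector(sample, num_partitions):
    """Pick num_partitions - 1 split points at evenly spaced ranks of the sorted sample."""
    ordered = sorted(sample)
    if not ordered:
        return []
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def sample_then_redistribute(tuples, num_partitions, sample_limit):
    """Hypothetical helper: sample a bounded prefix, then range-partition in one pass."""
    stream = iter(tuples)
    # Materialize only the first sample_limit tuples (an assumed cutoff),
    # rather than the whole dataset.
    materialized = list(islice(stream, sample_limit))
    splits = compute_splitting_vector(materialized, num_partitions)

    partitions = [[] for _ in range(num_partitions)]
    # Replay the materialized prefix, then consume the rest of the input
    # directly from the stream without materializing it first.
    for t in chain(materialized, stream):
        partitions[bisect_right(splits, t)].append(t)
    return splits, partitions


if __name__ == "__main__":
    data = ((i * 37) % 100 for i in range(100))  # a generator: read once only
    splits, parts = sample_then_redistribute(data, num_partitions=4, sample_limit=20)
    print(splits, [len(p) for p in parts])
```

One caveat with this sketch: a prefix is only a fair sample if the input arrives in no particular order. If the input is already partly sorted, the split points would be skewed, so a real implementation would likely need a different way of choosing which portion to read.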