Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: feature-backlog
Affects Version/s: 6.5.0
Component/s: analytics
Labels:
- stats
- triaged

Description

Currently, during the sampling phase of the parallel sort, the whole dataset is scanned, sampled, and materialized. After the splitting vector is computed, the materialized dataset is read back so that the tuples can be redistributed according to the splitting vector, and the rest of the sort process continues from there.

An improvement would be to avoid scanning and materializing the whole dataset during the sampling phase, and instead read and materialize only a portion of the dataset large enough to compute the splitting vector.
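
The proposal can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the `prefix_len` and `sample_size` parameters, and the choice of evenly spaced sample quantiles as split points are all assumptions made for illustration. The sketch also assumes that the first `prefix_len` tuples are reasonably representative of the key distribution, which would not hold for input that arrives already ordered.

```python
import bisect
import itertools
import random


def sample_prefix_and_split(source, num_partitions, prefix_len, sample_size, seed=0):
    """Materialize only the first `prefix_len` tuples and derive split points.

    Hypothetical helper: returns (prefix, splits), where `splits` holds
    num_partitions - 1 boundaries taken from a sorted random sample of the prefix.
    """
    rng = random.Random(seed)
    # Only this prefix is read and materialized during the sampling phase.
    prefix = list(itertools.islice(source, prefix_len))
    sample = sorted(rng.sample(prefix, min(sample_size, len(prefix))))
    if not sample:
        return prefix, []
    # Evenly spaced quantiles of the sample act as the splitting vector.
    splits = [sample[(i * len(sample)) // num_partitions]
              for i in range(1, num_partitions)]
    return prefix, splits


def redistribute(prefix, rest, splits):
    """Route the materialized prefix, then the unread remainder, to partitions."""
    partitions = [[] for _ in range(len(splits) + 1)]
    for key in itertools.chain(prefix, rest):
        # bisect_right sends a key equal to a boundary to the higher partition.
        partitions[bisect.bisect_right(splits, key)].append(key)
    return partitions
```

In this sketch the remainder of the input is streamed straight into redistribution after the splitting vector is known, so only the sampled prefix is ever materialized.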

People

Assignee: Ali Alsuliman

Reporter: Ali Alsuliman

Votes: 0

Watchers: 1

Dates

Created: 07/Nov/18 12:32 PM

Updated: 20/Sep/23 6:34 PM

[CX] Improve the sampling phase of the parallel sort