Details
Improvement
Resolution: Unresolved
Minor
Goldfish Private Preview
Description
Because COPY TO always writes to shared storage (e.g., S3), redistributing the data before writing is not necessary. Two or more compute partitions can write to the same destination partition without conflict, since each file name is prefixed with the ID of the compute partition that wrote it.
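A minimal sketch of why the file-name prefix avoids conflicts. It only models path construction and is not the actual COPY TO writer: the path layout, the `.out` suffix, and the names `output_path` and `plan_writes` are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def output_path(destination: str, compute_partition_id: int, seq: int) -> str:
    # Hypothetical naming: "<destination>/<compute partition ID>-<sequence>.out".
    return f"{destination.rstrip('/')}/{compute_partition_id}-{seq:05d}.out"


def plan_writes(base, tuples_by_compute_partition, partition_key):
    """Map each output file path to the tuples written to it.

    Every compute partition writes its own tuples locally (no redistribution),
    so one destination partition can receive files from several writers.
    """
    files = defaultdict(list)
    for cp_id, tuples in tuples_by_compute_partition.items():
        for t in tuples:
            destination = f"{base.rstrip('/')}/{partition_key(t)}"
            files[output_path(destination, cp_id, 0)].append(t)
    return dict(files)


# Compute partitions 0 and 1 both hold tuples for destination partition 2020.
data = {0: [{"year": 2020}, {"year": 2021}], 1: [{"year": 2020}]}
files = plan_writes("s3://bucket/out", data, lambda t: t["year"])
for path in sorted(files):
    print(path)
```

Both writers target `s3://bucket/out/2020/`, but their file names differ in the prefix, so neither overwrites the other.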
This proposed writing scheme is less sensitive to skew: each compute partition writes the tuples it already holds, so all compute partitions have roughly the same number of tuples to write (depending, of course, on the source query).
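The skew argument can be illustrated with a toy comparison of per-writer load. This is a sketch under assumed numbers, not a measurement: it assumes 4 compute partitions holding 100 tuples each, with 90 of every 100 tuples sharing one destination key, and it models redistribution as `key % num_writers`. All function names are hypothetical.

```python
def load_with_redistribution(tuples_by_cp, num_writers):
    # Each tuple is shipped to the writer that owns its destination key.
    load = {w: 0 for w in range(num_writers)}
    for keys in tuples_by_cp.values():
        for key in keys:
            load[key % num_writers] += 1
    return load


def load_without_redistribution(tuples_by_cp):
    # Each compute partition writes exactly the tuples it already holds.
    return {cp: len(keys) for cp, keys in tuples_by_cp.items()}


# Tuples are represented by their destination key only; key 0 is the hot key.
tuples_by_cp = {cp: [0] * 90 + [1, 2, 3] * 3 + [1] for cp in range(4)}

redistributed = load_with_redistribution(tuples_by_cp, num_writers=4)
local = load_without_redistribution(tuples_by_cp)
print("redistributed:", redistributed)
print("local:", local)
```

With these assumed inputs, redistribution sends 360 of the 400 tuples to a single writer, while local writes keep every compute partition at 100 tuples.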
Analytics currently fails when writing to a non-empty partition destination. That limitation must be lifted before this scheme can be adopted.