Description
Currently, we inject PROJECT operators greedily between operators to reduce the size of intermediate results; however, this might still not be efficient enough. To illustrate, consider the following query:
SELECT MAX(r.temp)
FROM ExperDatasetRow e, e.readings r
Plan:
distribute result [$$48] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- DISTRIBUTE_RESULT |UNPARTITIONED|
|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
|
project ([$$48]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |UNPARTITIONED|
|
assign [$$48] <- [{"$1": $$50}] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ASSIGN |UNPARTITIONED|
|
aggregate [$$50] <- [agg-global-sql-max($$53)] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- AGGREGATE |UNPARTITIONED|
|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- RANDOM_MERGE_EXCHANGE |PARTITIONED|
|
aggregate [$$53] <- [agg-local-sql-max($$46)] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- AGGREGATE |PARTITIONED|
|
project ([$$46]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |PARTITIONED|
|
assign [$$46] <- [$$r.getField("temp")] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ASSIGN |PARTITIONED|
|
project ([$$r]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |PARTITIONED|
|
unnest $$r <- scan-collection($$51) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- UNNEST |PARTITIONED|
|
project ([$$51]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |PARTITIONED|
|
assign [$$51] <- [$$e.getField("readings")] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ASSIGN |PARTITIONED|
|
project ([$$e]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |PARTITIONED|
|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
|
data-scan []<-[$$49, $$e] <- FullDataverse.ExperDatasetRow [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- DATASOURCE_SCAN |PARTITIONED|
|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
|
empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
|
We see that the UNNEST operator is followed by a PROJECT operator:
project ([$$r]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- STREAM_PROJECT |PARTITIONED|
|
unnest $$r <- scan-collection($$51) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
|
-- UNNEST |PARTITIONED|
|
Let's say the frame size is 32KB (the current default) and the size of the array "readings" is close to the frame size itself (i.e., a little less than 32KB). Then, UNNEST would flush output frames rapidly, as each frame can only accommodate the array plus one or two of the unnested elements. However, the array itself is not needed at all: it is projected out by project ([$$r]), and only the unnested elements are needed after UNNEST. This is wasteful and is one of the main reasons that unnesting large arrays is so slow.
If the UNNEST itself did not output the array (i.e., only produced the unnested elements), those unnested elements would have the full 32KB to themselves, drastically reducing the number of frames produced by the UNNEST operator.
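A back-of-the-envelope sketch of the frame counts makes the waste concrete. This is not AsterixDB code; it is a hypothetical illustration with assumed sizes (a ~31KB array, 100-byte elements, 10,000 elements) chosen to match the scenario above:

```python
FRAME_SIZE = 32 * 1024   # 32KB, the current default frame size

def frames_needed(tuple_size, num_tuples, frame_size=FRAME_SIZE):
    """Frames produced if each output tuple occupies tuple_size bytes."""
    tuples_per_frame = max(1, frame_size // tuple_size)
    return -(-num_tuples // tuples_per_frame)  # ceiling division

# Assumed sizes, for illustration only.
array_size = 31 * 1024   # the "readings" array: a little under one frame
elem_size = 100          # one unnested element
num_elems = 10_000       # elements in the array across all records

# Today: each UNNEST output tuple still carries the array ($$51)
# alongside the unnested element ($$r), so ~1 tuple fits per frame.
with_array = frames_needed(array_size + elem_size, num_elems)

# Proposed: UNNEST emits only the unnested element, so hundreds of
# tuples fit per frame.
without_array = frames_needed(elem_size, num_elems)

print(with_array, without_array)  # 10000 vs. 31 frames
```

Under these assumed numbers, dropping the array from the UNNEST output cuts the frame count by roughly the ratio of array size to element size, which is where the slowdown on large arrays comes from.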