Issue #5023: Window Radix Partitions #5909
Conversation
hawkfish commented on Jan 13, 2023
- Switch to using Laurens' radix-partitioned column data collections.
- Avoid recreating the RowDataCollectionScanner by adding a Reset method.
- Tidy up some bits of code.
- Adapt to the incoming data sizes instead of guessing badly and falling back to 1024 buckets. This was causing some performance regressions in TPC-DS.
- Make the test deterministic.
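The "adapt to the incoming data sizes" point can be sketched roughly as follows. This is a hypothetical illustration only: the function name, the tuples-per-partition target, and the bit cap are all made up here, not taken from DuckDB's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: choose the number of radix bits from the observed
// input size instead of always falling back to 1024 (2^10) buckets.
// Aim for partitions of roughly `ideal_partition_size` tuples, and cap
// the bucket count at 2^max_bits.
uint32_t RadixBitsForCount(uint64_t estimated_tuples,
                           uint64_t ideal_partition_size = 131072,
                           uint32_t max_bits = 10) {
	uint32_t bits = 0;
	// Double the bucket count until the average partition is small enough.
	while (bits < max_bits &&
	       (estimated_tuples >> bits) > ideal_partition_size) {
		++bits;
	}
	return bits; // the number of partitions used is 1 << bits
}
```

Under these assumptions a small input gets a single partition, while a very large input is still capped at 1024 buckets; only mid-sized inputs get an intermediate count.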
Very nice performance improvements! I am happy to see that PartitionedColumnData is being put to use. The code looks great.
In general, I really like the new flow of the Window operator, although I don't understand how the sorting is parallelized, or how large the sorted runs are. What I got from this so far is that during Sink the data is collected in PartitionedColumnData, and during Combine these are combined. This is great because you can now start Finalize/GetData knowing exactly how much data you have, and how it is distributed across hash groups.
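The Sink/Combine flow described above can be modeled with a toy example. The classes below are illustrative stand-ins, not DuckDB's actual PartitionedColumnData API: each thread sinks rows into its own radix buckets, and Combine appends the thread-local buckets into global ones, so by Finalize every hash group's exact size is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy thread-local state: rows are bucketed by the low radix bits of
// their partition-key hash. (Real code stores rows, not just keys.)
struct LocalPartitions {
	explicit LocalPartitions(uint32_t radix_bits) : buckets(1u << radix_bits) {}
	void Sink(uint64_t partition_key) {
		auto h = std::hash<uint64_t> {}(partition_key);
		buckets[h & (buckets.size() - 1)].push_back(partition_key);
	}
	std::vector<std::vector<uint64_t>> buckets;
};

// Toy global state: Combine merges each thread-local bucket into the
// matching global bucket, so bucket sizes are exact before Finalize.
struct GlobalPartitions {
	explicit GlobalPartitions(uint32_t radix_bits) : buckets(1u << radix_bits) {}
	void Combine(const LocalPartitions &local) {
		for (size_t i = 0; i < buckets.size(); ++i) {
			auto &b = buckets[i];
			b.insert(b.end(), local.buckets[i].begin(), local.buckets[i].end());
		}
	}
	std::vector<std::vector<uint64_t>> buckets;
};
```

Because the same hash and bucket count are used on every thread, rows with equal keys always land in the same global bucket regardless of which thread sank them.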
I wonder if it's better to always go for 1024 hash groups, and combine them if they're too small. This prevents re-materializing the already materialized data. However, this will slow down the appends slightly, so it's a trade-off.
On the other hand, this is very flexible: you can merge partitions until their combined size is greater than some threshold, regardless of radix bits. For example, if you merge based on radix bits, e.g., going from 4 to 3, then you will always merge 0101 and 1101. However, if you always over-partition and merge partitions together however you see fit, you can merge 0101 with 0001 instead, if that results in more balanced hash groups. Let's discuss this later! :)
- Use partition size computer
- Use ColumnDataConsumer to release data faster
- Acquiesce to clangd whinging.
Thanks for the changes! I think this is ready to go.