Feature #3036: Window Spooling #14181

hawkfish · 2024-10-01T03:58:49Z

Add a Seek method built on a new PrevScanIndex method to support random access to collections. The scans are linear for now, but that should be good enough for the current task.
Plumb through everything needed to accumulate WindowDataChunk tuples in a ColumnDataCollection.
Convert the value storage for the "value" window functions to use collections. Note that the validity masks for IGNORE NULLS and EXCLUDE are still memory resident, but they are a lot smaller.
Create an explicit class for WindowAggregatorLocalState that can handle building and reading the argument collection in parallel instead of using a DataChunk and locks.
Convert the naïve aggregator to use collections.
Convert the segment tree aggregator to use collections.
Move all of the WindowDataChunk collection scanning functionality into a single class.
Convert WindowDistinctAggregator to use paging collections.
Create a wrapper for multi-threaded appending to WindowDataChunk objects.
Move RANGE values to a collection.
Convert the custom window functions to use collections instead of in-memory DataChunks.
Track the insert data validity and pass it down to avoid checks.
Pass down context so we can get real memory flushing sizes.

lnkuiper

Code changes look good, this is awesome! So mad, mode and quantile can now be computed on partitions even if a single partition is larger than memory? Could you add a test for this? What if we have an evil query with several of these, like this:

SELECT mad(c0) OVER (<window with one large partition that doesn't fit in memory>),
       mode(c0) OVER (<same window>),
       quantile(c0, 0.5) OVER (<same window>)
FROM tbl;

Could you add a test for that?

src/common/types/column/column_data_collection.cpp

Add tests to validate that the custom aggregators work with spooling. Also fix PR feedback and add missing PrepareMergeStage call.

lnkuiper

Thanks for the changes! The test looks great :)

Mytherin · 2024-10-07T13:09:05Z

Thanks!

Richard Wesley added 18 commits September 11, 2024 11:26

Feature duckdb#3036: ColumnDataCollection Seek

aa9d561

Add a Seek method built on a new PrevScanIndex method to support random access to collections. The scans are linear for now, but that should be good enough for the current task.
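
To illustrate what a linear seek over a chunked collection might involve, here is a minimal standalone sketch. `ChunkedCollection`, `ScanState`, `Seek` and `ValueAt` are hypothetical names invented for this example; they are not DuckDB's ColumnDataCollection API, and the real PrevScanIndex logic may well differ. The sketch only assumes that rows live in chunks of varying size and that the scan state remembers which chunk it last visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a paged collection: rows stored in chunks of varying size.
struct ChunkedCollection {
    std::vector<std::vector<int>> chunks;
};

// Hypothetical scan state: which chunk is current and which global rows it covers.
struct ScanState {
    std::size_t chunk_idx = 0; // index of the current chunk
    std::size_t begin_row = 0; // first global row in the current chunk
    std::size_t end_row = 0;   // one past the last global row in the current chunk
    bool valid = false;        // false until the first Seek positions the state
};

// Linear seek: step one chunk at a time, forward or backward, until the
// current chunk covers row_idx. Returns false if row_idx is out of range.
bool Seek(const ChunkedCollection &coll, ScanState &state, std::size_t row_idx) {
    if (coll.chunks.empty()) {
        return false;
    }
    if (!state.valid) {
        state.chunk_idx = 0;
        state.begin_row = 0;
        state.end_row = coll.chunks[0].size();
        state.valid = true;
    }
    // Walk forward while the target lies past the current chunk.
    while (row_idx >= state.end_row) {
        if (state.chunk_idx + 1 >= coll.chunks.size()) {
            return false;
        }
        ++state.chunk_idx;
        state.begin_row = state.end_row;
        state.end_row += coll.chunks[state.chunk_idx].size();
    }
    // Walk backward while the target lies before the current chunk.
    while (row_idx < state.begin_row) {
        --state.chunk_idx;
        state.end_row = state.begin_row;
        state.begin_row -= coll.chunks[state.chunk_idx].size();
    }
    return true;
}

// Random access built on Seek: position the state, then index into the chunk.
int ValueAt(const ChunkedCollection &coll, ScanState &state, std::size_t row_idx) {
    const bool found = Seek(coll, state, row_idx);
    assert(found);
    (void)found;
    return coll.chunks[state.chunk_idx][row_idx - state.begin_row];
}
```

Because the state is kept between calls, seeks to nearby rows touch few chunks; only long jumps pay the full linear cost, which is the trade-off the commit message accepts for now.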

Feature duckdb#3036: WindowDataChunk With ColumnDataCollection

3900639

Plumb through everything needed to accumulate WindowDataChunk tuples in a ColumnDataCollection. Not used yet.

Feature duckdb#3036: WindowValueExecutor Spooling

0ec2868

Convert the value storage for the "value" window functions to use collections. Note that the validity masks for IGNORE NULLS and EXCLUDE are still memory resident, but they are a lot smaller.

Feature duckdb#3036: WindowAggregatorLocalState

8241e66

Create an explicit class for this that can handle building and reading the argument collection in parallel instead of using a DataChunk and locks.

Feature duckdb#3036: WindowNaiveAggregator

b3a6d28

Convert the naïve aggregator to use collections. Also fix loop invariant mistake in PrevScanIndex.

Feature duckdb#3036: WindowSegmentTree Collections

7f71324

Convert the segment tree aggregator to use collections.

Merge branch 'feature' into window-spooling

0b34d01

Feature duckdb#3036: WindowTable Consolidation

8853642

Move all of the WindowDataChunk collection scanning functionality into a single class.

Feature duckdb#3036: WindowDistinctAggregator Collections

50635b0

Convert WindowDistinctAggregator to use paging collections.

Feature duckdb#3036: WindowBuilder Appending Wrapper

ee6b2fc

Create a wrapper for multi-threaded appending to WindowDataChunk objects.

Merge branch 'feature' into window-spooling

853b509

Feature duckdb#3036: Window Range Spooling

abe8def

Move RANGE values to a collection.

Merge branch 'feature' into window-spooling

4549738

Feature duckdb#3036: Custom Unary Window

c580ca8

Remove this helper as it relies on DataChunks.

Feature duckdb#3036: Custom Window Collections

16f634c

Convert the custom window functions to use collections instead of in-memory DataChunks.

Merge branch 'feature' into window-spooling

ae955b6

Feature duckdb#3036: Custom Window Validity

12ed820

Track the insert data validity and pass it down to avoid checks. Also pass down context so we can get real memory flushing sizes.
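
One way to picture the validity tracking is the sketch below. It is an assumption-laden illustration, not the code from this commit: `SpooledColumn`, `Append` and `SumFrame` are hypothetical names, and the example simply records during insertion whether any NULL was seen so that readers can skip per-row mask checks when none was.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical spooled column: values plus a per-row validity mask.
struct SpooledColumn {
    std::vector<int> values;
    std::vector<bool> validity; // true means the row is non-NULL
    bool all_valid = true;      // tracked while inserting, passed down to readers

    void Append(int value, bool is_valid) {
        values.push_back(value);
        validity.push_back(is_valid);
        all_valid = all_valid && is_valid;
    }
};

// Sum the non-NULL values in the frame [begin, end).
long long SumFrame(const SpooledColumn &col, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= col.values.size());
    long long sum = 0;
    if (col.all_valid) {
        // Fast path: no NULL was ever inserted, so the mask is not consulted.
        for (std::size_t i = begin; i < end; ++i) {
            sum += col.values[i];
        }
    } else {
        // Slow path: check the mask for every row.
        for (std::size_t i = begin; i < end; ++i) {
            if (col.validity[i]) {
                sum += col.values[i];
            }
        }
    }
    return sum;
}
```

The flag costs one boolean update per insert and lets the common all-valid case run a tighter loop.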

Merge branch 'feature' into window-spooling

c99e546

hawkfish requested review from lnkuiper and Mytherin October 1, 2024 03:58

hawkfish added the Ready For Review label Oct 1, 2024

lnkuiper suggested changes Oct 2, 2024

src/common/types/column/column_data_collection.cpp (outdated; resolved)

hawkfish added 2 commits October 3, 2024 09:43

Merge branch 'feature' into window-spooling

636d374

Feature duckdb#3036: Test Custom Spooling

077efef

Add tests to validate that the custom aggregators work with spooling. Also fix PR feedback and add missing PrepareMergeStage call.

duckdb-draftbot marked this pull request as draft October 3, 2024 22:26

hawkfish requested a review from lnkuiper October 3, 2024 22:30

hawkfish marked this pull request as ready for review October 3, 2024 22:31

lnkuiper approved these changes Oct 7, 2024

Mytherin merged commit 299c7f2 into duckdb:feature Oct 7, 2024
42 checks passed

hawkfish deleted the window-spooling branch October 7, 2024 13:14
