Skip to content

Conversation

hawkfish
Copy link
Contributor

@hawkfish hawkfish commented Oct 1, 2024

  • Add a Seek method for built on a new PrevScanIndex method to support random access to collections. The scans are linear for now, but that should be good enough for the current task.
  • Plumb through everything needed to accumulate WindowDataChunk tuples in a ColumnDataCollection.
  • Convert the value storage for the "value" window functions to use collections. Note that the validity masks for IGNORE NULLS and EXCLUDE are still memory resident, but they are a lot smaller.
  • Create an explicit class for WindowAggregatorLocalState that can handle building and reading the argument collection in parallel instead of using a DataChunk and locks.
  • Convert the naïve aggregator to use collections.
  • Convert the segment tree aggregator to use collections.
  • Move all of the WindowDataChunk collection scanning functionality into a single class.
  • Convert WindowDistinctAggregator to use paging collections.
  • Create a wrapper for multi-threaded appending to WindowDataChunk objects.
  • Move RANGE values to a collection.
  • Convert the custom window functions to use collections instead of in-memory DataChunks.
  • Track the insert data validity and pass it down to avoid checks.
  • Pass down context so we can get real memory flushing sizes.

Richard Wesley added 18 commits September 11, 2024 11:26
Add a Seek method for built on a new PrevScanIndex method
to support random access to collections.
The scans are linear for now, but that should be good enough
for the current task.
Plumb through everything needed to accumulate WindowDataChunk tuples
in a ColumnDataCollection. Not used yet.
Convert the value storage for the "value" window functions
to use collections. Note that the validity masks for
IGNORE NULLS and EXCLUDE are still memory resident,
but they are a lot smaller.
Create an explicit class for this that can handle building and reading
the argument collection in parallel instead of using a DataChunk and locks.
Convert the naïve aggregator to use collections.
Also fix loop invariant mistake in PrevScanIndex.
Convert the segment tree aggregator to use collections.
Move all of the WindowDataChunk collection scanning functionality
into a single class.
Convert WindowDistinctAggregator to use paging collections.
Create a wrapper for multi-threaded appending to WindowDataChunk objects.
Move RANGE values to a collection.
Remove this helper as it relies on DataChunks.
Convert the custom window functions to use collections instead of in-memory DataChunks.
Track the insert data validity and pass it down to avoid checks.
Also pass down context so we can get real memory flushing sizes.
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good, this is awesome! So mad, mode and quantile can now be computed on partitions if that single partition is larger than memory? Could you add a test for this? What if we have an evil query with multiple like this:

SELECT mad(c0) OVER (<window with one large partition that doesn't fit in memory>),
       mode(c0) OVER (<same window>),
       quantile(c0, 0.5) OVER (<same window>)
FROM tbl;

Could you add a test for that?

Add tests to validate that the custom aggregators work with spooling.
Also fix PR feedback and add missing PrepareMergeStage call.
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 3, 2024 22:26
@hawkfish hawkfish requested a review from lnkuiper October 3, 2024 22:30
@hawkfish hawkfish marked this pull request as ready for review October 3, 2024 22:31
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! The test looks great :)

@Mytherin Mytherin merged commit 299c7f2 into duckdb:feature Oct 7, 2024
42 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Oct 7, 2024

Thanks!

@hawkfish hawkfish deleted the window-spooling branch October 7, 2024 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants