Feature #3036: Window Spooling #14181
hawkfish
commented
Oct 1, 2024
- Add a Seek method built on a new PrevScanIndex method to support random access to collections. The scans are linear for now, but that should be good enough for the current task.
- Plumb through everything needed to accumulate WindowDataChunk tuples in a ColumnDataCollection.
- Convert the value storage for the "value" window functions to use collections. Note that the validity masks for IGNORE NULLS and EXCLUDE are still memory resident, but they are a lot smaller.
- Create an explicit class for WindowAggregatorLocalState that can handle building and reading the argument collection in parallel instead of using a DataChunk and locks.
- Convert the naïve aggregator to use collections.
- Convert the segment tree aggregator to use collections.
- Move all of the WindowDataChunk collection scanning functionality into a single class.
- Convert WindowDistinctAggregator to use paging collections.
- Create a wrapper for multi-threaded appending to WindowDataChunk objects.
- Move RANGE values to a collection.
- Convert the custom window functions to use collections instead of in-memory DataChunks.
- Track the insert data validity and pass it down to avoid checks.
- Pass down context so we can get real memory flushing sizes.
Add a Seek method built on a new PrevScanIndex method to support random access to collections. The scans are linear for now, but that should be good enough for the current task.
Plumb through everything needed to accumulate WindowDataChunk tuples in a ColumnDataCollection. Not used yet.
Convert the value storage for the "value" window functions to use collections. Note that the validity masks for IGNORE NULLS and EXCLUDE are still memory resident, but they are a lot smaller.
Create an explicit WindowAggregatorLocalState class that can handle building and reading the argument collection in parallel instead of using a DataChunk and locks.
Convert the naïve aggregator to use collections. Also fix loop invariant mistake in PrevScanIndex.
Convert the segment tree aggregator to use collections.
Move all of the WindowDataChunk collection scanning functionality into a single class.
Convert WindowDistinctAggregator to use paging collections.
Create a wrapper for multi-threaded appending to WindowDataChunk objects.
Move RANGE values to a collection.
Remove this helper as it relies on DataChunks.
Convert the custom window functions to use collections instead of in-memory DataChunks.
Track the insert data validity and pass it down to avoid checks. Also pass down context so we can get real memory flushing sizes.
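The Seek / PrevScanIndex idea described above can be illustrated with a small, self-contained sketch. This is not the code from this PR: `ChunkScanner`, `next_scan_index`, `prev_scan_index` and `seek` are hypothetical Python stand-ins, and the collection is modeled as a plain list of variable-sized chunks. It only shows how random access can be layered on a scanner that steps one chunk at a time, which is why a seek costs time linear in the number of chunks skipped.

```python
class ChunkScanner:
    """Hypothetical scanner over a list of variable-sized chunks (assumption,
    not the DuckDB API). Tracks the current chunk and its first global row."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.chunk_idx = 0
        self.begin = 0  # global row index of the first row in the current chunk

    @property
    def end(self):
        # One past the last global row index covered by the current chunk.
        return self.begin + len(self.chunks[self.chunk_idx])

    def next_scan_index(self):
        # Step forward one chunk; return False when already at the last chunk.
        if self.chunk_idx + 1 >= len(self.chunks):
            return False
        self.begin = self.end
        self.chunk_idx += 1
        return True

    def prev_scan_index(self):
        # Step backward one chunk; return False when already at the first chunk.
        if self.chunk_idx == 0:
            return False
        self.chunk_idx -= 1
        self.begin -= len(self.chunks[self.chunk_idx])
        return True

    def seek(self, row):
        # Linear walk in whichever direction is needed until the current
        # chunk covers the requested global row.
        while row < self.begin:
            if not self.prev_scan_index():
                raise IndexError(row)
        while row >= self.end:
            if not self.next_scan_index():
                raise IndexError(row)
        return self.chunks[self.chunk_idx][row - self.begin]
```

Because the scanner remembers its position, a sequence of nearby seeks (the common pattern for sliding window frames) touches few chunks; only long jumps pay the full linear cost.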
Code changes look good, this is awesome! So mad, mode and quantile can now be computed on partitions even if a single partition is larger than memory? Could you add a test for this? What if we have an evil query with multiple such aggregates, like this:
SELECT mad(c0) OVER (<window with one large partition that doesn't fit in memory>),
mode(c0) OVER (<same window>),
quantile(c0, 0.5) OVER (<same window>)
FROM tbl;
Could you add a test for that?
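One way to check such a test's output is against a small in-memory reference. The sketch below is only an illustration under stated assumptions: `reference_aggregates` is a hypothetical helper, mad is taken to mean median absolute deviation, the frame is assumed to cover the whole partition, NULLs (modeled as `None`) are skipped, and the engine's exact interpolation and tie-breaking rules for quantile and mode are not modeled, so inputs should be chosen with an odd row count and a unique most frequent value.

```python
from statistics import median, mode


def reference_aggregates(partition):
    """Hypothetical in-memory oracle for one partition (assumption, not part
    of this PR). Returns the expected mad, mode and median of the values."""
    # SQL aggregates ignore NULLs, modeled here as None.
    values = [v for v in partition if v is not None]
    med = median(values)
    # Median absolute deviation: median of the distances from the median.
    mad = median(abs(v - med) for v in values)
    return {"mad": mad, "mode": mode(values), "median": med}
```

With a small partition the expected values are easy to verify by hand, and the same helper can then be applied to a sample of a larger generated partition.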
Add tests to validate that the custom aggregators work with spooling. Also fix PR feedback and add missing PrepareMergeStage call.
Thanks for the changes! The test looks great :)
Thanks!