-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Issue #6272: Window Scaled Repartitioning #6366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There is no need to copy the old partitioned data when repartitioning. But if we have started to combine, then repartitioning will duplicate data, so don't resize then.
Thanks for the PR! This still does not seem to solve the problem completely, though. If I add a loop to the test: # name: test/sql/window/test_window_repartition.test_slow
# description: Window reparitioning at scale
# group: [window]
statement ok
PRAGMA enable_verification
statement ok
create table df as
select d, i v1
from
range(date '2017-01-01', date '2020-12-31', interval '1' day) t(d),
range(3000) i
;
loop i 0 100
query I
select count(*)
from (
select percent_rank() over (partition by d order by v1) as rank_v1
from df
);
----
4380000
endloop Then running it I eventually get an off-by-a small amount still.
This amount always seems to be less than the vector size, so perhaps something happening with a resize while a vector is already in-flight? |
Disabling all in-flight resizing seems to fix the issue, but this may be too heavy-handed of a solution: diff --git a/src/execution/operator/aggregate/physical_window.cpp b/src/execution/operator/aggregate/physical_window.cpp
index fbd7865f02..958c574efe 100644
--- a/src/execution/operator/aggregate/physical_window.cpp
+++ b/src/execution/operator/aggregate/physical_window.cpp
@@ -170,7 +170,7 @@ private:
void WindowGlobalSinkState::ResizeGroupingData(idx_t cardinality) {
// Have we started to combine? Then just live with it.
- if (grouping_data && !grouping_data->GetPartitions().empty()) {
+ if (grouping_data) {
return;
}
// Is the average partition size too large? |
I was able to reproduce this locally, and this seems to fix it: void PartitionedColumnData::FlushAppendState(PartitionedColumnDataAppendState &state) {
for (idx_t i = 0; i < state.partition_buffers.size(); i++) {
auto &partition_buffer = *state.partition_buffers[i];
if (partition_buffer.size() > 0) {
partitions[i]->Append(partition_buffer);
partition_buffer.Reset(); // Add this line!
}
}
} The buffer needed to be reset. Nowhere else in the codebase did we use the append state after flushing it, so this was not a problem before. |
Fix reentry problem in PartitionedColumnData::FlushAppendState. Make stress test more brutal and reliable.
…into window-pct-range
There is no need to copy the old partitioned data when repartitioning.
But if we have started to combine, then repartitioning will duplicate data,
so don't resize then.