fix statistics propagation for anti-joins on empty tables #17439
Thanks for the fix!
Dear developers, thanks for the fix! Could you please review this pull request and merge it? I am currently testing DuckDB extensively, and I am grateful for this fix.
Also chiming in with enthusiasm for this being merged, and thanks to the dev who fixed it!
Looks good to me. I think you can change the base branch to v1.3-Ossivalis so that this gets merged into the next bug fix release.
Thanks for the review, changed the base branch to v1.3-Ossivalis!
Thanks!
fix statistics propagation for anti-joins on empty tables (duckdb/duckdb#17439) Run Python workflow against both Python 3.9 and 3.13 on PR to ensure … (duckdb/duckdb#18080) Main.yml: Move very long job from debug to release with `-DDEBUG` and `FORCE_ASSERT` (duckdb/duckdb#18081) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Closes #17417
This issue appears to be a regression of issue #9308. The existing test for that issue did not catch the regression because the optimizer no longer creates an explicit ANTI-JOIN for the WHERE NOT EXISTS query.
The fix in #9654 could be re-applied to remove the pruning optimization altogether for ANTI-JOINs with always-true filters (and should be, if this PR is deemed unsuitable). This PR instead aims to keep the optimization and handle the case where the join is on an empty result set.
In particular, this PR adds a helper function to BaseStatistics that determines whether the stats match an empty result set (using the min/max set by CreateEmpty). With it, the optimizer correctly returns the LHS of the ANTI-JOIN if the RHS is empty, and keeps the current behaviour of returning an empty result otherwise.
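To illustrate the decision described above, here is a minimal standalone sketch. It is not the code from this PR: `SketchNumericStats`, `IsEmpty`, `AntiJoinResult`, and `PruneAntiJoin` are hypothetical names, and the sketch assumes that empty statistics are encoded as an inverted range (min greater than max).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for numeric column statistics; not DuckDB's actual class.
struct SketchNumericStats {
    int64_t min;
    int64_t max;

    // Assumption: "no rows" is encoded as an inverted range (min > max).
    static SketchNumericStats CreateEmpty() {
        return {std::numeric_limits<int64_t>::max(),
                std::numeric_limits<int64_t>::min()};
    }

    // Helper in the spirit of the PR: do these stats describe an empty result set?
    bool IsEmpty() const {
        return min > max;
    }
};

// Possible outcomes of pruning an ANTI join during statistics propagation.
enum class AntiJoinResult { KEEP_LHS, EMPTY_RESULT, NO_CHANGE };

// Decide what an ANTI join can be replaced with, given whether its filter
// was found to be always true and the statistics of the right-hand side.
AntiJoinResult PruneAntiJoin(bool filter_always_true,
                             const SketchNumericStats &rhs_stats) {
    if (!filter_always_true) {
        return AntiJoinResult::NO_CHANGE; // nothing can be pruned
    }
    if (rhs_stats.IsEmpty()) {
        // No RHS rows exist, so no LHS row can find a match: keep the LHS.
        return AntiJoinResult::KEEP_LHS;
    }
    // Existing behaviour: the filter always matches, so no LHS row survives.
    return AntiJoinResult::EMPTY_RESULT;
}
```

In this sketch, `PruneAntiJoin(true, SketchNumericStats::CreateEmpty())` yields `KEEP_LHS`, while the same call with a non-empty range such as `{1, 10}` yields `EMPTY_RESULT`.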
While this function is currently only used for NumericStats, I've also implemented it for the other BaseStatistics implementations, so that this does not regress again if statistics propagation (and in particular, filter pruning) is later extended to non-numeric types.