Skip to content

Conversation

Mytherin
Copy link
Collaborator

Fixes #9308

When an ANTI join is performed on a condition that is proven to be true for every row in the data set the result is the empty set ONLY when the RHS is not empty. The current statistics propagation optimizer always returns the empty set which is incorrect if the RHS is empty. This PR removes this specific optimization for anti joins.

@Mytherin Mytherin merged commit 702d22f into duckdb:main Nov 13, 2023
@Mytherin Mytherin deleted the issue9308 branch December 4, 2023 11:45
Mytherin added a commit that referenced this pull request Jun 30, 2025
Closes #17417 

This issue appears to actually be a regression of issue #9308 (with the
existing test for that issue not picking up the regression as the
optimizer no longer makes an explicit ANTI-JOIN for the WHERE NOT EXISTS
query).

While the fix in #9654 could be re-applied (and if this PR is deemed not
suitable, should be) to remove the pruning optimization altogether for
ANTI-JOINS with always true filters, this PR aims to keep the
optimization and handle the case where the join is on an empty result
set.

In particular, this PR adds a helper function to BaseStatistics to
determine whether the stats match an empty result set (using the min/max
set by CreateEmpty) and correctly returns the LHS of the ANTI-JOIN if
the RHS is empty (and keeps the current behaviour of returning an empty
result otherwise)

While this function is only currently used for NumericStats, I've also
implemented it for the other BaseStatistics implementations, to avoid
this being regressed again in future if statistics propagation (and in
particular, filter pruning) occur for non-numeric types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wrong result: NOT EXISTS predicate with correlated non-equality comparison
1 participant