-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix decorrelation of nested correlated recursive CTEs #13404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@kryonix: is this ready to be reviewed ? If so, can you undraft and ask for a review? |
Unfortunately it is not ready yet, there are some tricky corner cases left that I am trying to fix currently. This issue turned out to be quite tricky to solve. |
Instead of iterating all CTE ids in FlattenDependentJoin, hence repeatedly traversing the propably deeply nested subquery structure, the CTE iteration happens in RewriteSubquery.
The expression traversal in the logical projection case was not working properly. Instead of checking the children of expressions, the expression itself got checked over and over again. Also, logical delim mark joins were not checked.
b0d698b
to
d707ff7
Compare
Turns out, part of the remaining issue I was searching was caused by #13873, and has been fixed. |
@kryonix: CI is raising a couple of minor warnings turned errors that needs fixing:
|
This PR should now be relatively clean and tidy, and technically does what it is supposed to do. However, test case |
Closing in favor of #17294. |
This PR implements the recently published version of @neumannt's query unnesting strategy _"Improving Unnesting of Complex Queries"_—which is an updated version of _"Unnesting Arbitrary Queries"_. **Why do we want this?** Separation of concerns and phase separation. With bottom-up decorrelation, we never fully algebraize plans containing (nested) subqueries into a logical plan—possibly containing `DEPENDENT_JOINS`. This results in problems trying to decorrelate complex SQL constructs such as recursive CTEs (see #13404). With this PR, that changes. Subqueries are fully algebraized, before entering a single decorrelation pass. This potentially provides optimization opportunities, that would not have been possible before. ---- Fix: #15307 Fix: #15483
Without this PR, DuckDB has a few issues related to correlated recursive CTEs.
First, planning of (deeply) nested recursive CTEs failed:
Second, execution of correlated recursive CTEs, containing
NOT EXISTS
subqueries could lead to wrong results:The issue is that the inner
NOT EXISTS
subquery has an implicit correlation to the outer correlation"input"."n"
. This was not taken into consideration. This PR fixes this by adding an appropriate filter predicate like this:Currently, there is an issue with
test/sql/cte/test_fib_complex.test_slow
when executed by a debug build. TheFILTER_PUSHDOWN
optimizer somehow creates the following binder error:I have not found a solution yet for that problem. Therefore, this PR is not ready for merge.This has been fixed. I think the issue was a continuation of #12916.