Skip to content

Conversation

kryonix
Copy link
Contributor

@kryonix kryonix commented Aug 13, 2024

Without this PR, DuckDB has a few issues related to correlated recursive CTEs.


First, planning of (deeply) nested recursive CTEs failed:

SELECT *
FROM generate_series(1,3) AS "input"("n"),
LATERAL
(WITH RECURSIVE "subquery1"("x") AS (
  (SELECT "n", true AS "exists")
    UNION ALL
  (WITH "subquery" AS (SELECT * FROM "subquery1")
  SELECT "x" - 1, (SELECT NOT EXISTS (SELECT 1 FROM "subquery" AS "input"("a") WHERE "a" > _."x"))
  FROM "subquery" AS _("x") WHERE "x" > 0)
  ) table "subquery1")
ORDER BY "n", "x"
;

Second, execution of correlated recursive CTEs, containing NOT EXISTS subqueries could lead to wrong results:

SELECT *
FROM generate_series(1,3) AS "input"("n"),
LATERAL
(WITH RECURSIVE "subquery"("x") AS (
  (SELECT "n", true AS "exists")
    UNION ALL
  SELECT "x" - 1, NOT EXISTS (SELECT 1 FROM "subquery" AS "input"("x") WHERE "x" > _."x") FROM "subquery" AS _("x") WHERE "x" > 0
  ) table "subquery")
ORDER BY "n"
;

The issue is that the inner NOT EXISTS subquery has an implicit correlation to the outer correlation "input"."n". This was not taken into consideration. This PR fixes this by adding an appropriate filter predicate like this:

SELECT *
FROM
(WITH RECURSIVE "subquery"("x", "n") AS (
  (SELECT "n", "n", true AS "exists" FROM generate_series(1,3) AS "input"("n"))
    UNION ALL
  (WITH "subquery" AS (select * from "subquery")
  SELECT "n" - 1, "n", NOT EXISTS (SELECT 1 FROM "subquery" AS "input"("x") WHERE "x" > "n" AND "n" = "corr_n") FROM "subquery" AS _("n", "corr_n") WHERE "n" > 0)
                                                                                             -- ^^^^^^^^^^^^^^
  ) table "subquery")
;

Currently, there is an issue with test/sql/cte/test_fib_complex.test_slow when executed by a debug build. The FILTER_PUSHDOWN optimizer somehow creates the following binder error:

INTERNAL Error: Failed to bind column reference "SUBQUERY" [250.0] (bindings: {#[123.0], #[123.1], #[123.2], #[123.3], #[123.4], #[123.5], #[143.0]})

I have not found a solution yet for that problem. Therefore, this PR is not ready for merge.

This has been fixed. I think the issue was a continuation of #12916.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 13, 2024 13:47
@kryonix kryonix marked this pull request as ready for review August 14, 2024 20:55
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 15, 2024 06:13
@kryonix kryonix marked this pull request as ready for review August 15, 2024 06:14
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 15, 2024 06:23
@kryonix kryonix marked this pull request as ready for review August 15, 2024 06:23
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 15, 2024 08:49
@carlopi
Copy link
Contributor

carlopi commented Aug 25, 2024

@kryonix: is this ready to be reviewed ? If so, can you undraft and ask for a review?

@kryonix
Copy link
Contributor Author

kryonix commented Aug 25, 2024

Unfortunately it is not ready yet, there are some tricky corner cases left that I am trying to fix currently. This issue turned out to be quite tricky to solve.

@kryonix kryonix force-pushed the fix_cte_decorrelation branch from b0d698b to d707ff7 Compare September 23, 2024 06:22
@kryonix kryonix marked this pull request as ready for review September 23, 2024 06:26
@kryonix
Copy link
Contributor Author

kryonix commented Sep 23, 2024

Turns out, part of the remaining issue I was searching was caused by #13873, and has been fixed.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 23, 2024 06:31
@kryonix kryonix marked this pull request as ready for review September 23, 2024 06:31
@carlopi
Copy link
Contributor

carlopi commented Sep 23, 2024

@kryonix: CI is raising a couple of minor warnings turned errors that needs fixing:

/home/runner/work/duckdb/duckdb/src/planner/subquery/rewrite_subquery.cpp:69:9: error: unused variable ‘join’ [-Werror=unused-variable]
   69 |   auto &join = op.Cast<LogicalDependentJoin>();
      |         ^~~~
/home/runner/work/duckdb/duckdb/src/planner/subquery/rewrite_subquery.cpp:42:13: error: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy,-warnings-as-errors]
   42 |                 for (auto col : join.correlated_columns) {
      |                           ^
      |                      const  &

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 23, 2024 07:46
@kryonix
Copy link
Contributor Author

kryonix commented Sep 23, 2024

This PR should now be relatively clean and tidy, and technically does what it is supposed to do. However, test case test/sql/cte/test_fib_complex.test_slow still does not work as intended. I'm not really sure what is wrong with the plan in that case. But the SUM-Aggregate computes the wrong result as long as CTE "fib.wait.1.decide" is NOT MATERIALIZED. Unfortunately I was not able (yet) to find a smaller example that triggers the same bug.

@kryonix kryonix marked this pull request as ready for review September 23, 2024 13:45
@kryonix
Copy link
Contributor Author

kryonix commented Apr 29, 2025

Closing in favor of #17294.

@kryonix kryonix closed this Apr 29, 2025
Mytherin added a commit that referenced this pull request Apr 30, 2025
This PR implements the recently published version of @neumannt's query
unnesting strategy _"Improving Unnesting of Complex Queries"_—which is
an updated version of _"Unnesting Arbitrary Queries"_.

**Why do we want this?** Separation of concerns and phase separation.
With bottom-up decorrelation, we never fully algebraize plans containing
(nested) subqueries into a logical plan—possibly containing
`DEPENDENT_JOINS`. This results in problems trying to decorrelate
complex SQL constructs such as recursive CTEs (see #13404). With this
PR, that changes. Subqueries are fully algebraized, before entering a
single decorrelation pass. This potentially provides optimization
opportunities, that would not have been possible before.

----

Fix: #15307
Fix: #15483
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants