Avoid going too in-depth while computing join order #17904
Conversation
One possible problem is that I am not sure whether the added C++ test is too demanding, say, on Windows. Let's see. I can also just remove it.
The cutoff of 25 is arbitrary; in duckdb-wasm, with a stack of 64KB, it would have gone up to 80+ (depending on implementation details / compiler optimization level), but it is high enough that it should not really influence actual plans. There might be a better solution, and the cutoff might even be made customizable, so that different situations can warrant different behaviours (and better testing). This has been tested bailing out at level 3, just to check the system could handle bailing out. A test is added that concatenates `select 1 as num` with 500 repetitions of `union all select 1 as num`. Both with and without this PR the test passes, but adding a limit on depth significantly increases the reachable depth. Experimenting with higher numbers (together with `SET max_expression_depth = x` to actually make the statement parseable) will show a path to further improvement in this area.
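For reference, the stress query described above can be generated programmatically. A minimal sketch in Python — the depth of 500 and the `num` alias come from the comment, everything else is illustrative:

```python
# Build the stress query from the test described above:
# `select 1 as num` concatenated with 500 copies of
# `union all select 1 as num`.
DEPTH = 500  # repetition count from the comment above

query = "select 1 as num" + " union all select 1 as num" * DEPTH

# Each UNION ALL typically adds one level of nesting to the logical
# plan, so the planner ends up recursing roughly DEPTH levels deep.
levels = query.count("union all")
print(levels)  # 500
```

Feeding a query like this to a 32-bit build (or raising `DEPTH` together with `SET max_expression_depth = x`) is how the depth limit can be exercised manually.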
Looks good to me for the bug fix. We should come back to this before v1.3.2/v1.4.0. Ideally we can optimize the plans closer to the leaves of the tree instead of from the top down.
One observation, for the case of UNION ALL, is that it might be re-grouped. After some more thought, I think preventing going too deep might make sense in any case, while we might want to reorder the plans before getting there.
Let's put this at 50 at least? Or maybe 100, and use a different value for Wasm? 25 seems very "reachable" and might randomly degrade actual queries people are running.
@carlopi I have some data files and 250 queries (these are randomly generated and obnoxiously complicated, with deep joins and whatnot) that I thought I'd pass along, in case it helps you with this PR. The data files include a number of DuckDB database files and a start.sql script that attaches the files with specific names.
Amazing! I am not sure I will get them to run on this, mostly since they are not trivial to plug into our test infrastructure, but maybe it would make sense to expose those as sqllogic tests? That would make it easier to get access and try them. They could even become a test-only extension; then in development it's trivial to plug them in, and there can be a
I would then probably prefer 50 everywhere, OR making this depend on a variable that can be set by users/clients. 100 plus an override on the Wasm side is also fine. Meanwhile I am running our test and benchmark suite with additional logging to detect very deep plans; the outliers I have found are:
I will update when more results come in; for example, regression is run at https://github.com/carlopi/duckdb/actions/runs/15626527270/job/44021730739
I ended up setting the threshold to 100, and raising the stack size in Wasm. Combined, these make it so that:
I will double check the Wasm stack size properly again, but I think with 100 we should be good for now, while this can be looked at again for 1.4 with more time.
Discussed with @Mytherin; looking into changing from a fixed value to
Changed to use config's `max_expression_depth`.
Thanks! |
Avoid going too in-depth while computing join order (duckdb/duckdb#17904) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Visit in depth up to `max_expression_depth` recursion levels (in the plan tree), and stop keeping original plans for deeper nodes. This makes it so that changing `max_expression_depth` makes it possible to control the complexity of the optimizations performed. Clients such as duckdb-wasm that put a premium on stack space (due to their 32-bit nature) can lower the setting to avoid running into stack-overflow-related problems.
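The mechanism can be illustrated with a small sketch. All names here (`PlanNode`, `optimize`, the constant) are hypothetical stand-ins, not DuckDB's actual classes: the point is that recursion stops at a configurable depth and leaves deeper subtrees untouched, so native stack usage stays bounded no matter how deep the plan is.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-in for the configurable setting; in DuckDB this is
# `max_expression_depth`, here it is just a module constant.
MAX_EXPRESSION_DEPTH = 100

@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)
    optimized: bool = False  # marks nodes the optimizer actually visited

def optimize(node: PlanNode, depth: int = 0) -> None:
    # Past the cutoff, bail out and leave the subtree as-is: this bounds
    # the recursion depth regardless of how deep the plan tree is.
    if depth >= MAX_EXPRESSION_DEPTH:
        return
    for child in node.children:
        optimize(child, depth + 1)
    node.optimized = True  # join-order work on this node would go here

# A 500-level-deep chain, analogous to the UNION ALL stress test.
root = PlanNode("scan")
for _ in range(500):
    root = PlanNode("union_all", [PlanNode("scan"), root])

optimize(root)
print(root.optimized)  # True: the top of the tree was optimized
```

Nodes deeper than the cutoff keep their original shape; only the upper levels of the plan are rewritten, which trades some optimization opportunity for a hard bound on stack consumption.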
A weaker point is that at the moment this is hard to test; it has been verified manually by changing the code, but it's hard to exercise from the system as such, and that might be a code smell.
Fixes duckdb/duckdb-wasm#1954 and other similar issues (more visible in duckdb-wasm, but still somewhat relevant in native builds).