Skip to content

Conversation

Tmonster
Copy link
Contributor

Fixes https://github.com/duckdblabs/duckdb-internal/issues/2979

The original plan with no optimization looks like this

┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             c             │
│                           │
│          ~0 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│    ((y + z) > CAST(0 AS   │
│          INTEGER))        │
│                           │
│          ~0 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             y             │
│             z             │
│                           │
│          ~0 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           UNNEST          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       CROSS_PRODUCT       ├──────────────┐
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│             x             ││             y             │
│                           ││                           │
│          ~1 Rows          ││          ~1 Rows          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         DUMMY_SCAN        ││         DUMMY_SCAN        │
└───────────────────────────┘└───────────────────────────┘

The UNNEST has an index, and is a blocking operator (I think?). The filter above it is extracted so that it can be used as a potential join filter. The problem here is that the filter has bindings from the unnest, so when we attempt to use it to join the two tables, the join order optimizer will throw an error because the filter bindings (which are only available above the unnest) are not available below the unnest.

The solution I have for this is to optimize everything below the unnest separately. In the future I may want to implement a pushdown/pullup unnest optimizer to find the best place to put an unnest.

@Mytherin Mytherin merged commit b875459 into duckdb:main Sep 12, 2024
39 checks passed
@Mytherin
Copy link
Collaborator

Thanks! LGTM

The UNNEST has an index, and is a blocking operator (I think?).

UNNEST is a bit like a window operator in that it leaves the original columns available, but adds new columns. It is not a blocking operator but a streaming one. Differently from Window it can change the cardinalities as well based on the list lengths (usually increases the cardinality). I think you probably always want to execute UNNEST as late as possible for that reason - but I could be mistaken.

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 25, 2024
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Sep 25, 2024
13810 unnest cross join error (duckdb/duckdb#13878)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants