Skip to content

Conversation

Damon07
Copy link
Contributor

@Damon07 Damon07 commented Jan 24, 2025

This PR fixes #15316.

Signed-off-by: damon <wangmengdamon@gmail.com>
@Damon07 Damon07 marked this pull request as draft January 24, 2025 10:29
@Damon07 Damon07 marked this pull request as ready for review January 24, 2025 10:29
@szarnyasg szarnyasg deleted the branch duckdb:main January 25, 2025 11:25
@szarnyasg szarnyasg closed this Jan 25, 2025
@Mytherin Mytherin reopened this Jan 25, 2025
@Mytherin Mytherin changed the base branch from v1.2-histrionicus to main January 25, 2025 11:49
@Mytherin Mytherin requested a review from Tmonster January 28, 2025 14:00
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It looks really good. If you want to make it even faster you can try to replace the RHS with a 1 row scan of NULL values and turn the join into a cross product.
Currently a Nested block join is still planned, in certain cases this is even slower than an inner join with matches.

If you plan a cross join with a 1 row table of NULLS, then the performance should be even faster than any hash join. In addition, the relations are more freely moved around for join order optimization as well.

@Damon07
Copy link
Contributor Author

Damon07 commented Jan 29, 2025

Thanks for the PR! It looks really good. If you want to make it even faster you can try to replace the RHS with a 1 row scan of NULL values and turn the join into a cross product. Currently a Nested block join is still planned, in certain cases this is even slower than an inner join with matches.

If you plan a cross join with a 1 row table of NULLS, then the performance should be even faster than any hash join. In addition, the relations are more freely moved around for join order optimization as well.

Thank you, that's a really great suggestion! I'll fix it as you recommended.

@Damon07
Copy link
Contributor Author

Damon07 commented Jan 30, 2025

@Tmonster, I plan to use the LogicalDummyScan to generate "1 row scan of NULL values", but I don't know how to set the table index of the operator. It seems that LogicalDummyScan is not suitable to be used here. So am I on the right way?

@Tmonster
Copy link
Contributor

Tmonster commented Jan 30, 2025

@Tmonster, I plan to use the LogicalDummyScan to generate "1 row scan of NULL values", but I don't know how to set the table index of the operator. It seems that LogicalDummyScan is not suitable to be used here. So am I on the right way?

I think you are on the right path. You have two options I think here.

  1. You can use the binder in the optimizer class to Generate a new binding like they do here. However, you can't use this for the dummy scan since it's possible the RHS is a product of multiple tables and therefore projects expressions with different table indexes. To fix that you will need to use the ColumnBindingReplacer to replace the original RHS columns with new ones from the LogicalDummyScan. You can find an example here.

  2. Create multiple LogicalDummyScans for each table index you find in the bindings of the RHS. Make each scan produce 1 row of NULLS with the correct types and cross product them all together. You can assign the table index from the original RHS tables and the plan will still be valid.

I think 2 might be easier since the column binding replacer process can get confusing, although 1 is faster execution wise since it is only 1 logicalDummy scan instead of multiple extra cross products.

Let me know if this helps 👍

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 2, 2025 00:09
@Damon07
Copy link
Contributor Author

Damon07 commented Feb 2, 2025

@Tmonster, thank you for the great suggestions! They are really useful. I have adopted the second one. I have worked on the first one and it was almost done, but when I applied the ColumnBindingReplacer to the root plan, it failed on null pointer because the processing operator of the plan was moved into the Rewrite function call and meanwhile new FilterPushdowns can be generated with child operator moved in during the process. ColumnBindingReplacer can't process the incomplete plan and it will be a little complicated to work that around so I chosen the second suggestion.

@Damon07 Damon07 marked this pull request as ready for review February 2, 2025 00:32
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for getting back so late. Had to catch up on some other things. Looks good to me! I can understand that the ColumnBindingReplacer might be difficult to deal with a changing plan due to the optimizer. Can you just let me know why there is not a cross product between all of the RHS tables?

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 13, 2025 04:42
@Damon07 Damon07 marked this pull request as ready for review February 13, 2025 04:43
@Damon07
Copy link
Contributor Author

Damon07 commented Feb 13, 2025

Code has updated as comment.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 13, 2025 06:02
@Damon07 Damon07 marked this pull request as ready for review February 13, 2025 06:03
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@Mytherin Mytherin merged commit d4e40ac into duckdb:main Feb 13, 2025
49 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Literal 'FALSE' much slower than false condition that triggers predicate pushdown
4 participants