-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Label mark joins with convert to semi #12916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Label mark joins with convert to semi #12916
Conversation
…pushdowns, this way new FilterPushdown objects will not incorrectly convert mark joins that have projected mark indexes later on
…function call at the very start
…th_convert_to_semi
…th_convert_to_semi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have one comment:
auto &expr = proj.expressions.at(col_index); | ||
vector<ColumnBinding> bindings_to_keep; | ||
ExpressionIterator::EnumerateExpression(expr, [&](Expression &child) { | ||
if (expr->expression_class == ExpressionClass::BOUND_COLUMN_REF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work if the top-level expression is not a column ref, but it refers to one or more column refs?
For example:
SELECT col0 AND col1 FROM (...);
The mark join colref boolean can be used in the remainder of the plan like so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EnumerateExpression should also enumerate into the children, so the mark join colref should still be picked up here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I was worried that referring to columns multiple times would mess things up in queries like this:
SELECT col0, col0 AND col1 FROM (...);
but they are deduplicated by the unordered_set
, so that should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we perhaps in a follow-up PR rename EnumerateExpression
to EnumerateAllExpressions
? The difference between EnumerateExpression
and EnumerateChildren
is otherwise easy to miss when reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good to go from my side
Thanks! |
Merge pull request duckdb/duckdb#12916 from Tmonster/label_mark_joins_with_convert_to_semi
Continuation of https://github.com/duckdb/duckdb/pull/12828/files
closes #11042
The discussion on PR 12828, led to making a new optimizer to prevent mark to semi join conversions. Although the chance is very small, I was worried people could turn off the NO_MARK_TO_SEMI_OPTIMIZER but not the filter pushdown optimizer. This would mean the bug could persist if another optimizer was added to fix this.
Now in optimizer.cpp, once the filter pushdown object is created,
filter_pushdown.CheckMarkToSemi(*plan, top_bindings);
is called. This is only needs to be called at the root of the plan, keeping the code clean and avoiding extra tree traversals.