Tmonster
Contributor

@Tmonster Tmonster commented Feb 15, 2023

This PR addresses issue 5984, number 4. When a plan contains a non-reorderable join (i.e. a Left or Right Join), the wrong logical get can be grabbed when gathering the table statistics used to determine the join order. Currently duckdb just grabs the leftmost logical get (oops) and then fetches statistics for the column used in the join condition. This is where the problem arises: if we grab the wrong logical get (i.e. the left one) and the join condition uses a column from the right side of the non-reorderable join, we end up reading statistics for the wrong column.

On its own this isn't horrible, since our join order optimiser hasn't chosen any drastically bad plans because of it. The actual error occurs when the leftmost logical get has fewer columns than the right logical get (the one we should have gotten). If the join condition is on column index 5 and the leftmost logical get only has 2 columns, duckdb throws a logical index out of range error.

To fix this, we pass down the table_index of the LogicalGet we want, and GetLogicalGet now checks the table_index of each LogicalGet it finds before returning it.
I also renamed count to distinct_count, as that is a more descriptive variable name for the number of distinct values in a column.
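
For readers who want to see the shape of the bug and the fix, here is a minimal, self-contained sketch. It is not the code from this PR: `Operator`, `MakeGet`, `MakeLeftJoin`, `LeftmostGet`, and `DistinctCount` are hypothetical stand-ins, and the `GetLogicalGet` signature shown is an assumption based only on the description above, not DuckDB's real one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a logical operator tree.
struct Operator {
    bool is_get = false;                       // true for a table scan ("logical get")
    std::size_t table_index = 0;               // only meaningful when is_get is true
    std::vector<std::size_t> distinct_counts;  // one entry per column of the get
    std::vector<std::unique_ptr<Operator>> children;
};

std::unique_ptr<Operator> MakeGet(std::size_t table_index, std::vector<std::size_t> distinct_counts) {
    auto op = std::make_unique<Operator>();
    op->is_get = true;
    op->table_index = table_index;
    op->distinct_counts = std::move(distinct_counts);
    return op;
}

// Stand-in for a non-reorderable (left) join with two children.
std::unique_ptr<Operator> MakeLeftJoin(std::unique_ptr<Operator> left, std::unique_ptr<Operator> right) {
    auto op = std::make_unique<Operator>();
    op->children.push_back(std::move(left));
    op->children.push_back(std::move(right));
    return op;
}

// Behaviour described as buggy: always return the leftmost get.
const Operator *LeftmostGet(const Operator &op) {
    if (op.is_get) {
        return &op;
    }
    return op.children.empty() ? nullptr : LeftmostGet(*op.children.front());
}

// Behaviour described as the fix: only return a get whose table_index matches.
const Operator *GetLogicalGet(const Operator &op, std::size_t table_index) {
    if (op.is_get) {
        return op.table_index == table_index ? &op : nullptr;
    }
    for (const auto &child : op.children) {
        if (const Operator *found = GetLogicalGet(*child, table_index)) {
            return found;
        }
    }
    return nullptr;
}

// Look up the distinct count of a column; throws std::out_of_range when the
// column index does not exist in the given get.
std::size_t DistinctCount(const Operator *get, std::size_t column_index) {
    if (get == nullptr) {
        throw std::invalid_argument("no matching logical get");
    }
    return get->distinct_counts.at(column_index);
}
```

With a left join whose left get has 2 columns and whose right get has 6, calling `DistinctCount(LeftmostGet(*join), 5)` throws, mirroring the out of range error, while `DistinctCount(GetLogicalGet(*join, 1), 5)` reads the intended column.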

@Mytherin Mytherin merged commit 59d02e5 into duckdb:feature Feb 16, 2023
@Mytherin
Collaborator

Thanks!
