Issue 5984 #4 LogicalColumnIndex out of range Error #6303
Merged
This PR addresses issue 5984 number 4. When you have a non-reorderable join (i.e. a Left or Right Join), it's possible that the incorrect logical get is grabbed when gathering the table statistics used to determine the join order. Currently duckdb just grabs the leftmost logical get (oops). After grabbing this leftmost logical get, we get statistics for the column that is used in the join condition. This is where the problem arises: if we grab the wrong logical get (i.e. the left get) and a column from the right side of the non-reorderable join is used in the join condition, we end up getting statistics for the wrong column. This isn't horrible, as our join order optimiser hasn't chosen any drastically bad plans. The error occurs when the leftmost logical get has fewer columns than the right logical get (the logical get we should have gotten). If the join condition is on column index 5 and the leftmost logical get only has 2 columns, duckdb will throw a logical index out of range error.
To fix this we just pass down the table_index of the LogicalGet that we want, and we check the table_index of the LogicalGet when we call GetLogicalGet.
I also renamed count to distinct_count, as that is a more descriptive variable name for the number of distinct values in a column.
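
To illustrate the difference between the two lookups, here is a minimal, self-contained sketch. It does not use DuckDB's real classes: `Op`, `MakeGet`, `MakeJoin`, `LeftmostGet`, `FindGet` and `ColumnInRange` are hypothetical stand-ins, and the real GetLogicalGet may be structured differently. It only shows why matching on table_index avoids the out-of-range column index.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified plan node; NOT DuckDB's LogicalOperator.
enum class OpType { GET, JOIN };

struct Op {
    OpType type = OpType::GET;
    std::size_t table_index = 0;   // only meaningful for GET nodes
    std::size_t column_count = 0;  // only meaningful for GET nodes
    std::vector<std::unique_ptr<Op>> children;
};

std::unique_ptr<Op> MakeGet(std::size_t table_index, std::size_t column_count) {
    std::unique_ptr<Op> op(new Op());
    op->type = OpType::GET;
    op->table_index = table_index;
    op->column_count = column_count;
    return op;
}

std::unique_ptr<Op> MakeJoin(std::unique_ptr<Op> left, std::unique_ptr<Op> right) {
    std::unique_ptr<Op> op(new Op());
    op->type = OpType::JOIN;
    op->children.push_back(std::move(left));
    op->children.push_back(std::move(right));
    return op;
}

// Behaviour described as the bug: always descend into the first child.
const Op *LeftmostGet(const Op &op) {
    if (op.type == OpType::GET) {
        return &op;
    }
    if (op.children.empty()) {
        return nullptr;
    }
    return LeftmostGet(*op.children[0]);
}

// Sketch of the fix: search every child and only accept the GET whose
// table_index matches the one referenced by the join condition.
const Op *FindGet(const Op &op, std::size_t table_index) {
    if (op.type == OpType::GET) {
        return op.table_index == table_index ? &op : nullptr;
    }
    for (const auto &child : op.children) {
        if (const Op *found = FindGet(*child, table_index)) {
            return found;
        }
    }
    return nullptr;
}

// A column index is only valid if it falls inside the chosen GET.
bool ColumnInRange(const Op *get, std::size_t column_index) {
    return get != nullptr && column_index < get->column_count;
}
```

With a left get of 2 columns (table_index 0) and a right get of 8 columns (table_index 1), a join condition on column index 5 of the right table is out of range for `LeftmostGet` but in range for `FindGet(plan, 1)`.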