-
Notifications
You must be signed in to change notification settings - Fork 125
Use correct output column id list for query processing #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b1ebe29
to
49f9c50
Compare
618277c
to
7303512
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks pretty good.
src/pgduckdb_types.cpp
Outdated
|
||
bool valid_tuple = true; | ||
|
||
for (auto const &[columnIdx, valueIdx] : scan_global_state->m_columns) { | ||
/* First we are fetching all required columns oredered by column id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* First we are fetching all required columns oredered by column id | |
/* First we are fetching all required columns ordered by column id |
-- Column ids list used because both of fetched column are used after scan | ||
SELECT a, b FROM query_filter_output_column WHERE b = 't1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From your comments and my closer reading of the relevant duckdb code, we are expected to change the handle the column order changes. So let's include that in one of these tests
-- Column ids list used because both of fetched column are used after scan | |
SELECT a, b FROM query_filter_output_column WHERE b = 't1'; | |
-- Column ids list used because both of fetched column are used after scan, we also should swap column order correctly | |
SELECT b, a FROM query_filter_output_column WHERE b = 't1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a test where we only re-order columns, but still output all of them.
* When reading postgres tables we need to know which columns need to to written to output vector. Based on filtering and query we need to either get this information from `projection_ids` or `columns_ids` list. Projection ids list will be used when that are fetched columns from table but those columns are not needed in further query processing. Otherwise columns_ids list will be used.
7303512
to
689cec9
Compare
I get this crash on
|
e4b914a (the parent of this PR) does not have this issue |
This PR fixes the issue mentioned in #195 (comment) Fix found by Mario. I don't think we quite understand why this caused a crash/bad access violation on Mac but Linux was unaffected. either way this is now fixed Some debugging: ``` (lldb) p scan_global_state->m_output_columns_ids.size() (std::map<unsigned long long, unsigned long long>::size_type) 2 (lldb) p scan_global_state->m_read_columns_ids.size() (std::map<unsigned long long, unsigned long long>::size_type) 3 ``` I imagine this wrote to memory we do own, and is managed by duckdb, containing a pointer value that got overwritten, when it gets dereferenced it segfaults because the address is bogus The difference in system allocators likely caused this problem to fly under the radar on Linux
written to output vector. Based on filtering and query we need to
either get this information from
projection_ids
orcolumns_ids
list. Projection ids list will be used when that are fetched columns
from table but those columns are not needed in further query processing.
Otherwise columns_ids list will be used.