Skip to content

Conversation

mkaruza
Copy link
Contributor

@mkaruza mkaruza commented Sep 19, 2024

  • When reading postgres tables we need to know which columns need to to
    written to output vector. Based on filtering and query we need to
    either get this information from projection_ids or columns_ids
    list. Projection ids list will be used when that are fetched columns
    from table but those columns are not needed in further query processing.
    Otherwise columns_ids list will be used.
  • Fixes Equality comparison with varchar crashes #190

@mkaruza mkaruza requested a review from JelteF September 19, 2024 09:49
@mkaruza mkaruza force-pushed the float-filter-op branch 2 times, most recently from b1ebe29 to 49f9c50 Compare September 20, 2024 09:35
@mkaruza mkaruza changed the title Handle COUNT(*) with VARCHAR filter QUERY Use correct output column id list for query processing Sep 20, 2024
JelteF
JelteF previously approved these changes Sep 20, 2024
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks pretty good.


bool valid_tuple = true;

for (auto const &[columnIdx, valueIdx] : scan_global_state->m_columns) {
/* First we are fetching all required columns oredered by column id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* First we are fetching all required columns oredered by column id
/* First we are fetching all required columns ordered by column id

Comment on lines +62 to +63
-- Column ids list used because both of fetched column are used after scan
SELECT a, b FROM query_filter_output_column WHERE b = 't1';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your comments and my closer reading of the relevant duckdb code, we are expected to change the handle the column order changes. So let's include that in one of these tests

Suggested change
-- Column ids list used because both of fetched column are used after scan
SELECT a, b FROM query_filter_output_column WHERE b = 't1';
-- Column ids list used because both of fetched column are used after scan, we also should swap column order correctly
SELECT b, a FROM query_filter_output_column WHERE b = 't1';

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add a test where we only re-order columns, but still output all of them.

Base automatically changed from float-filter-op to main September 20, 2024 14:05
@mkaruza mkaruza dismissed JelteF’s stale review September 20, 2024 14:05

The base branch was changed.

* When reading postgres tables we need to know which columns need to to
  written to output vector. Based on filtering and query we need to
  either get this information from `projection_ids` or `columns_ids`
  list. Projection ids list will be used when that are fetched columns
  from table but those columns are not needed in further query processing.
  Otherwise columns_ids list will be used.
@mkaruza mkaruza merged commit b3d8315 into main Sep 20, 2024
3 checks passed
@mkaruza mkaruza deleted the str-filter-op branch September 20, 2024 14:29
@Tishj
Copy link
Collaborator

Tishj commented Sep 23, 2024

I get this crash on main, likely related to this PR?

➜  pg_duckdb git:(main) ✗ psql postgres
 pg_backend_pid 
----------------
          29479
(1 row)

DROP EXTENSION
psql:/Users/thijs/.psqlrc:3: WARNING:  To actually execute queries using DuckDB you need to run "SET duckdb.execution TO true;"
CREATE EXTENSION
SET
psql (16.0)
Type "help" for help.

postgres=# SELECT a, c FROM query_filter_output_column WHERE b = 't1';
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> 

@Tishj
Copy link
Collaborator

Tishj commented Sep 23, 2024

e4b914a (the parent of this PR) does not have this issue

Tishj added a commit that referenced this pull request Sep 26, 2024
This PR fixes the issue mentioned in
#195 (comment)

Fix found by Mario.

I don't think we quite understand why this caused a crash/bad access
violation on Mac but Linux was unaffected.
either way this is now fixed

Some debugging:
```
(lldb) p scan_global_state->m_output_columns_ids.size()
(std::map<unsigned long long, unsigned long long>::size_type) 2
(lldb) p scan_global_state->m_read_columns_ids.size()
(std::map<unsigned long long, unsigned long long>::size_type) 3
```

I imagine this wrote to memory we do own, and is managed by duckdb,
containing a pointer value that got overwritten, when it gets
dereferenced it segfaults because the address is bogus
The difference in system allocators likely caused this problem to fly
under the radar on Linux
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Equality comparison with varchar crashes
3 participants