Use correct output column id list for query processing #195

mkaruza · 2024-09-19T09:49:35Z

When reading Postgres tables, we need to know which columns need to be
written to the output vector. Depending on the filters and the query, we
get this information either from the `projection_ids` list or from the
`columns_ids` list. The projection ids list is used when some columns are
fetched from the table (for example, only to evaluate a filter) but are not
needed in further query processing. Otherwise, the column ids list is used.
Fixes Equality comparison with varchar crashes #190
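
The selection rule above can be illustrated with a minimal C++ sketch. `OutputColumnIds` is a hypothetical helper, not pg_duckdb's code, and it assumes (our reading of DuckDB's convention, not something stated in this PR) that `projection_ids` holds positions within the column id list and is empty when every fetched column is needed after the scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not pg_duckdb's actual code).
// column_ids:     table column ids the scan fetches (for output and/or filters).
// projection_ids: ASSUMED to be indexes into column_ids of the columns that must
//                 be written to the output vector; empty when all are needed.
// Returns the table column id that lands in each output vector slot.
std::vector<std::uint64_t> OutputColumnIds(const std::vector<std::uint64_t> &column_ids,
                                           const std::vector<std::size_t> &projection_ids) {
	if (projection_ids.empty()) {
		// Every fetched column is used after the scan: output follows column_ids.
		return column_ids;
	}
	// Some fetched columns were needed only to evaluate filters: skip them.
	std::vector<std::uint64_t> output;
	output.reserve(projection_ids.size());
	for (std::size_t idx : projection_ids) {
		assert(idx < column_ids.size());
		output.push_back(column_ids[idx]);
	}
	return output;
}
```

For `SELECT a, c FROM query_filter_output_column WHERE b = 't1'`, if the scan fetched column ids `{0, 2, 1}` (a, c, b) and `projection_ids` were `{0, 1}`, the output would carry only columns 0 and 2: `b` is read for the filter but never written. The id order used here is illustrative; the sketch does not claim it is the order DuckDB actually produces.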

test/regression/sql/query_filter.sql

src/scan/postgres_scan.cpp

include/pgduckdb/scan/postgres_scan.hpp

src/scan/postgres_scan.cpp

JelteF

Overall I think this looks pretty good.

JelteF · 2024-09-20T13:06:05Z

src/pgduckdb_types.cpp


 	bool valid_tuple = true;

-	for (auto const &[columnIdx, valueIdx] : scan_global_state->m_columns) {
+	/* First we are fetching all required columns oredered by column id


Suggested change

/* First we are fetching all required columns oredered by column id

/* First we are fetching all required columns ordered by column id

JelteF · 2024-09-20T13:11:31Z

test/regression/expected/query_filter.out

+-- Column ids list used because both of fetched column are used after scan
+SELECT a, b FROM query_filter_output_column WHERE b = 't1';


From your comments and my closer reading of the relevant DuckDB code, we are expected to handle column order changes. So let's include that in one of these tests.

Suggested change

-- Column ids list used because both of fetched column are used after scan

SELECT a, b FROM query_filter_output_column WHERE b = 't1';

-- Column ids list used because both of fetched column are used after scan, we also should swap column order correctly

SELECT b, a FROM query_filter_output_column WHERE b = 't1';

Let's also add a test where we only re-order columns, but still output all of them.
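
The reordering case can be sketched in a few lines. In this hypothetical C++ sketch, values are read in ascending column-id order (matching the "fetching all required columns ordered by column id" comment in the diff above), and each value is written to the output slot the query asked for, so `SELECT b, a` yields `b` first. `ProjectRow` and its string-based row are illustrative stand-ins, not pg_duckdb types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: `row` is one table row indexed by column id, and
// `output_column_ids[i]` is the column id that belongs in output slot i
// (e.g. {1, 0} for SELECT b, a).
std::vector<std::string> ProjectRow(const std::vector<std::string> &row,
                                    const std::vector<std::uint64_t> &output_column_ids) {
	// column id -> output slot. std::map iterates keys in ascending order, so the
	// row is read in column-id order no matter how the query lists the columns.
	std::map<std::uint64_t, std::size_t> slot_for_column;
	for (std::size_t slot = 0; slot < output_column_ids.size(); slot++) {
		slot_for_column[output_column_ids[slot]] = slot;
	}
	assert(slot_for_column.size() == output_column_ids.size()); // ids must be unique

	std::vector<std::string> output(output_column_ids.size());
	for (auto const &[column_id, slot] : slot_for_column) {
		output[slot] = row.at(column_id); // read in id order, write in query order
	}
	return output;
}
```

Decoupling the read order from the write order is what lets a pure reordering (all columns output, different order) and a filter-only column both work with the same loop.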

src/scan/postgres_scan.cpp

The base branch was changed.


Tishj · 2024-09-23T14:10:22Z

I get this crash on main, likely related to this PR?

➜  pg_duckdb git:(main) ✗ psql postgres
 pg_backend_pid 
----------------
          29479
(1 row)

DROP EXTENSION
psql:/Users/thijs/.psqlrc:3: WARNING:  To actually execute queries using DuckDB you need to run "SET duckdb.execution TO true;"
CREATE EXTENSION
SET
psql (16.0)
Type "help" for help.

postgres=# SELECT a, c FROM query_filter_output_column WHERE b = 't1';
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?>

Tishj · 2024-09-23T14:26:49Z

e4b914a (the parent of this PR) does not have this issue.

This PR fixes the issue mentioned in #195 (comment). Fix found by Mario.

I don't think we quite understand why this caused a crash/bad access violation on Mac while Linux was unaffected. Either way, this is now fixed.

Some debugging:

```
(lldb) p scan_global_state->m_output_columns_ids.size()
(std::map<unsigned long long, unsigned long long>::size_type) 2
(lldb) p scan_global_state->m_read_columns_ids.size()
(std::map<unsigned long long, unsigned long long>::size_type) 3
```

I imagine this wrote to memory we do own but that is managed by DuckDB, overwriting a pointer value stored there. When that pointer is later dereferenced, it segfaults because the address is bogus.

The difference in system allocators likely caused this problem to fly under the radar on Linux.
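
The lldb session shows 2 output columns but 3 read columns. Assuming, as the title of #213 ("Allocate for enough data in the 'values' Datum array") suggests, that an array was sized by the former but filled once per read column, the bug class and fix look roughly like this hypothetical C++ sketch. `Datum`, `ColumnMap`, `AllocateValues` and `FillValues` are illustrative stand-ins, not pg_duckdb's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using Datum = std::uintptr_t;                             // stand-in for PostgreSQL's pointer-sized Datum
using ColumnMap = std::map<std::uint64_t, std::uint64_t>; // same map type lldb printed

// Hypothetical fix: one Datum is written per *read* column, so size the array
// by the read columns, not by the (smaller) set of output columns.
std::vector<Datum> AllocateValues(const ColumnMap &read_columns_ids) {
	return std::vector<Datum>(read_columns_ids.size());
}

// Writes one placeholder Datum per read column. values.at() throws
// std::out_of_range on overflow, whereas a raw C array indexed with []
// would silently write past its end (undefined behavior).
void FillValues(std::vector<Datum> &values, const ColumnMap &read_columns_ids) {
	std::size_t i = 0;
	for (auto const &entry : read_columns_ids) {
		values.at(i++) = static_cast<Datum>(entry.first);
	}
}
```

With a raw array, the out-of-bounds write is undefined behavior: whether it crashes depends on what the allocator placed next to the array, which would be consistent with the Mac/Linux difference described above.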

mkaruza requested a review from JelteF September 19, 2024 09:49

JelteF requested changes Sep 19, 2024

test/regression/sql/query_filter.sql (outdated)

src/scan/postgres_scan.cpp

include/pgduckdb/scan/postgres_scan.hpp (outdated)

JelteF reviewed Sep 19, 2024

src/scan/postgres_scan.cpp

src/scan/postgres_scan.cpp

mkaruza force-pushed the float-filter-op branch 2 times, most recently from b1ebe29 to 49f9c50 September 20, 2024 09:35

mkaruza force-pushed the str-filter-op branch from 618277c to 7303512 September 20, 2024 12:59

mkaruza changed the title from "Handle COUNT(*) with VARCHAR filter QUERY" to "Use correct output column id list for query processing" Sep 20, 2024

JelteF previously approved these changes Sep 20, 2024


Base automatically changed from float-filter-op to main September 20, 2024 14:05

mkaruza added 2 commits September 20, 2024 16:11

Review feedback changes

689cec9

mkaruza force-pushed the str-filter-op branch from 7303512 to 689cec9 September 20, 2024 14:19

mkaruza requested a review from JelteF September 20, 2024 14:21

JelteF approved these changes Sep 20, 2024


mkaruza merged commit b3d8315 into main Sep 20, 2024
3 checks passed

mkaruza deleted the str-filter-op branch September 20, 2024 14:29

This was referenced Sep 24, 2024

Add the PostgresStorageExtension to DuckDB #97

Merged

Allocate for enough data in the 'values' Datum array #213

Merged
