Tishj
Contributor

@Tishj Tishj commented Feb 7, 2025

This PR fixes #16094

Previously this code used global_columns, the list of columns the reader (in this case the Parquet reader) is aware of.
This list is influenced by the schema parameter.

global_column_ids comes from the TableFunctionInitInput and also contains artificial/generated columns like "filename".
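
To illustrate the kind of mismatch described above, here is a toy sketch in Python. It is not DuckDB's code: `reader_columns`, `requested_column_ids`, and `build_filter_map` are hypothetical names, and treating index 2 as a generated "filename" column is an assumption made only for this example. It shows how a map sized from the reader's column list can fail once a filter targets a column that exists only among the requested column ids.

```python
# Toy model only: names and logic are hypothetical, not DuckDB internals.

# Columns the file reader knows about (in the PR's terms, shaped by `schema`).
reader_columns = ["id", "name"]

# Column ids requested by the query; index 2 stands in for a generated
# "filename" column that the reader itself does not list (an assumption).
requested_column_ids = [0, 1, 2]


def build_filter_map(slot_count, filtered_column_ids):
    """Map each filtered column id to the position of its filter."""
    filter_map = [None] * slot_count
    for position, column_id in enumerate(filtered_column_ids):
        if column_id >= slot_count:
            # A filter refers to a column the map has no slot for.
            raise IndexError(
                f"column id {column_id} has no slot in a map of size {slot_count}"
            )
        filter_map[column_id] = position
    return filter_map
```

In this sketch, `build_filter_map(len(reader_columns), [2])` raises, while sizing the map from the requested ids, `build_filter_map(max(requested_column_ids) + 1, [2])`, succeeds.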

@Tishj Tishj requested a review from samansmink February 7, 2025 11:32
@Mytherin Mytherin changed the base branch from main to v1.2-histrionicus February 7, 2025 11:38
@Mytherin
Collaborator

Mytherin commented Feb 7, 2025

Thanks! Can you rebase this to v1.2-histrionicus so we can push it out for v1.2.1?

@Mytherin Mytherin changed the base branch from v1.2-histrionicus to main February 7, 2025 11:39
@marcoslot
Contributor

Thanks, we found the same fix, though still get the error when filtering a Delta table by filename:

git clone https://github.com/delta-io/delta-examples.git
select * from delta_scan('delta-examples/data/people_countries_delta_dask/', filename = True) where filename = 'delta-examples/data/people_countries_delta_dask/country=Argentina/part-00000-8d0390a3-f797-4265-b9c2-da1c941680a3.c000.snappy.parquet';

though that might be a separate issue

@samansmink
Contributor

@marcoslot I can confirm that that issue still persists after this fix. I will take a look

@samansmink
Contributor

@marcoslot's issue is also present in DuckDB v1.1.3, meaning that it is a separate issue. I will open an issue in the duckdb_delta repo for it.

Contributor

@samansmink samansmink left a comment

LGTM apart from the branch to merge into

@Tishj Tishj changed the base branch from main to v1.2-histrionicus February 7, 2025 12:05
@Tishj Tishj force-pushed the multi_file_reader_create_filter_map_fix branch from 23062e2 to 89daa84 Compare February 7, 2025 12:08
@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 7, 2025 12:09
@Tishj Tishj marked this pull request as ready for review February 7, 2025 12:10
@Mytherin Mytherin merged commit f7637a9 into duckdb:v1.2-histrionicus Feb 7, 2025
50 checks passed
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Mar 7, 2025
[Dev] MultiFileReader fix InternalError in CreateFilterMap (duckdb/duckdb#16114)
Successfully merging this pull request may close these issues.

Internal error in 1.2.0 when combining schema, filename and filter in read_parquet
4 participants