Skip to content

Internal error in 1.2.0 when combining schema, filename and filter in read_parquet #16094

@marcoslot

Description

@marcoslot

What happens?

When running a query with read_parquet that has a combination of a filter, schema, and filename option, DuckDB 1.2.0 crashes with an assertion failure.

To Reproduce

copy (select x from generate_series(1,100) as g(x)) to '/tmp/x.parquet' with (field_ids {x: 1});
select x, filename from read_parquet('/tmp/x.parquet', schema=map {1: {name: 'x', type: 'int', default_value: NULL}}, filename=True) where x = 1;

Gives

INTERNAL Error:
Attempted to access index 1 within vector of size 1
...

This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors

Subsequent commands then fail with:

FATAL Error:
Failed: database has been invalidated because of a previous fatal error. The database must be restarted prior to being used again.
Original error: "Attempted to access index 1 within vector of size 1"

OS:

Ubuntu 22.04

DuckDB Version:

1.2.0

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Marco Slot

Affiliation:

Crunchy Data

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions