-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
What happens?
When using the read_csv_auto()
function in 1.1.0, the ignore_errors
parameter has strange behavior that didn't exist in 1.0.0. If a row within the CSV produces the TOO MANY COLUMNS
error type, none of the other rows within the CSV are returned in the select query.
To Reproduce
Here is sample data I used in order to reproduce.
CREATE VIEW test AS SELECT * FROM read_csv_auto(
'mock_duckdb_test_data.csv',
header=True,
ignore_errors=True
);
Create a view with the sample data, and then query the results:
SELECT *
FROM test AS t;
- In 1.1.0, the line with the
extra_record
will have been added to the headers, and no other data is returned. - In 1.0.0, as expected, the lines with errors are skipped and the good data is returned.
Additionally, if you add store_rejects
when creating the view:
CREATE VIEW test AS SELECT * FROM read_csv_auto(
'mock_duckdb_test_data.csv',
header=True,
ignore_errors=True,
store_rejects=True
);
An internal error is returned:
INTERNAL Error: Attempted to dereference unique_ptr that is NULL! This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic. For more information, see https:// duckdb. org/ docs/ dev/ internal_errors
OS:
macOS 14.6.1, Apple M1 (AArch64)
DuckDB Version:
1.1.0
DuckDB Client:
Python
Hardware:
No response
Full Name:
Phillip Wright
Affiliation:
Shipt
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have