Skip to content

ignore_errors parameter is causing data loss during TOO MANY COLUMNS error type #14001

@phillipwright7

Description

@phillipwright7

What happens?

When using the read_csv_auto() function in 1.1.0, the ignore_errors parameter has strange behavior that didn't exist in 1.0.0. If a row within the CSV produces the TOO MANY COLUMNS error type, none of the other rows within the CSV are returned in the select query.

To Reproduce

Here is sample data I used in order to reproduce.

CREATE VIEW test AS SELECT * FROM read_csv_auto(
        'mock_duckdb_test_data.csv',
        header=True,
        ignore_errors=True
        );

Create a view with the sample data, and then query the results:

SELECT *
FROM test AS t;
  • In 1.1.0, the line with the extra_record will have been added to the headers, and no other data is returned.
  • In 1.0.0, as expected, the lines with errors are skipped and the good data is returned.

Additionally, if you add store_rejects when creating the view:

CREATE VIEW test AS SELECT * FROM read_csv_auto(
        'mock_duckdb_test_data.csv',
        header=True,
        ignore_errors=True,
        store_rejects=True
        );

An internal error is returned:

INTERNAL Error: Attempted to dereference unique_ptr that is NULL! This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic. For more information, see https:// duckdb. org/ docs/ dev/ internal_errors

OS:

macOS 14.6.1, Apple M1 (AArch64)

DuckDB Version:

1.1.0

DuckDB Client:

Python

Hardware:

No response

Full Name:

Phillip Wright

Affiliation:

Shipt

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions