CSV reader will delete valid records when erroneous newlines are present

### What happens?

When using the CSV reader in version `1.1.1`, valid rows from the CSV will be deleted when there are erroneous newlines present in the CSV surrounding the valid records. This occurs when you use the `ignore_errors` parameter. If you set `ignore_errors` to `False` or use `store_rejects`, then no data loss occurs.

I believe this is correlated to CSVs with a large amount of records. During my testing, as I've reduced the amount of records in my test CSV the amount of dataloss decreased as the amount of records decreased. I'll provide my sample data below.

This issue first began happening in version `1.1.0`, and this is not an issue in version `1.0.0`.

### To Reproduce

First, download the sample data here: [test_duckdb_data.csv.zip](https://github.com/user-attachments/files/17192226/test_duckdb_data.csv.zip)

Using this CSV, we'll create two views in DuckDB:

```
CREATE VIEW test_1 AS
SELECT *
FROM read_csv_auto('test_duckdb_data.csv',
    header=True,
    ignore_errors=True,
    delim=',',
    all_varchar=True
    );
```
`test_1` is ignoring errors.

```
CREATE VIEW test_2 AS
SELECT *
FROM read_csv_auto('test_duckdb_data.csv',
    header=True,
    ignore_errors=True,
    delim=',',
    all_varchar=True,
    store_rejects=True
    );
```

`test_2` is storing rejects (or you can set `ignore_errors=False`).

Once these two views have been loaded from the same data, you can test the amount of records for both. You should see a discrepancy:

```
WITH view_counts AS (
    SELECT
        (SELECT COUNT(*) FROM test_1) AS test_1_count,
        (SELECT COUNT(*) FROM test_2) AS test_2_count
    )
SELECT
    test_1_count,
    test_2_count,
    ABS((test_2_count - test_1_count)) AS difference
FROM
    view_counts;
```

And you can looking at the records that exist in `test_2` but not in `test_1`:

```
SELECT *
FROM test_2
EXCEPT
SELECT *
FROM test_1;
```


### OS:

macOS 14.6.1, Apple M1 (AArch64)

### DuckDB Version:

1.1.1

### DuckDB Client:

Python

### Hardware:

_No response_

### Full Name:

Phillip Wright

### Affiliation:

Shipt

### What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

### Did you include all relevant data sets for reproducing the issue?

Yes

### Did you include all code required to reproduce the issue?

- [X] Yes, I have

### Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

- [X] Yes, I have

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CSV reader will delete valid records when erroneous newlines are present #14177

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

Did you include all relevant data sets for reproducing the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

CSV reader will delete valid records when erroneous newlines are present #14177

Description

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

Did you include all relevant data sets for reproducing the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions