csv/csv.gz Files With Empty Rows Auto-Inferred As VARCHAR

### What happens?

When reading multiple `csv` or `csv.gz` files say with a glob pattern, and one of those has no rows, all the columns in that file is casted as a VARCHAR. This causes a schema mismatch between that file and all other files when they have a defined type.

### To Reproduce

Run this in the DuckDB CLI (or run in an in-memory database in Python)

```sql
CREATE OR REPLACE TABLE
    cte AS
SELECT
    generate_series::DATE as date_col,
    (random()*100)::INTEGER AS int_col,
    (random()*123.45)::DOUBLE AS double_col
FROM
    GENERATE_SERIES(TIMESTAMP '2000-01-01 00:00:00', TIMESTAMP '2000-12-31 00:00:00', INTERVAL 1 DAY);

COPY (
    FROM
        cte
    WHERE
        EXTRACT(YEAR FROM date_col) == 2000
) TO '__2000.csv.gz';

COPY (
    FROM
        cte
    WHERE
        EXTRACT(YEAR FROM date_col) == 2001
) TO '__2001.csv.gz';

FROM
    '__200*.csv.gz';
```

Error:
```
Invalid Input Error: Schema mismatch between globbed files.
Main file schema: __2000.csv.gz
Current file: __2001.csv.gz
Column with name: "date_col" is expected to have type: DATE But has type: VARCHAR
Column with name: "int_col" is expected to have type: BIGINT But has type: VARCHAR
Column with name: "double_col" is expected to have type: DOUBLE But has type: VARCHAR
Potential Fix: Since your schema has a mismatch, consider setting union_by_name=true.
```

Note that this also occurs when the format for the output file to be read again is `csv`, but does not occur when the output format is a parquet file.

### OS:

macOS aarch64 / RHEL 7.6 Linux x86_64

### DuckDB Version:

1.1.1

### DuckDB Client:

CLI

### Hardware:

_No response_

### Full Name:

Kian Francis Gedaria Laredo

### Affiliation:

Phitopolis International Corporation

### What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

### Did you include all relevant data sets for reproducing the issue?

Yes

### Did you include all code required to reproduce the issue?

- [X] Yes, I have

### Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

- [X] Yes, I have

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

csv/csv.gz Files With Empty Rows Auto-Inferred As VARCHAR #14166

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

Did you include all relevant data sets for reproducing the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

csv/csv.gz Files With Empty Rows Auto-Inferred As VARCHAR #14166

Description

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

Did you include all relevant data sets for reproducing the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions