-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
Labels
Description
What happens?
When reading multiple csv
or csv.gz
files say with a glob pattern, and one of those has no rows, all the columns in that file is casted as a VARCHAR. This causes a schema mismatch between that file and all other files when they have a defined type.
To Reproduce
Run this in the DuckDB CLI (or run in an in-memory database in Python)
CREATE OR REPLACE TABLE
cte AS
SELECT
generate_series::DATE as date_col,
(random()*100)::INTEGER AS int_col,
(random()*123.45)::DOUBLE AS double_col
FROM
GENERATE_SERIES(TIMESTAMP '2000-01-01 00:00:00', TIMESTAMP '2000-12-31 00:00:00', INTERVAL 1 DAY);
COPY (
FROM
cte
WHERE
EXTRACT(YEAR FROM date_col) == 2000
) TO '__2000.csv.gz';
COPY (
FROM
cte
WHERE
EXTRACT(YEAR FROM date_col) == 2001
) TO '__2001.csv.gz';
FROM
'__200*.csv.gz';
Error:
Invalid Input Error: Schema mismatch between globbed files.
Main file schema: __2000.csv.gz
Current file: __2001.csv.gz
Column with name: "date_col" is expected to have type: DATE But has type: VARCHAR
Column with name: "int_col" is expected to have type: BIGINT But has type: VARCHAR
Column with name: "double_col" is expected to have type: DOUBLE But has type: VARCHAR
Potential Fix: Since your schema has a mismatch, consider setting union_by_name=true.
Note that this also occurs when the format for the output file to be read again is csv
, but does not occur when the output format is a parquet file.
OS:
macOS aarch64 / RHEL 7.6 Linux x86_64
DuckDB Version:
1.1.1
DuckDB Client:
CLI
Hardware:
No response
Full Name:
Kian Francis Gedaria Laredo
Affiliation:
Phitopolis International Corporation
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have