-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
Labels
Description
What happens?
read_csv with autodetect and sa,ple size -1 hangs on malformed csv when file is above some size.
The file is malformed but I would expect it to succeed or fail with errors.
The attached files have a field containing unescaped quote and separator.
e.g. "",""Unescaped quote and embedded, seperator"",
bad_csv_file_2045.csv
bad_csv_file_2046.csv
bad_csv_file_2047.csv
To Reproduce
Use files attached (added larger file than hangs on my machine in case it cannot be recreated with the 2047 example)
in duckdb cli the first query succeeds, the middle query fails to detect, last query hangs
from read_csv('bad_csv_file_2045.csv',auto_detect=TRUE, sample_size=-1);
from read_csv('bad_csv_file_2046.csv',auto_detect=TRUE, sample_size=-1);
from read_csv('bad_csv_file_2047.csv',auto_detect=TRUE, sample_size=-1);
OS:
MacOS
DuckDB Version:
1.2
DuckDB Client:
CLI
Hardware:
No response
Full Name:
Rob Hodgson
Affiliation:
SSC
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have