Closed
Description
What happens?
When reading a fairly simple, RFC 4180-compliant CSV file, DuckDB's sniff_csv returns incorrect quote and escape values. As a result, the CSV is parsed incorrectly.
I think RFC 4180-compliant CSVs should be parsed correctly by default, without requiring any special arguments.
To Reproduce
The CSV file contains just:
col1
"cell with
newline"
Using the CLI, I get:
FROM sniff_csv('duckdb-mwe1.csv');
┌───────────┬─────────┬─────────┬───┬────────────┬─────────────────┬───────────────┬──────────────────────┐
│ Delimiter │ Quote │ Escape │ … │ DateFormat │ TimestampFormat │ UserArguments │ Prompt │
│ varchar │ varchar │ varchar │ │ varchar │ varchar │ varchar │ varchar │
├───────────┼─────────┼─────────┼───┼────────────┼─────────────────┼───────────────┼──────────────────────┤
│ , │ ' │ \ │ … │ │ │ │ FROM read_csv('duc… │
├───────────┴─────────┴─────────┴───┴────────────┴─────────────────┴───────────────┴──────────────────────┤
│ 1 rows 11 columns (7 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
SELECT * FROM 'duckdb-mwe1.csv';
┌────────────┐
│ col1 │
│ varchar │
├────────────┤
│ "cell with │
│ newline" │
└────────────┘
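For comparison, here is a minimal sketch of the result I would expect: a single row whose one cell contains an embedded line break. It uses Python's standard-library csv module as a stand-in for an RFC 4180-style reader (double quote as the quote character, doubled quotes as the escape). It is not DuckDB's parser, and the file contents are inlined as a string for illustration.

```python
import csv
import io

# Contents of the reproduction file: a header line, then one quoted
# field that contains an embedded line break.
data = 'col1\n"cell with\nnewline"\n'

# newline="" leaves line endings untouched so the csv module can
# handle line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(data, newline="")))

header, *records = rows
print(header)   # the single column name
print(records)  # one record whose cell spans two lines
```

Under these assumptions the reader yields one header row and one data row, not the two data rows shown above.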
OS:
linux x86_64
DuckDB Version:
v0.10.1 4a89d97
DuckDB Client:
cli
Full Name:
cisUGO2htUgR+0mm
Affiliation:
XeH3FD0xBLDC1XS1
Have you tried this on the latest nightly build?
I have tested with a nightly build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have