DuckDB's sniff_csv doesn't correctly detect RFC 4180-compliant CSVs with a single column #11403

@averms

Description

What happens?

When reading a fairly simple, RFC 4180-compliant CSV, DuckDB's sniff_csv returns the wrong quote and escape characters: it reports ' as the quote and \ as the escape, whereas RFC 4180 uses " for both quoting and escaping. As a result, the CSV is parsed incorrectly.

I think RFC 4180-compliant CSVs should be parsed correctly by default, without requiring any special arguments.

To Reproduce

The CSV file is just:

col1
"cell with
newline"

Using the CLI, I get:

FROM sniff_csv('duckdb-mwe1.csv');
┌───────────┬─────────┬─────────┬───┬────────────┬─────────────────┬───────────────┬──────────────────────┐
│ Delimiter │  Quote  │ Escape  │ … │ DateFormat │ TimestampFormat │ UserArguments │        Prompt        │
│  varchar  │ varchar │ varchar │   │  varchar   │     varchar     │    varchar    │       varchar        │
├───────────┼─────────┼─────────┼───┼────────────┼─────────────────┼───────────────┼──────────────────────┤
│ ,         │ '       │ \       │ … │            │                 │               │ FROM read_csv('duc…  │
├───────────┴─────────┴─────────┴───┴────────────┴─────────────────┴───────────────┴──────────────────────┤
│ 1 rows                                                                             11 columns (7 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
SELECT * FROM 'duckdb-mwe1.csv';
┌────────────┐
│    col1    │
│  varchar   │
├────────────┤
│ "cell with │
│ newline"   │
└────────────┘
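
For comparison, here is a minimal sketch of the result I would expect, using Python's standard csv module (not DuckDB) as a reference parser that follows RFC 4180-style quoting by default. The in-memory string stands in for duckdb-mwe1.csv, and its \n line endings are an assumption.

```python
import csv
import io

# Stand-in for the contents of duckdb-mwe1.csv (line endings assumed to be \n).
data = 'col1\n"cell with\nnewline"\n'

# newline='' disables newline translation so the csv module itself
# handles the line break embedded in the quoted field.
rows = list(csv.reader(io.StringIO(data, newline='')))

header, records = rows[0], rows[1:]
print(header)   # the single column name
print(records)  # a single record whose value spans two lines
```

With this parser the file yields one header row and one data row, and the surrounding double quotes are stripped from the value instead of being kept as part of two separate rows.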

OS:

linux x86_64

DuckDB Version:

v0.10.1 4a89d97

DuckDB Client:

cli

Full Name:

cisUGO2htUgR+0mm

Affiliation:

XeH3FD0xBLDC1XS1

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
