Skip to content

CSV sniffer detects file with single struct column incorrectly as of 1.1.2  #14396

@marcoslot

Description

@marcoslot

What happens?

A CSV containing a single struct in quoted, DuckDB-serialized form is no longer correctly recognized as of 1.1.2.

struct.csv:

struct_col
"{'a': 1, 'b': 2}"
from 'struct.csv';
┌─────────┬──────────┐
│   {a:   │ 'b': 2}" │
│ varchar │ varchar  │
├─────────┴──────────┤
│       0 rows       │
└────────────────────┘
(0 rows)

It's a little bit synthetic (came from a test suite), but it's surprising that it ignores the header and that the behaviour changed 1.1.1->1.1.2

To Reproduce

on v1.1.1:

copy (select { a: 1, b: 2 } struct_col) to 'struct.csv' with (header);
describe from read_csv('struct.csv');
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varcharvarcharvarcharvarcharvarcharvarchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ struct_col  │ VARCHAR     │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

on v1.1.2:

copy (select { a: 1, b: 2 } struct_col) to 'struct.csv' with (header);
describe from read_csv('struct.csv');
─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varcharvarcharvarcharvarcharvarcharvarchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ {a:         │ VARCHAR     │ YES     │         │         │         │
│ 'b': 2}"    │ VARCHAR     │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

Adding explicit header=True yields the same result, and so does adding more lines to the CSV.
Adding an additional column results in correct behaviour.

OS:

Ubuntu 22.04 x86_64

DuckDB Version:

1.1.2

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Marco Slot

Affiliation:

Crunchy Data

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions