Skip to content

read_csv fails to auto detect dates in some cases #13923

@crholm

Description

@crholm

What happens?

v1.1.0 fa5c2fe

Autodetect and parsing date does not honor dateformat in read_csv in some cases. Surrounding columns seems to play a role in the auto detection.

Tested using cli and also golang lib with same results as far as i can tell.

To Reproduce

Failing example

ACCESSION_NUMBER,FILING_DATE
1-1-1A,31-OCT-2023
1-1-1B,31-OCT-2023
SELECT * FROM read_csv('date-failing.csv',  header=true,dateformat='%d-%b-%Y');
┌──────────────────┬─────────────┐
│ ACCESSION_NUMBER │ FILING_DATE │
│     varchar      │   varchar   │
├──────────────────┼─────────────┤
│ 1-1-1A           │ 31-OCT-2023 │
│ 1-1-1B           │ 31-OCT-2023 │
└──────────────────┴─────────────┘

Working example

ACCESSION_NUMBER,FILING_DATE
1-1-A,31-OCT-2023
1-1-B,31-OCT-2023
SELECT *  FROM read_csv('date-succeed.csv',header=true,dateformat='%d-%b-%Y'  );
┌──────────────────┬─────────────┐
│ ACCESSION_NUMBER │ FILING_DATE │
│     varchar      │    date     │
├──────────────────┼─────────────┤
│ 1-1-A            │ 2023-10-31  │
│ 1-1-B            │ 2023-10-31  │
└──────────────────┴─────────────┘

OS:

Linux

DuckDB Version:

v1.1.0 fa5c2fe

DuckDB Client:

./duckdb

Hardware:

Intel PC

Full Name:

Rasmus Holm

Affiliation:

Modular Finance

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions