format issues with COPY FROM CSV #14205

@manticore-projects

What happens?

COPY ... FROM ... does not auto-detect the column separator the way read_csv does, even with AUTO_DETECT true.

are@ryzen ~> duckdb
v1.1.1 af39bd0dcf
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D CREATE TABLE sales (
      salesid          INTEGER         NOT NULL PRIMARY KEY
      , listid         INTEGER         NOT NULL
      , sellerid       INTEGER         NOT NULL
      , buyerid        INTEGER         NOT NULL
      , eventid        INTEGER         NOT NULL
      , dateid         SMALLINT        NOT NULL
      , qtysold        SMALLINT        NOT NULL
      , pricepaid      DECIMAL (8,2)
      , commission     DECIMAL (8,2)
      , saletime       TIMESTAMP
  )
  ;
D COPY sales FROM '~/tickitdb/sales_tab.txt' (FORMAT CSV, AUTO_DETECT true, timestampformat '%m/%d/%Y %I:%M:%S', ignore_errors true);
Invalid Input Error: Mismatch between the number of columns (1) in the CSV file and what is expected in the scanner (10).

Instead, one must set DELIMITER explicitly:

COPY sales FROM '~/tickitdb/sales_tab.txt' (FORMAT CSV, AUTO_DETECT true, DELIMITER '\t', timestampformat '%m/%d/%Y %I:%M:%S', ignore_errors true);

Interestingly, this is not the case for read_csv, where the TAB delimiter is detected correctly.
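
For comparison, here is a minimal sketch of the read_csv route, reusing the path and options from the COPY statement. It assumes the file's columns line up positionally with the sales table and that the auto-detected types cast cleanly to the declared ones. The sniff_csv call is only there to show what the sniffer reports for the file; I have not included its output.

```sql
-- Sketch only: path and options are copied from the COPY statement in this report.
-- Show what the CSV sniffer detects for this file (delimiter, header flag).
SELECT Delimiter, HasHeader
FROM sniff_csv('~/tickitdb/sales_tab.txt');

-- Load through read_csv, which detects the TAB delimiter without DELIMITER being set.
-- Assumes the file's column order matches the sales table.
INSERT INTO sales
SELECT *
FROM read_csv('~/tickitdb/sales_tab.txt',
              timestampformat = '%m/%d/%Y %I:%M:%S',
              ignore_errors = true);
```

If both statements behave as described, the difference is limited to how COPY ... FROM drives the sniffer, not to the file itself.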

To Reproduce

See above.

OS:

Linux

DuckDB Version:

1.1.1

DuckDB Client:

Java

Hardware:

No response

Full Name:

Andreas Reichel

Affiliation:

manticore-projects.com

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
