@pdet pdet commented Oct 6, 2023

In our CSV API, we have four different ways of interacting with the CSV scanner through the DuckDB Core.

  1. read_csv
  2. read_csv_auto
  3. Copy
  4. Replacement Scans

Of these, 1. and 3. had the sniffer (i.e., auto_detect) set to false by default, while 2. and 4. had it set to true.
This PR changes the sniffer default to true for all four CSV interactions. To make that safe, it includes adjustments to ensure that the sniffer does not discard options set by the user. It also adds code to make the sniffer support the ignore_errors option.
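
To illustrate the "do not discard user options" adjustment, here is a minimal Python sketch. It is not DuckDB's implementation: `sniff_delimiter`, `resolve_options`, the `delim` key, and the candidate delimiter list are hypothetical names chosen for this example. It only shows the general idea that detected values fill in whatever the user left unset, while explicitly set options take precedence.

```python
# Hypothetical sketch: sniffed options never override user-set options.
CANDIDATE_DELIMITERS = [",", ";", "|", "\t"]  # assumed candidate set


def sniff_delimiter(sample: str) -> str:
    """Pick the candidate that splits every non-empty line into the same
    number of fields, preferring the one that yields the most columns."""
    lines = [line for line in sample.splitlines() if line]
    best, best_cols = ",", 1  # fall back to a comma if nothing is consistent
    for delim in CANDIDATE_DELIMITERS:
        counts = {len(line.split(delim)) for line in lines}
        if len(counts) == 1:  # every line has the same field count
            cols = counts.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best


def resolve_options(sample: str, user_options: dict) -> dict:
    """Merge sniffed options with user options; user options win."""
    sniffed = {"delim": sniff_delimiter(sample)}
    return {**sniffed, **user_options}


if __name__ == "__main__":
    sample = "a;b;c\n1;2;3\n"
    print(resolve_options(sample, {}))              # delimiter is sniffed
    print(resolve_options(sample, {"delim": ","}))  # user choice is kept
```

With no user options the sketch detects `;` for the sample; when the user passes `delim`, the sniffed value is ignored for that option.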

Notes regarding adjustments on the tests:

test/sql/copy/csv/code_cov/csv_type_detection.test
We can now detect that this file is empty and simply output nothing.

test/sql/copy/csv/null_padding_big.test
Because ignore_errors is set and null_padding is not, the line that would require null_padding is treated as an error and ignored.

test/sql/copy/csv/parallel/test_parallel_error_messages.test
We can't use auto-detection if we want to present this error.

test/sql/copy/csv/test_blob.test
We don't support BLOB types in the sniffer. Hence, these must be marked with auto_detect=false.

test/sql/copy/csv/test_copy.test
Because the CSV file is empty, ignore_errors and null_padding are off, and the table definition has 3 columns, the sniffer will complain that the file does not have 3 columns.

test/sql/copy/csv/test_read_csv.test
The sniffer detects the dateformat.
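
The interaction between ignore_errors and null_padding described in the null_padding_big.test note can be sketched as follows. This is an illustrative Python model only, not DuckDB code: `read_rows` and its parameters are hypothetical, and the real scanner's rules may differ in detail. It assumes that a row with too few columns is padded when null_padding is set, skipped when only ignore_errors is set, and rejected otherwise.

```python
import csv
import io


def read_rows(text, num_columns, ignore_errors=False, null_padding=False):
    """Hypothetical model of how short rows are handled under each option."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) < num_columns and null_padding:
            # Pad the missing trailing columns with NULLs (None here).
            row = row + [None] * (num_columns - len(row))
        elif len(row) != num_columns:
            if ignore_errors:
                continue  # treat the malformed line as an error and skip it
            raise ValueError(
                f"line {line_no}: expected {num_columns} columns, got {len(row)}"
            )
        rows.append(row)
    return rows


if __name__ == "__main__":
    text = "1,2,3\n4,5\n6,7,8\n"
    print(read_rows(text, 3, ignore_errors=True))  # short line is dropped
    print(read_rows(text, 3, null_padding=True))   # short line is padded
```

With only ignore_errors set, the two-column line disappears from the result, which mirrors the behaviour the test note describes.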

@github-actions github-actions bot marked this pull request as draft October 6, 2023 15:09
@pdet pdet marked this pull request as ready for review October 17, 2023 10:31

pdet commented Oct 18, 2023

@Mytherin is this good to go?

@Mytherin Mytherin merged commit de73089 into duckdb:feature Oct 19, 2023
Mytherin (Collaborator) commented

Yes, thanks! LGTM


pdet commented Oct 20, 2023

cc @szarnyasg

I'll update the master branch of the docs in a bit!

@szarnyasg szarnyasg added and removed the Needs Documentation label Oct 23, 2023
@pdet pdet deleted the csv_api branch June 27, 2024 13:55