-
Notifications
You must be signed in to change notification settings - Fork 2.6k
CSV - Always run sniffer by default #9250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
+268
−1,155
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@Mytherin is this good to go? |
Yes, thanks! LGTM |
cc @szarnyasg I'll update the master branch of the docs in a bit! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In our CSV API, we have four different ways of interacting with the CSV scanner through the DuckDB Core.
read_csv
read_csv_auto
From these, 1. and 3. had the sniffer (e.g.,
auto_detect
) set toFalse
as default. While 2. and 4. had it toTrue
.This PR changes the default of all CSV interactions with the sniffer to true. Hence, it includes adjustments to ensure that the sniffer does not discard options set by the user. It also adds code to make the sniffer support the
ignore_errors
option.Notes regarding adjustments on the tests:
test/sql/copy/csv/code_cov/csv_type_detection.test
We can detect this file is now empty and just output nothing.
test/sql/copy/csv/null_padding_big.test
Because
ignore_errors
is set andnull_padding
is not, we ignore the line that requires null_padding as error.test/sql/copy/csv/parallel/test_parallel_error_messages.test
We can't autodetect if we want to present this error.
test/sql/copy/csv/test_blob.test
We don't support BLOB types in the sniffer. Hence, these must be marked with
auto_detect=false
test/sql/copy/csv/test_copy.test
Because the CSV file is empty, ignore errors and null_padding are off, and the table definition has 3 columns, the sniffer will complain about the file as not having 3 columns.
test/sql/copy/csv/test_read_csv.test
The sniffer detects the dateformat.