@pdet pdet commented Sep 30, 2024

When globbing multiple CSV files, if a file that would be read with the adaptive sniffer has only one row, the adaptive sniffer could misidentify whether that row is a header or a data row.

This PR extends the checks to identify this case, and adds tests for it.
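
To make the ambiguity concrete, here is a minimal Python sketch of how a single-row file might be classified once a schema is already known from an earlier file in the glob. It is an illustration only and not DuckDB's actual implementation: `classify_single_row`, the `CASTERS` table, and the decision rules are all hypothetical.

```python
from typing import Callable, Dict, List


def _is_int(text: str) -> bool:
    """Return True if the text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# Hypothetical type checks; a real sniffer would cover many more types.
CASTERS: Dict[str, Callable[[str], bool]] = {
    "BIGINT": _is_int,
    "VARCHAR": lambda text: True,  # any text fits a string column
}


def classify_single_row(row: List[str], names: List[str], types: List[str]) -> str:
    """Classify the only row of a file as 'header' or 'data'.

    names/types are the schema sniffed from an earlier file (an assumption
    of this sketch).
    """
    if len(row) != len(names):
        raise ValueError("row does not match the known column count")
    # A row that repeats the known column names is a header.
    if [value.strip() for value in row] == names:
        return "header"
    # A row whose values all fit the known types is treated as data.
    if all(CASTERS[t](value) for value, t in zip(row, types)):
        return "data"
    # Otherwise the values do not fit the schema, so assume a header.
    return "header"
```

With a known schema of `id BIGINT, name VARCHAR`, the row `1,alice` is classified as data and the row `id,name` as a header. When every column is `VARCHAR`, a row that differs from the known names still counts as data, which is exactly the kind of case where a one-row file stays ambiguous.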

Fix: #14166

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 1, 2024 10:23
@pdet pdet marked this pull request as ready for review October 1, 2024 11:29
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 2, 2024 12:26
@pdet pdet marked this pull request as ready for review October 2, 2024 12:26

Mytherin commented Oct 4, 2024

Thanks for the PR!

I wonder if we should be sniffing over multiple files when the files are small, similar to what we do in the JSON reader? That would perhaps more robustly fix the issue.
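
As a rough illustration of the idea of sniffing over multiple small files, the Python sketch below pools rows from successive files until a minimum sample size is reached. It is a simplified assumption of how such pooling could look, not how DuckDB's CSV or JSON readers are implemented; `pooled_sample` and `min_rows` are hypothetical names, and files are modelled as in-memory lists of lines.

```python
from typing import Iterable, List


def pooled_sample(files: Iterable[List[str]], min_rows: int) -> List[str]:
    """Gather rows file by file until at least min_rows are collected."""
    sample: List[str] = []
    for rows in files:
        sample.extend(rows)
        # Stop opening further files once the sample is large enough.
        if len(sample) >= min_rows:
            break
    return sample
```

A real implementation would also have to account for header rows repeated in each file before running type detection on the pooled sample.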

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 10, 2024 10:20
@pdet pdet marked this pull request as ready for review October 10, 2024 10:34
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 10, 2024 15:23
@pdet pdet marked this pull request as ready for review October 10, 2024 15:23
@Mytherin
Collaborator

Thanks - one more comment, otherwise LGTM

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 11, 2024 14:57
@pdet pdet marked this pull request as ready for review October 11, 2024 14:57
@Mytherin Mytherin merged commit 217ec47 into duckdb:main Oct 15, 2024
43 checks passed
@Mytherin
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 19, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Oct 19, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Oct 20, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 30, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 31, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)
github-merge-queue bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 31, 2024
[Adaptive Sniffer] In case files have only one row, be more permissive to detect headers and types. (duckdb/duckdb#14174)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@pdet pdet deleted the better_glob branch November 27, 2024 12:31
Development

Successfully merging this pull request may close these issues.

csv/csv.gz Files With Empty Rows Auto-Inferred As VARCHAR