@pdet pdet commented May 13, 2025

  • Detect SQLNULL types for schema merging
    Previously, a column sniffed as SQLNULL was always upcast to VARCHAR. We now keep it as SQLNULL until the schemas of multiple files are merged, so it can be upcast to a more specific type than VARCHAR. This only happens if we sniffed the entire file; otherwise, there's no guarantee that the column is always NULL.

  • Use schema merging in csv relations
    When using CSV relations, the auto_detect code didn’t use the schema merging algorithm for multiple files, which could cause inconsistent behavior between queries and relations. This PR changes it so both now use the same code.

  • files_to_sniff option
    When sniffing and merging multiple schemas, we previously had a hard-coded limit of 10 files. This is now configurable: users can set the limit themselves. Setting it to -1 will detect and merge the schemas of all files used by the scanner. By default, the value remains 10.
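
The three changes above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DuckDB's implementation: the names `effective_type`, `merge_types`, and `merge_schemas`, the three-step widening order, and the final fallback of an all-NULL column to VARCHAR are all hypothetical and chosen for illustration. Only the `files_to_sniff` name, its default of 10, and the meaning of -1 come from the description.

```python
from typing import Dict, List, Tuple

SQLNULL = "SQLNULL"
# Hypothetical, simplified widening order (narrowest to widest).
# DuckDB's real type lattice is richer than this.
WIDENING = ["BIGINT", "DOUBLE", "VARCHAR"]

Schema = Dict[str, str]
# (sniffed column types, whether the whole file was sniffed)
SniffResult = Tuple[Schema, bool]


def effective_type(sniffed: str, fully_sniffed: bool) -> str:
    """Keep SQLNULL only if the entire file was sniffed; otherwise the
    column might hold values later in the file, so fall back to VARCHAR."""
    if sniffed == SQLNULL and not fully_sniffed:
        return "VARCHAR"
    return sniffed


def merge_types(a: str, b: str) -> str:
    """SQLNULL yields to any concrete type; otherwise pick the wider type."""
    if a == SQLNULL:
        return b
    if b == SQLNULL:
        return a
    return WIDENING[max(WIDENING.index(a), WIDENING.index(b))]


def merge_schemas(results: List[SniffResult], files_to_sniff: int = 10) -> Schema:
    """Merge per-file schemas; files_to_sniff = -1 means 'use every file'."""
    considered = results if files_to_sniff == -1 else results[:files_to_sniff]
    merged: Schema = {}
    for schema, fully_sniffed in considered:
        for column, sniffed in schema.items():
            current = merged.get(column, SQLNULL)
            merged[column] = merge_types(current, effective_type(sniffed, fully_sniffed))
    # Assumption: a column that is NULL in every considered file ends up VARCHAR.
    return {c: ("VARCHAR" if t == SQLNULL else t) for c, t in merged.items()}


all_null = ({"id": "BIGINT", "score": SQLNULL}, True)      # fully sniffed
partial = ({"id": "BIGINT", "score": SQLNULL}, False)      # only partly sniffed
with_values = ({"id": "BIGINT", "score": "DOUBLE"}, True)

print(merge_schemas([all_null, with_values]))   # score is refined to DOUBLE
print(merge_schemas([partial, with_values]))    # score falls back to VARCHAR

eleven = [all_null] * 10 + [with_values]
print(merge_schemas(eleven))                     # default limit: 11th file ignored
print(merge_schemas(eleven, files_to_sniff=-1))  # all files: score is DOUBLE
```

In this toy model, a column that is NULL in the first ten files and DOUBLE in the eleventh only gets the refined type when the limit is lifted with -1, which is the kind of situation the configurable limit is meant to address.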

Fix: #17452
Fix: #17451

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 16, 2025 12:40
@pdet pdet marked this pull request as ready for review May 16, 2025 12:54
@Mytherin Mytherin changed the base branch from main to v1.3-ossivalis May 19, 2025 06:26
@Mytherin Mytherin merged commit 1f0067f into duckdb:v1.3-ossivalis May 19, 2025
50 checks passed
@Mytherin
Collaborator

Thanks!

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 21, 2025
[CSV Reader] Detect SQLNULL types for schema merging, use schema merging in csv relations, add files_to_sniff option.  (duckdb/duckdb#17467)
[Python Dev] Fix failing tests for the Python SQLLogicTester (duckdb/duckdb#17529)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 23, 2025
[CSV Reader] Detect SQLNULL types for schema merging, use schema merging in csv relations, add files_to_sniff option.  (duckdb/duckdb#17467)
[Python Dev] Fix failing tests for the Python SQLLogicTester (duckdb/duckdb#17529)
@pdet pdet deleted the csv_glob_types branch July 18, 2025 12:48
Labels: Needs Documentation (use for issues or PRs that require changes in the documentation)
Development

Successfully merging this pull request may close these issues.

  • Schema mismatch on 11+ files
  • File with a single empty value confuses sniffer
2 participants