Merge v1.3 into main #17806
Merged
Conversation
…nction Move version parsing and bumping logic to the top of the file and consolidate version handling through a single `bump_version` function. Replace the complex setuptools_scm parsing and `version_scheme` with a streamlined implementation built around the `OVERRIDE_GIT_DESCRIBE` environment variable.
Needed for duckdb-r 1.3.0.
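For illustration, a rough Python sketch of the approach described above; the helper names and the exact dev-version scheme are assumptions, not a copy of the actual setup.py logic.

```python
# Illustrative sketch only -- names and the dev-version scheme are assumptions,
# not DuckDB's actual setup.py code.
import os
import re
import subprocess

def git_describe() -> str:
    # OVERRIDE_GIT_DESCRIBE takes precedence over asking git itself.
    override = os.environ.get("OVERRIDE_GIT_DESCRIBE")
    if override:
        return override
    return subprocess.check_output(
        ["git", "describe", "--tags", "--long"], text=True
    ).strip()

def bump_version(describe: str) -> str:
    # e.g. "v1.3.0-0-gdeadbee" (exactly on a tag) or "v1.3.0-12-gdeadbee".
    m = re.match(r"v(\d+)\.(\d+)\.(\d+)-(\d+)-g[0-9a-f]+$", describe)
    if not m:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    major, minor, patch, distance = m.groups()
    if distance == "0":
        return f"{major}.{minor}.{patch}"
    # Past the tag: bump the patch number and mark it as a dev build.
    return f"{major}.{minor}.{int(patch) + 1}.dev{distance}"
```

Under this sketch, `OVERRIDE_GIT_DESCRIBE=v1.3.0-0-gdeadbee` would yield `1.3.0`.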
…#17689) `FileExists` returns true for root buckets on S3 (e.g. `s3://root-bucket/`). This currently causes partitioned copy operations like the following to fail:
```sql
copy (select 42 i, 1 p) to 's3://root-bucket/' (format parquet, partition_by p);
-- Cannot write to "s3://root-bucket/" - it exists and is a file, not a directory!
```
The check ("is this a file or is this a directory") doesn't really make sense on blob stores to begin with, so just skip it for remote files.
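As a conceptual illustration of "skip the check for remote files" only; the prefix list and helper below are hypothetical and not DuckDB's FileSystem code.

```python
# Hypothetical sketch: decide whether the file-vs-directory check applies.
REMOTE_PREFIXES = ("s3://", "gcs://", "azure://", "http://", "https://")

def needs_directory_check(path: str) -> bool:
    # On blob stores a "directory" is just a key prefix, so the check carries
    # no meaning there; only local paths keep the validation.
    return not path.startswith(REMOTE_PREFIXES)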
Fixes:
1. duckdb#17682 (missed a `!`, used an uninitialized variable in the Parquet BSS encoder)
2. duckdblabs/duckdb-internal#4999 (`ExternalFileCache` assertion failure because we exited a loop too early)
`GetFileHandle()` bypasses the `validate` check that tells the caching_file_system to prefer file data from the cache. By calling `CanSeek()` first, we check the cache to see whether the file is cached and whether seeking is possible. This avoids an unnecessary HEAD request for full file reads (e.g., Avro on Iceberg).
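A minimal sketch of that ordering, assuming a hypothetical metadata cache and HEAD helper; this is not the caching_file_system API itself.

```python
# Hypothetical cache-first lookup: only fall back to a HEAD request when the
# file's metadata is not already cached.
file_size_cache: dict[str, int] = {}

def head_request(path: str) -> int:
    """Placeholder for an HTTP HEAD request returning the content length."""
    raise NotImplementedError

def get_file_size(path: str) -> int:
    if path in file_size_cache:
        # Cached file: we already know its size, so seeking is possible
        # without touching the remote store at all.
        return file_size_cache[path]
    size = head_request(path)
    file_size_cache[path] = size
    return size
```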
Currently we assume all plans can be cached. This change allows table functions to opt out of statement caching; if a table function opts out, we always rebind when re-executing a prepared statement instead.
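The control flow reads roughly like the following sketch; the class and callback names are invented for illustration and are not DuckDB source.

```python
# Hypothetical sketch of "rebind when the plan is not cacheable".
class PreparedStatement:
    def __init__(self, sql, bind):
        self.sql = sql
        self.bind = bind                      # callable: sql -> (plan, cacheable)
        self.plan, self.cacheable = bind(sql)

    def execute(self, run_plan):
        if not self.cacheable:
            # A table function opted out of statement caching: rebind so the
            # plan reflects the current state instead of a stale cached plan.
            self.plan, self.cacheable = self.bind(self.sql)
        return run_plan(self.plan)
```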
…`query` statement (duckdb#17710) This PR is part of fixing duckdblabs/duckdb-internal#5006. It is required for `duckdb-iceberg`, which uses `<FILE>:` for its TPCH tests and therefore needs `__WORKING_DIRECTORY__` to function when called from duckdb/duckdb (duckdb/duckdb-iceberg#270).
… is FIXED_LEN_BYTE_ARRAY
* Only generate IN pushdown filters for equality conditions (fixes duckdblabs/duckdb-internal#5022)
* Add missing commutative variants for DOUBLE and BIGINT
* Fix a broken test that no one seemed to have noticed (fixes duckdblabs/duckdb-internal#4995)
… is FIXED_LEN_BYTE_ARRAY (duckdb#17723) Fixes a regression introduced in duckdb#16161: the type length may also be set for variable-length byte arrays, in which case it should be ignored.
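In pseudocode terms the rule amounts to the following; the helper is hypothetical and not the actual Parquet reader code.

```python
# Only honour type_length for FIXED_LEN_BYTE_ARRAY columns; writers may also
# populate it for variable-length BYTE_ARRAY columns, where it must be ignored.
from typing import Optional

def effective_type_length(physical_type: str, type_length: Optional[int]) -> Optional[int]:
    if physical_type == "FIXED_LEN_BYTE_ARRAY":
        return type_length
    return None
```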
…es during sniffing
…duckdb#17581) Fixes: duckdb#17008 This PR fixes an issue with type casting in lambda expressions used in the `list_reduce` function. For example, in the following query:
```sql
select list_reduce([0], (x, y) -> x > 3, 3.1)
```
the lambda expression was incorrectly bound as:
```sql
CAST((x > CAST(3 AS INTEGER)) AS DECIMAL(11,1))
```
Now proper type casting is implemented to match the max logical type of both the list child type and the initial value type:
```sql
CAST((x > CAST(3 AS DECIMAL(11,1))) AS DECIMAL(11,1))
```
### Test Cases
Added tests to verify the correct casting behaviour.
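To try the example above with the DuckDB Python client (only the call shape is shown here; the printed value is not asserted):

```python
import duckdb

# Same query as in the description; with the fix, the lambda comparison is
# bound against DECIMAL(11,1) rather than INTEGER.
print(duckdb.sql("select list_reduce([0], (x, y) -> x > 3, 3.1)").fetchall())
```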
The current state of the cache shows that entries are not really long-lived: a sizeable chunk of space (not measured, possibly 20%+) is taken up by repeated msys2 items that are only cached within the same PR. Moved to the regular behaviour of caching only on the main branch.
Add infra so that only a subset of extensions is tested on each PR, which should speed up CI times with limited risk. Currently skips `encodings` and `spatial` on PRs. Which extensions to skip is up for discussion; I would be in favour of expanding even further, given that most extensions do not actually impact the CI surface. I would like to see whether this actually works (minor implementation details might be off), and then we can discuss. This could also be paired with other ideas to improve the PR process, such as an "extra tests" tag that opts in to the full set of tests, but I think this is worth having in isolation as well.
…_ptr (duckdb#17749) This also simplifies the destruction code, since these strings will be cleaned up together with the `ArgMinMaxStateBase`.
Very basic initial implementation. Adds a new log type (`PhysicalOperator`) and adds logging for the hash join and Parquet writer. I've implemented a utility that can be passed into classes we use during execution, such as `JoinHashTable` and `ParquetWriter`, which logs messages while letting us see which operator they belong to:
```sql
D pragma enable_logging;
D set logging_level='DEBUG';
D set debug_force_external=true;
D set threads=1;
D copy ( select t1.i from range(3_000_000) t1(i) join range(3_000_000) t2(i) using (i) ) to 'physical_operator_logging.parquet';
D pragma disable_logging;

│ type             │ operator_type │ parameters             │ message │
│ varchar          │ varchar       │ map(varchar, varchar)  │ varchar │
├──────────────────┼───────────────┼────────────────────────┼─────────┤
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187250 rows, 7377554 bytes) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ External hash join: enabled. Size (118108864 bytes) greater than reservation (15782448 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122896 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (64354 rows, 532480 bytes) to file "physical_operator_logging.parquet" (Combine) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187018 rows, 7373610 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188044 rows, 7391052 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123530 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187019 rows, 7373627 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124556 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187663 rows, 7384575 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123531 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187163 rows, 7376075 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124175 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188208 rows, 7393840 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123675 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187522 rows, 7382178 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124720 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124034 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187526 rows, 7382246 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187171 rows, 7376211 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124038 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187427 rows, 7380563 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123683 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187701 rows, 7385221 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123939 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124213 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187374 rows, 7379662 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN    │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187534 rows, 7382382 bytes) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (123886 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE │ {} │ Flushing row group (93326 rows, 765440 bytes) to file "physical_operator_logging.parquet" (Combine) │

42 rows (4 columns)
```
I'd be happy to receive any feedback on this :)
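For reference, output like the table above can also be retrieved from a client. The sketch below uses the Python API and assumes the `duckdb_logs` table function from DuckDB's logging infrastructure; whether any `PhysicalOperator` rows appear depends on the same settings used in the example.

```python
import duckdb

con = duckdb.connect()
con.execute("pragma enable_logging")
con.execute("set logging_level = 'DEBUG'")
# ... run the workload of interest here ...
con.execute("pragma disable_logging")
# Assumes the duckdb_logs table function; filter for the new log type.
print(con.sql("select * from duckdb_logs where type = 'PhysicalOperator'"))
```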
…midcomment line during sniffing (duckdb#17751) This PR considers the null_padding option when detecting comments in a CSV file. It also quotes values that contain a possible comment character (i.e., '#'). Fix: duckdb#17744
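A small way to exercise this scenario from Python; the file name is arbitrary and the exact sniffing outcome is not asserted here.

```python
import duckdb

# A CSV with a '#' line in the middle of the data.
with open("comments.csv", "w") as f:
    f.write("a,b\n1,2\n# looks like a comment\n3,4\n")

# null_padding and comment are the read_csv options touched by this change.
print(duckdb.sql("from read_csv('comments.csv', null_padding=true, comment='#')"))
```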
…l look-ups" This reverts commit 3c2f966.
…l look-ups" (duckdb#17805) This reverts commit 3c2f966.
carlopi reviewed on Jun 5, 2025
Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
Mytherin added a commit that referenced this pull request on Jun 6, 2025
More merging in `main`, with the twist that I did not see the proper merge conflict raised at #17806 (comment). (@evertlammerts) This also includes #17831
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request on Jun 21, 2025
Merge v1.3 into main (duckdb/duckdb#17806)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request on Jun 21, 2025
Merge v1.3 into main (duckdb/duckdb#17806) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@evertlammerts @carlopi I had to reconcile a merge conflict between #17708 and #17605 - could you double check that the code in `setup.py` is still correct?