
Conversation

@Mytherin Mytherin commented Jun 5, 2025

@evertlammerts @carlopi I had to reconcile a merge conflict between #17708 and #17605 - could you double check that the code in setup.py is still correct?

krlmlr and others added 30 commits May 25, 2025 16:14
…nction

Move the version parsing and bumping logic to the top of the file and
consolidate version handling in a single bump_version function. Replace the
complex setuptools_scm parsing and version_scheme with a streamlined
implementation that handles the OVERRIDE_GIT_DESCRIBE environment variable.
…#17689)

`FileExists` returns true for root buckets on S3 (e.g.
`s3://root-bucket/`). This currently causes partitioned copy operations like
the following to fail:

```sql
copy (select 42 i, 1 p) to 's3://root-bucket/' (format parquet, partition_by p);
-- Cannot write to "s3://root-bucket/" - it exists and is a file, not a directory!
```

The check ("is this a file or is this a directory") doesn't really make
sense on blob stores to begin with, so we simply skip it for remote files.
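With the check skipped, the same statement succeeds and writes the partitioned files under the bucket root:

```sql
-- The previously failing COPY now works on a root bucket:
copy (select 42 i, 1 p) to 's3://root-bucket/' (format parquet, partition_by p);
```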
Fixes:
1. duckdb#17682 (a missing `!` led to using an
uninitialized variable in the Parquet BSS encoder)
2. duckdblabs/duckdb-internal#4999
(`ExternalFileCache` assertion failure because we exited the loop too early)
`GetFileHandle()` bypasses a `validate` check that tells the
caching file system to prefer file data already in the cache. Calling `CanSeek()`
first checks the cache for whether the file is present and whether
seeking is possible. This avoids an unnecessary HEAD request for full
file reads (such as Avro on Iceberg).
Currently we assume all plans can be cached; this change allows table
functions to opt out of statement caching. When a table function opts out, we
always rebind when re-executing a prepared statement instead.
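A minimal sketch of the observable effect, assuming for illustration that the table function behind the statement has opted out of caching:

```sql
-- Hypothetical example: if the underlying table function opted out of
-- statement caching, each EXECUTE rebinds the plan instead of reusing it.
PREPARE q AS SELECT count(*) FROM read_csv('data.csv');
EXECUTE q;  -- binds the plan
EXECUTE q;  -- rebinds rather than reusing a cached plan
```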
…`query` statement (duckdb#17710)

This PR is part of fixing
duckdblabs/duckdb-internal#5006

This is required for `duckdb-iceberg`, as it uses `<FILE>:` for its TPCH
tests, which requires a `__WORKING_DIRECTORY__` to function when called
from duckdb/duckdb
(duckdb/duckdb-iceberg#270)
* Only generate IN pushdown filters for equality conditions (see the sketch after this list)

fixes: duckdblabs/duckdb-internal#5022
* Add missing commutative variants for DOUBLE and BIGINT
* Fix broken test that no one seemed to have noticed...

fixes: duckdblabs/duckdb-internal#4995
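A sketch of the kind of predicates affected, using a hypothetical table `t`:

```sql
-- A disjunction of pure equality conditions on the same column can be
-- collapsed into an IN filter and pushed down into the scan:
SELECT * FROM t WHERE a = 1 OR a = 2 OR a = 3;
-- A disjunction mixing other comparisons no longer produces an IN filter:
SELECT * FROM t WHERE a = 1 OR a > 2;
```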
… is FIXED_LEN_BYTE_ARRAY (duckdb#17723)

Fixes a regression introduced in
duckdb#16161

Type length may also be set for variable-length byte arrays (in which
case it should be ignored).
Mytherin and others added 15 commits June 4, 2025 09:43
…duckdb#17581)

Fixes: duckdb#17008

This PR fixes an issue with type casting in lambda expressions used in
the `list_reduce` function.

For example, in the following query:
```sql
select list_reduce([0], (x, y) -> x > 3, 3.1)
```

The lambda expression was incorrectly bound as:
```sql
CAST((x > CAST(3 AS INTEGER)) AS DECIMAL(11,1))
```

Now a proper cast is implemented, targeting the maximum logical type of
the list child type and the initial value type:
```sql
CAST((x > CAST(3 AS DECIMAL(11,1))) AS DECIMAL(11,1))
```
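One way to inspect the resulting binding is via `EXPLAIN`; a sketch (the exact plan rendering varies by version):

```sql
-- The projection in the plan shows the cast applied to the lambda body.
EXPLAIN SELECT list_reduce([0], (x, y) -> x > 3, 3.1);
```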

### Test Cases
Added tests to verify the correct casting behaviour.
The current status of the cache shows that entries are not really long-lived,
with a sizeable share of the space (unmeasured, possibly 20%+) occupied by
repeated msys2 items that are only reused within the same PR.

Moved to the regular behaviour of caching only on the main branch.
Add infrastructure so that a subset of extensions can be tested on each PR,
which should speed up CI times with limited risk.

Currently this skips `encodings` and `spatial` on PRs. Which extensions to skip
is up for discussion; I would be in favour of expanding the list even further,
given that most extensions do not actually impact the CI surface.

I would like to see whether this actually works (minor implementation
details might be off), then we can discuss.
This could also be paired with other ideas to improve the PR process, such as
an "extra tests" tag to opt in to the full set of tests, but I think this is
worth having in isolation as well.
…_ptr (duckdb#17749)

This also simplifies the destruction code, since these strings will be
cleaned up with the `ArgMinMaxStateBase`
Very basic initial implementation. Adds a new log type
(`PhysicalOperator`) and adds logging for the hash join and the Parquet
writer. I've implemented a utility that can be passed into classes we
use during execution, such as `JoinHashTable` and `ParquetWriter`, that
logs messages while recording which operator they belong to:
```sql
D pragma enable_logging;
D set logging_level='DEBUG';
D set debug_force_external=true;
D set threads=1;
D copy (
      select t1.i
      from range(3_000_000) t1(i)
      join range(3_000_000) t2(i)
      using (i)
  ) to 'physical_operator_logging.parquet';
D pragma disable_logging;
┌──────────────────┬───────────────┬──────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│       type       │ operator_type │                                parameters                                │                                                           message                                                           │
│     varchar      │    varchar    │                          map(varchar, varchar)                           │                                                           varchar                                                           │
├──────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187250 rows, 7377554 bytes)                                                                         │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ External hash join: enabled. Size (118108864 bytes) greater than reservation (15782448 bytes)                               │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122896 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (64354 rows, 532480 bytes) to file "physical_operator_logging.parquet" (Combine)                         │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187018 rows, 7373610 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188044 rows, 7391052 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123530 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187019 rows, 7373627 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124556 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187663 rows, 7384575 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123531 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187163 rows, 7376075 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124175 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188208 rows, 7393840 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123675 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187522 rows, 7382178 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124720 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124034 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187526 rows, 7382246 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187171 rows, 7376211 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124038 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187427 rows, 7380563 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123683 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187701 rows, 7385221 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123939 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124213 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187374 rows, 7379662 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187534 rows, 7382382 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123886 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (93326 rows, 765440 bytes) to file "physical_operator_logging.parquet" (Combine)                         │
├──────────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 42 rows                                                                                                                                                                                                                         4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
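To inspect the collected entries afterwards, a sketch (assuming the log is exposed through the `duckdb_logs` table function):

```sql
-- Filter the log down to the new PhysicalOperator entries.
SELECT *
FROM duckdb_logs
WHERE type = 'PhysicalOperator';
```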
I'd be happy to receive any feedback on this :)
…midcomment line during sniffing (duckdb#17751)

This PR considers the null_padding option when detecting comments in a
CSV file.
It also quotes values that contain a possible comment character (i.e., '#').
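A sketch of the relevant reader options, using a hypothetical `data.csv`:

```sql
-- Combine null_padding with an explicit comment character so that
-- mid-row comments are detected and handled during sniffing.
SELECT *
FROM read_csv('data.csv', null_padding = true, comment = '#');
```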

Fix: duckdb#17744
Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
@Mytherin Mytherin marked this pull request as draft June 5, 2025 08:34
@Mytherin Mytherin marked this pull request as ready for review June 5, 2025 08:34
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 5, 2025 08:37
@Mytherin Mytherin marked this pull request as ready for review June 5, 2025 08:49
@Mytherin Mytherin merged commit f85436e into duckdb:main Jun 5, 2025
53 of 54 checks passed
@carlopi carlopi mentioned this pull request Jun 6, 2025
Mytherin added a commit that referenced this pull request Jun 6, 2025
More merging in `main`, with the twist that I did not see the proper
merge conflict raised at
#17806 (comment).

(@evertlammerts)

This also includes #17831
@Mytherin Mytherin deleted the mergev13again branch June 12, 2025 15:27
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Jun 21, 2025
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Jun 21, 2025
Merge v1.3 into main (duckdb/duckdb#17806)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>