-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add date_trunc()
simplification rules
#18457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @rcurtin - thanks for the PR! Before I dig in any further, can you fix the formatting ( |
Working my way through the CI failures, I should have it all working shortly. Looks like I just missed some includes where gcc doesn't have a problem but MSVC (rightly) does. It seems like the CI only runs if I select 'ready for review'---if I misunderstood that let me know. Can you give me an example of an incomplete hour offset? I believe everything should work in these cases (as well as the DST case) but I agree it would be good to add more test cases for those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty great - thanks for doing it. I've been a bit busy recently, but I've made some basic comments. I'll try to do a deeper dive this week.
@@ -2,8 +2,10 @@ | |||
# description: test DATE_TRUNC() constant simplifications | |||
# group: [optimizer] | |||
|
|||
require icu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing - we generally put the ICU tests in a separate file. This allows the feature to be tested in an environment where ICU is not available/loaded (e.g., WASM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! Split into two files in bdc1a12. Let me know if I should name them differently.
@@ -12,7 +14,7 @@ statement ok | |||
PRAGMA explain_output = OPTIMIZED_ONLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to looking at explain output would be to use a loop that checks the results with and without the optimiser? This would make sure that all tests give the correct results instead of the one that are explicitly added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I am wrong here: I believe that the enable_verification
at the top of the test will test all the queries with and without the optimizer, so I think we are covered there. The explain analyze
test cases are specifically to ensure that the optimization is actually applying at all---it would be really easy for a bug in the heuristic to mean that it never applied the optimizations (and in that case, it always gives the same output as the unoptimized version, so the "failure" is not picked up). I'm not 100% sure I'm answering what you were getting at, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you are correct - good point!
return nullptr; | ||
} | ||
|
||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add IS [NOT] DISTINCT
cases as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I overlooked those. After working through the algebra I implemented them (they're quite similar but not exactly the same as the <>
and =
cases). I also added a bunch of tests to handle NULL
s on the rhs or lhs in 56f49e0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer just a few more things.
# description: test DATE_TRUNC() constant simplifications | ||
# group: [optimizer] | ||
|
||
statement ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We usually put these after the pragmas and other config statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 done in dafaeec.
// | ||
// date_trunc(part, column) <> const_rhs --> column < date_trunc(part, const_rhs) OR | ||
// column >= date_trunc(part, date_add(const_rhs, INTERVAL 1 part)) | ||
// - but if date_trunc(const_rhs) != const_rhs, then this is always satisfied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have an explanation of the IS [NOT] DISTINCT
rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 41416bd.
PRAGMA enable_verification; | ||
|
||
# | ||
# check operation with hour offsets that aren't complete hours |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, to test this, you need to use time zones that are not hour offsets, which is not the same as UTC and instants specified by offsets. The reason is that the UTC binning rules are what happens without ICU (we even optimise to use them if we are in Gregorian/UTC as they are faster). So we need to check that the rules are being correctly applied for different time zones.
Good ones to use are:
name | abbrev | utc_offset | is_dst |
---|---|---|---|
America/Los_Angeles | PST | -07:00:00 | true |
America/St_Johns | CNT | -02:30:00 | true |
Asia/Kathmandu | Asia/Kathmandu | 05:45:00 | false |
Europe/Amsterdam | CET | 02:00:00 | true |
Pacific/Chatham | NZ-CHAT | 12:45:00 | false |
You don't need to use all of them, but one or two would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand here; I am not a timezone expert although I am capable of showing up for meetings on time. Do you mean that instead of entering times as, e.g., '2025-01-01 01:01:00+02:10'
I should instead use '2025-01-01 01:01:00 PST'
and similar? Would I also need to set a timezone as part of the test case so that the timezone of the test runner isn't used instead? I'm not sure I understand what you're going for here, sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean you should use a different set of time binning rules than UTC. So Set TimeZone='Europe/Amsterdam';
for example and then perform the operation across a daylight savings time boundary.
(And no one is a "timezone expert"!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
America/St_Johns
should cover both DST and fractional offsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok---I think I understand what you were looking for here and pushed 3bd520d so that the tests run in the St. Johns timezone instead of UTC. Let me know what you think, or if I missed the goal. :)
require icu | ||
|
||
statement ok | ||
PRAGMA enable_verification; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh you also need to set the calendar:
set Calendar='gregorian';
Otherwise if the testing locale's default calendar is not Gregorian, the tests will fail. We set it for CI, but I've talked to people from China who had a different default calendar(!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I had no idea. Done in 3fcfff5.
auto comparison_type = expr.GetExpressionType(); | ||
|
||
auto &date_part = bindings[2].get().Cast<BoundConstantExpression>(); | ||
auto &column_part = bindings[3].get().Cast<BoundColumnRefExpression>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it isn't a column reference but a nested expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you are right! Fixed in 1ac88b3.
// So, get the expression type if it was ordered such that the constant was actually on the right hand side. | ||
ExpressionType rhs_comparison_type = comparison_type; | ||
if (!col_is_lhs) { | ||
switch (comparison_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FlipComparisonExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, way cleaner; done in 2a5ec0c.
auto op = make_uniq<BoundOperatorExpression>(ExpressionType::OPERATOR_IS_NULL, LogicalType::BOOLEAN); | ||
op->children.push_back(column_part.Copy()); | ||
return std::move(op); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we shouldn't fall through or share code with equality here? Share the cases, but do the NULL
check for IND.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it when I implemented it and at the time I thought that it was cleaner for a reader to keep the optimizations separate, but that was a weak preference. I merged them together in d95efbb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - it's more about not repeating oneself and keeping the code a bit smaller (we've started having problems building the WASM version).
} | ||
} | ||
|
||
case ExpressionType::COMPARE_DISTINCT_FROM: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Share code with <>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also done in d95efbb.
switch (date_part) { | ||
// These specifiers can be used as intervals. | ||
case DatePartSpecifier::YEAR: | ||
return "to_year"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_years
. Maybe we should add some quick coverage tests for all these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you! I fixed this and added tests for all interval types in 3ec53d7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with it!
Of course! And thanks for the review. Happy to help DuckDB get better :) |
Thanks! |
* Fix move elision and remove unnecessary includes.
Add `date_trunc()` simplification rules (duckdb/duckdb#18457)
Issue duckdb/duckdb#18457: DateTrunc Simplification Warnings (duckdb/duckdb#18687) [Python SQLLogicTest] Add `test/sql/pragma/profiling/test_profiling_all.test` to the SKIPPED_TESTS set (duckdb/duckdb#18689)
Add `date_trunc()` simplification rules (duckdb/duckdb#18457)
Issue duckdb/duckdb#18457: DateTrunc Simplification Warnings (duckdb/duckdb#18687) [Python SQLLogicTest] Add `test/sql/pragma/profiling/test_profiling_all.test` to the SKIPPED_TESTS set (duckdb/duckdb#18689)
Add `date_trunc()` simplification rules (duckdb/duckdb#18457)
Issue duckdb/duckdb#18457: DateTrunc Simplification Warnings (duckdb/duckdb#18687) [Python SQLLogicTest] Add `test/sql/pragma/profiling/test_profiling_all.test` to the SKIPPED_TESTS set (duckdb/duckdb#18689)
bump iceberg to latest main [chore] Fix amalgamation build in progress_bar (duckdb/duckdb#18910) Bump inet & aws (duckdb/duckdb#18899) fix: refine query ETA display and Kalman filter stability (duckdb/duckdb#18880) Bump httpfs to v1.4-andium branch (duckdb/duckdb#18898) Encryption now encoded as a bit, centralizing in set/getter (duckdb/duckdb#18897) Add callback for when an extension fails to load, and also log this (duckdb/duckdb#18894) Keep base data scan state alive in ColumnData::Update call (duckdb/duckdb#18893) Expected errors 2053 (duckdb/duckdb#18892) fixing auto-specifying ciphers and remove double storage (duckdb/duckdb#18891) Add rowsort to upsert_default.test (duckdb/duckdb#18890) bump aws and iceberg (duckdb/duckdb#18889) [chore] Bump config test/configs/compressed_in_memory.json to new format (duckdb/duckdb#18888) [Dev] Fix footgun in `string_t::SetSizeAndFinalize` (duckdb/duckdb#18885) Json: no reinterpret<size_t*> (duckdb/duckdb#18886) [C API] Result schema of prepared statements (duckdb/duckdb#18779) Add `COPY (FORMAT BLOB)` to Andium too :^) (duckdb/duckdb#18884) Avoid automatically checkpointing if the database instance has been invalidated (duckdb/duckdb#18881) Update spatial+vss+sqlsmith in preparation for v1.4 (duckdb/duckdb#18882) Internal duckdb/duckdb#5796: Window Progress (duckdb/duckdb#18860) [Test] Small fixes to concurrent attach/detach test (duckdb/duckdb#18862) Update ducdkb iceberg hash (duckdb/duckdb#18873) Storage fuzzing + several fixes (duckdb/duckdb#18876) Bump mbedtls to v3.6.4 (duckdb/duckdb#18871) [minor] Incompatible DB error message: add newline (duckdb/duckdb#18861) Bump & remove patches for delta, avro, excel, encodings, fts (duckdb/duckdb#18869) Add a FORCE_DEBUG flag to force `-DDEBUG`, similar to FORCE_ASSERT (duckdb/duckdb#18872) Expected errors 2053 (duckdb/duckdb#18864) update duckdb azure extension ref for 1.4.0 (duckdb/duckdb#18868) Hold segment lock during GetColumnSegmentInfo (duckdb/duckdb#18859) Centralize attached database paths in a DatabaseFilePathManager which is shared across databases created through the same DBInstanceCache (duckdb/duckdb#18857) Add more encryption modes CTR and CBC (duckdb/duckdb#18619) Bump Ducklake (duckdb/duckdb#18825) No more `wal_encryption` flag (duckdb/duckdb#18851) Fix `NULL` path for `json_each`/`json_tree` (duckdb/duckdb#18852) Make ATTACH OR REPLACE atomic, keep list of used databases in MetaTransaction (duckdb/duckdb#18850) WAL <> DB File Match Fixes (duckdb/duckdb#18849) Add test_env to unit tester (duckdb/duckdb#18847) Merge ossivalis into main (duckdb/duckdb#18844) Bump MySQL/Postgres/SQLite (duckdb/duckdb#18848) Add OnBeginExtensionLoad callback (duckdb/duckdb#18842) Ignore null verification for statistics on structs (duckdb/duckdb#18813) Document storage version flag in CLI + minor rendering fix (duckdb/duckdb#18841) Add the `VARIANT` LogicalType (duckdb/duckdb#18609) Avoid printing '99 hours', given in most cases that means estimate is… (duckdb/duckdb#18839) Don't notify Py pkg when override git describe is set (duckdb/duckdb#18843) Add support for reading/writing native parquet geometry types (duckdb/duckdb#18832) Fix/run function in transaction (duckdb/duckdb#18741) Avoid expensive checkpoints and write amplification by appending row groups, and limiting vacuum operations for the last number of row groups (duckdb/duckdb#18829) fix: silence warnings about signed/unsigned conversions. (duckdb/duckdb#18835) fix: sanitize input for enable_logging (duckdb/duckdb#18830) Ensure a WAL file matches the DB file and checkpoint iteration (duckdb/duckdb#18823) Fix: Preserve database configuration flags for tab completion in DuckDB shell (duckdb/duckdb#18482) Extensions.yml: Pass down save_cache to inner workflows (duckdb/duckdb#18828) Fix format-fix runs on Linux (duckdb/duckdb#18827) Re-add accidentally removed check if copy_from is supported (duckdb/duckdb#18824) [Fix] Bug in fixed-size buffer when throwing out-of-memory (duckdb/duckdb#18769) For BC reasons - keep VARINT as alias for BIGNUM (duckdb/duckdb#18821) Add callback to get a list of copy options, use this to provide suggestions and to erase options from import that are only used during exporting (duckdb/duckdb#18812) fix: Add COLLATE NOCASE support to strpos function (duckdb/duckdb#18819) [CI] install libcurl4-openssl-dev with apt-get (duckdb/duckdb#18811) Provide failing file name in Parquet reader error messages (duckdb/duckdb#18814) Test runner: Expand '{UUID}' into a random UUID (duckdb/duckdb#18809) Expected errors 2053 (duckdb/duckdb#18810) Add support for non-aggregate window functions (duckdb/duckdb#18788) Fix some unindented interactions between `EMPTY_RESULT_PULLUP` and `MATERIALIZED` CTEs (duckdb/duckdb#18805) Typed macro parameters (duckdb/duckdb#18786) Internal duckdb/duckdb#3273: Hashed Sort Callbacks (duckdb/duckdb#18796) [chore] Fixup tidy-check on src/logging/log_manager.cpp by passing const & (duckdb/duckdb#18801) Fixup progress_bar: avoid converting doubles into int32_t unchecked (duckdb/duckdb#18800) Issue duckdb/duckdb#18767: Ignore Timestamp Offsets (duckdb/duckdb#18794) bump httpfs so it includes curl option (duckdb/duckdb#18691) Improve autocomplete suggestions (duckdb/duckdb#18773) Remove everything python-package related (duckdb/duckdb#18789) Support expressions as COPY file target (duckdb/duckdb#18795) fix: coalesce query progress updates to reduce terminal writes (duckdb/duckdb#18672) Task Scheduler: track exact task count, and re-signal on dequeue failure if there are tasks left (duckdb/duckdb#18792) Internal duckdb/duckdb#3273: Parallel Window Masks (duckdb/duckdb#18731) fix: improve speed of GetValue() for STRUCT type (duckdb/duckdb#18785) Add `memory_limit` parameter to `benchmark_runner`/`test_runner.py` (duckdb/duckdb#18790) Treat ENABLE_EXTENSION_AUTOINSTALL as the BOOL that it is (duckdb/duckdb#18778) Move row id logic to separate RowIdColumnData class instead of inlining it into the RowGroup (duckdb/duckdb#18780) Improve error messages for merge / vector reference (duckdb/duckdb#18777) Use microsecond resolution for printing the current timestamp (duckdb/duckdb#18776) Add `file_size_bytes` (de-)serialization (duckdb/duckdb#18775) Propagate `DUCKDB_*_VERSION` in extensions and tests (duckdb/duckdb#18774) [Test Fix] Forward output to file (duckdb/duckdb#18772) [CI] Adjust test configs post logger PR (duckdb/duckdb#18771) Revert "Use 1-based indexing for SQL-based JSON array extraction" (duckdb/duckdb#18758) Make `duckdb_log` return a TIMESTAMP_TZ (duckdb/duckdb#18768) Fix Path Typo in Extension's CMake Warning Message (duckdb/duckdb#18766) Fix index resolution when querying table with index via view (duckdb/duckdb#18319) Fix radix partitioning with more than 10 bits (duckdb/duckdb#18761) Add support for auto-globbing within a directory: if no matches are found for a specific path, we retry with `/**/*.[ext]` appended (duckdb/duckdb#18760) Refactor read_blob and read_text to use MultiFileFunction. (duckdb/duckdb#18706) Add missing expected errors to the test cases (next chunk) (duckdb/duckdb#18753) Minor logging fixes and more benchmarking (duckdb/duckdb#18755) Extensions.yml should also check converted_to_draft (duckdb/duckdb#18754) [Profiling] Add Profiling to Write Function (duckdb/duckdb#18724) Fixing lazy polars execution on query result (duckdb/duckdb#18749) Remove separate WAL encryption flag (duckdb/duckdb#18750) Add leak suppressions to nightly runs (duckdb/duckdb#18748) Append using a SQL query, instead of directly appending to a base table, and support user-provided queries through the QueryAppender (duckdb/duckdb#18738) removed placeholder client directories for node and jdbc, its been > 1 yr (duckdb/duckdb#18757) Add missing expected errors to the test cases (duckdb/duckdb#18746) Add OS X notarization for DuckDB CLI and libduckdb.dylib (duckdb/duckdb#18747) Use correct type for pushing collations in subqueries (duckdb/duckdb#18744) Merge ossivalis into main (duckdb/duckdb#18719) Secrets: if serialization_type is not specified, assume it's a key value secret (duckdb/duckdb#18743) [C API] Function to set a copy callback for bind data (duckdb/duckdb#18739) fix timetravel for default tables (duckdb/duckdb#18240) [unittest] SkipLoggingSameError() to make unittester report one failure per case (duckdb/duckdb#18270) Use 1-based indexing for SQL-based JSON array extraction (duckdb/duckdb#18735) Add (CSV) file logger (duckdb/duckdb#17692) feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641) Internal duckdb/duckdb#5669: Loop Join Thresholds (duckdb/duckdb#18733) Fix PIVOT in multiple statements (duckdb/duckdb#18729) Minor fixes for other catalogs - mostly checking `IsDuckTable()` for unsupported operations (duckdb/duckdb#18720) Added support for blob<->uuid conversions (duckdb/duckdb#18027) #Fix 18558: add row_group scan fast path (duckdb/duckdb#18686) Improved grammar generation script (duckdb/duckdb#18716) Correctly throw an error when too few columns are supplied in MERGE INTO INSERT (duckdb/duckdb#18715) [Profiling] Add Profiling to Read Function (duckdb/duckdb#18661) Fix issue with materialized CTE optimization in flatten_dependent_join (duckdb/duckdb#18714) Add Option to Allocate Using an Arena in `string_t` (duckdb/duckdb#17992) Internal duckdb/duckdb#3273: Hashed Sort States (duckdb/duckdb#18690) Python-style positional/named arguments for macro's (duckdb/duckdb#18684) [Fix] Correctly handle table and index chunks in WAL replay buffering (duckdb/duckdb#18700) Make ART construction iterative via ARTBuilder (duckdb/duckdb#18702) Correctly handle collations for IN (subquery) (duckdb/duckdb#18698) Hold row group lock for entire call of MoveToCollection (duckdb/duckdb#18694) Expected errors 2053 (duckdb/duckdb#18695) Issue duckdb/duckdb#18457: DateTrunc Simplification Warnings (duckdb/duckdb#18687) [Python SQLLogicTest] Add `test/sql/pragma/profiling/test_profiling_all.test` to the SKIPPED_TESTS set (duckdb/duckdb#18689) Make sure parse errors are wrapped in ErrorData (duckdb/duckdb#18682) Internal duckdb/duckdb#5366: Window State Arguments (duckdb/duckdb#18676) Expected errors 2053 (duckdb/duckdb#14213) Add `date_trunc()` simplification rules (duckdb/duckdb#18457) Fix the issue where delta_for isn't used in bitpacking when for is unavailable (duckdb/duckdb#18616) fix error message related to wrong memory unit (duckdb/duckdb#18671) Grab lock and double-check that column is not loaded in MoveToCollection (duckdb/duckdb#18677) Correctly allocate uncompressed string data in ZSTD for many giant strings (duckdb/duckdb#18678) Internal duckdb/duckdb#5662: IEJoin Test Plans (duckdb/duckdb#18680) [ Python SQLLogic Tester ] Add `MERGE_INTO` to `statement.type` enum in `result.py` (duckdb/duckdb#18675) Internal duckdb/duckdb#5366: Window Interrupt Arguments (duckdb/duckdb#18651) Correctly set weights in reservoir sample when switch to slow sampling (duckdb/duckdb#18563) [Dev] Add script to create patch from changes in an extension repository (duckdb/duckdb#18620) Python test runner: Fix hash comparison error output (duckdb/duckdb#18626) [CI] skip building encodings extension in InvokeCI (duckdb/duckdb#18655) CLI: Make ETA more of an estimate, and support large_row_rendering for footers (duckdb/duckdb#18656) Merge ossivalis into main (duckdb/duckdb#18644) Python test runner: Fix result check for `COPY ... RETURN_STATS` queries (duckdb/duckdb#18625) Add 1.4 release codename (duckdb/duckdb#18652) Change arrow() to export record batch reader (duckdb/duckdb#18642) bump spatial (on main) (duckdb/duckdb#18197) bump avro to v1.4 (duckdb/duckdb#18434) Make more configs into generic settings (duckdb/duckdb#18592) Add "Hash Zero" verification CI run (duckdb/duckdb#18623) feat: add ETA to progress bar in DuckDB CLI (duckdb/duckdb#18575) wrap httplib ::max() call in `WIN_32` check (duckdb/duckdb#18590) [ART] ART::Erase refactoring (duckdb/duckdb#18595) [CSV Sniffer] Fix type detection issue with union and empty columns (duckdb/duckdb#18606) Add Field IDS to multi file reader for positional deletes (duckdb/duckdb#18617) Re-add `hugeint` to `__internal_compress_string` (duckdb/duckdb#18622) Adjust filter pushdown to latest polars release (duckdb/duckdb#18624) parquet/parquet_multi_file_info.cpp: fix move from stack (duckdb/duckdb#18634) Issue duckdb/duckdb#18631: Streaming Windowed Quantile (duckdb/duckdb#18636) Fix serialization backwards compatability for varargs functions (duckdb/duckdb#18596) [Profiling] Add client context into more read functions (duckdb/duckdb#18514) [CI] Don't zip and upload Code Coverage tests results when Code Coverage got cancelled (duckdb/duckdb#18607) [Test] Fix test case and a benchmark (duckdb/duckdb#18610) Update README.md (duckdb/duckdb#18614) correctly setting log transaction id in ThreadContext (duckdb/duckdb#18536) [Fix] Hidden test failure in test_struct_update.test (duckdb/duckdb#18598) Increment storage version to enable `DICT_FSST` in benchmark file (duckdb/duckdb#18588) fix hidden merge conflict (duckdb/duckdb#18589) Adds a function for updating and adding values in a struct (duckdb/duckdb#15533) Pushdown filters on coalesced outer join keys compared for equality under the join condition (duckdb/duckdb#18169) fix: libduckdb.so missing soversion (duckdb/duckdb#18305) String dictionary hash cache (duckdb/duckdb#18580) Force `LIST`/`ARRAY` child vectors on a Parquet single page (duckdb/duckdb#18578) fix: use thousands separator and decimal for row counts in`duckbox` output format (duckdb/duckdb#18564) Flip left/right delim join based on cardinalities (duckdb/duckdb#18552) [Fix] Adjust shrink threshold back to original count > SHRINK_THRESHOLD (duckdb/duckdb#18582) [CSV Sniffer] Fixing bug of not properly setting skipped rows from sniffer (duckdb/duckdb#18555) fix: add formatting to explain row counts (duckdb/duckdb#18566) Delete FUNDING.json Update FUNDING.json Create FUNDING.json [Indexes] Buffer-managed indexes part 3: segment handle for Node48 and Node256 (duckdb/duckdb#18567) Rename the Varint type to Bignum (duckdb/duckdb#18547) Add compile option standalone-debug for clang (duckdb/duckdb#17433) Fixing compilation with -std=cpp23 (duckdb/duckdb#18557) [easy] [no-op] Minor optimization on iterator lookup (duckdb/duckdb#15349) optimize/parquet: generate movable types for parquet (duckdb/duckdb#18510) Check if `heap_block_ids` is empty before getting start/end when destroying chunks in `TupleDataCollection` (duckdb/duckdb#18556) Implement special-case `VARCHAR` to `JSON[]` casts and vice versa (duckdb/duckdb#18541) [ART] Node::Free refactoring (duckdb/duckdb#18544) [Fix] Follow-up PR to only delete unique row IDs (duckdb/duckdb#18545) Restore missing `test/configs/small_block_size.json` file (duckdb/duckdb#18507) Unittester: Add the `--sort-style` parameter that allows a fallback comparison where results are sorted according to a given sort-style (duckdb/duckdb#18542) Allow overriding openssl version for FIPS compliance (duckdb/duckdb#18499) fix: improve handling variant nulls and nested types (duckdb/duckdb#18538) Add support for explicit clean-up routine in test config, and exit multi-statement execution when an error is encountered (duckdb/duckdb#18539) Use global index, not local id when creating filters in `MultiFileColumnMapper` (duckdb/duckdb#18537) Add `StatementVerifier` for `EXPLAIN` (duckdb/duckdb#18529) Add CAPI to retrieve client context for table functions (duckdb/duckdb#18520) fix: support both field orders for variant struct (duckdb/duckdb#18532) [Varint] Negation, Subtraction and Over/under-flow checking (duckdb/duckdb#18477) ALP test: skip TPC-DS 67 - it is not consistent with floating point numbers (duckdb/duckdb#18528) Consistently detect JSON schema indepent of number of threads (duckdb/duckdb#18522) Internal duckdb/duckdb#16560: Numeric TRUNC Precision (duckdb/duckdb#18511) Dynamically determine dictionary size limit in Parquet writer (if unset) (duckdb/duckdb#18356) Fix incorrect character encoding in GetLastErrorAsString on Windows (duckdb/duckdb#18431) Fix: Write the salt together with the HT offset when determining the value for key comparison (duckdb/duckdb#18374) When tracking evicted_data_per_tag, track actual size on disk after temp file compression (duckdb/duckdb#18521) Adding WITH ORDINALITY to DuckDB (duckdb/duckdb#16581) ParserException for Pragma with named parameters (duckdb/duckdb#18506) Temporarily excluding `Build Pyodide wheel` for Python 3.11 because it fails to build `WASM` wheels (duckdb/duckdb#18508) Remove `immediate_transaction_mode` from DB config options (duckdb/duckdb#18516) Allow expressions to be used in ATTACH / COPY options (duckdb/duckdb#18515) Fix several bugs/fuzzer issues (duckdb/duckdb#18503) Fix: Remove overly strict assertion on empty string value (duckdb/duckdb#18504) Change UNICODE to UTF8 (duckdb/duckdb#17586) Merge ossivalis (duckdb/duckdb#18502) fix: add missing space in AttachInfo::ToString() (duckdb/duckdb#18500) julia: config improvements (duckdb/duckdb#17585) [Profiling] Add client context into read functions (duckdb/duckdb#18438) Fix accidental internal exception in type transformation (duckdb/duckdb#18492) add delta linux back to ci (duckdb/duckdb#18491) Change ctrl-a/ctrl-e to move to start/end of line, not buffer (duckdb/duckdb#18490) Unify `ON CONFLICT` and `MERGE INTO` (duckdb/duckdb#18480) Internal duckdb/duckdb#5384: Window Sorting Polish (duckdb/duckdb#18484) re-nable extensions in invokeci (duckdb/duckdb#18476) SUM and + Operator for Varints (duckdb/duckdb#18424) Internal duckdb/duckdb#5366: WindowDeltaScanner (duckdb/duckdb#18468) Merge ossivalis (duckdb/duckdb#18456) Bump postgres to latest main (duckdb/duckdb#18464) Internal duckdb/duckdb#5385: WindowMergeSortTree Sort Update (duckdb/duckdb#18461) Add support for generic settings, and move many settings over to generic settings (duckdb/duckdb#18447) Buffer index appends during WAL replay (duckdb/duckdb#18313) Internal duckdb/duckdb#5384: WindowDistinctAggregator Sort Update (duckdb/duckdb#18442) Add support for "template" types (duckdb/duckdb#18410) Update pyodide build to 0.28.0 (duckdb/duckdb#18446) Parquet: add row-group ordinal during writing encryption (duckdb/duckdb#18433) Include pyodide build configuration (duckdb/duckdb#18183) [Fix] Block size nightly (duckdb/duckdb#18425) Internal duckdb/duckdb#5368: WindowNaiveAggregator Sort Update (duckdb/duckdb#18409) Internal duckdb/duckdb#5367: SortedAggregateFunction Sort Update (duckdb/duckdb#18408) Refactor extension CI to use extension-ci-tools (duckdb/duckdb#18361) Correctly fetch only base column data in ColumnData::FetchUpdateData (duckdb/duckdb#18423) feat: remove anything following `?` in database name (duckdb/duckdb#18417) Merge `v1.3-ossivalis` in `main` (duckdb/duckdb#18401) Add support for table_constraints of AdbcConnectionGetObjects() (duckdb/duckdb#18181) Add DuckLake back in (duckdb/duckdb#18405) Internal duckdb/duckdb#5294: TIME_NS C API (duckdb/duckdb#18215) Remove incorrect assertion (duckdb/duckdb#18404) [ Python SQLLogic Tester ] Add `MERGE_INTO` statement to duckdb python (duckdb/duckdb#18402) CI: Add separate job for discussion mirroring (duckdb/duckdb#18407) Wrap runner.ExecuteFile, otherwise cleanup is not properly performed (duckdb/duckdb#18400) Internal duckdb/duckdb#3273: Window Hashed Sort (duckdb/duckdb#18337) Store extra metadata blocks in RowGroupPointer, and only flush dirty Metadata blocks (duckdb/duckdb#18398) CI: Fix Discussion mirroring (duckdb/duckdb#18397) Record whether or not cross products are implicit or not, and use this for converting queries back to SQL (duckdb/duckdb#18394) Correct and consistent integer arithmetic error messages (duckdb/duckdb#18393) Re-use metadata of unaltered row groups when checkpointing a table (duckdb/duckdb#18395) Approx database count system function (duckdb/duckdb#18392) Re-use table metadata when table is not altered during checkpoint (duckdb/duckdb#18390) Bump httpfs (duckdb/duckdb#18388) Uncomment skipped decimal REE tests (duckdb/duckdb#18372) Re-enable but deprecate CORE_EXTENSIONS in CMakeLists.txt (duckdb/duckdb#18377) Add missing ninja to workflow file (duckdb/duckdb#18373) Merge `v1.3-ossivalis` into `main` (duckdb/duckdb#18364) Pass `AttachOptions` to `attach` method, and turn `StorageExtensionInfo` into an `optional_ptr` (duckdb/duckdb#18368) Split up out-of-tree extensions into separate files, and allow out-of-tree extensions to be built using BUILD_EXTENSIONS={ext_name} (duckdb/duckdb#18357) Python external dispatch param fixes (duckdb/duckdb#18359) Revert "[unittest] - fix doubled error headers on `Unexpected failure`" (duckdb/duckdb#18355) Add support for checkpointing in-memory tables (duckdb/duckdb#18348) [C API] Expose expressions and use them in scalar function binding (duckdb/duckdb#18142) Extend PEG parser grammar (duckdb/duckdb#18221) [unittest] - fix doubled error headers on `Unexpected failure` (duckdb/duckdb#18314) Fix condition indexes in join filter pushdown (duckdb/duckdb#18341) download Real Nest data in quiet mode (duckdb/duckdb#18346) Fix debug error in join order optimizer (duckdb/duckdb#18344) Aarch64 backport (duckdb/duckdb#18345) add the from-table-function as parameter to copy-from-bind (duckdb/duckdb#18004) feat: making Parquet write RowGroup.total_compressed_size (duckdb/duckdb#18307) Make storage-version a test parameter (duckdb/duckdb#18324) New Arrow C-API (duckdb/duckdb#18246) feat: Parquet extension add row_group_compressed_size (duckdb/duckdb#18294) Merge ossivalis into main (duckdb/duckdb#18272) SHOW TABLES FROM <qualified_name> (duckdb/duckdb#18179) Add target for installing Python deps. (duckdb/duckdb#18285) Use `FromEpochSeconds` instead of `FromTimeT` in `FileSystem::GetLastModifiedTime` (duckdb/duckdb#18281) [Fix] Adjust test to run with different block sizes (duckdb/duckdb#18277) Use DuckDB cast infrastructure in fmt for new uhugeint/hugeint code (duckdb/duckdb#18275) Use set for row ID scanning during index scans (duckdb/duckdb#18274) Add support for RETURNING to MERGE INTO (duckdb/duckdb#18271) Support HUGEINT in printf and format (duckdb/duckdb#13277) Expanded autocomplete suggestions (duckdb/duckdb#18243) [Parquet] Add read support for the `VARIANT` LogicalType (with shredded encoding) (duckdb/duckdb#18224) Reduce copy in Vector::Reinterpret (duckdb/duckdb#18264) Fixes for gcc 15 (duckdb/duckdb#18261) Fix dictionary-related assertions (duckdb/duckdb#18260) Allow for static libs from extension dependencies to be bundled (duckdb/duckdb#18226) disable WebAssembly duckdb-wasm builds job in NightlyTests triggered by 'workflow_dispatch' event (duckdb/duckdb#18129) Bunch of loosely connected test/CI fixes (duckdb/duckdb#18254) update run_extension_medata_tests.sh (duckdb/duckdb#17976) fixes for some minor llvm 20 complaints (duckdb/duckdb#18257) Fix integer overflow in sequence vector (duckdb/duckdb#18245) Add type safety to `FlatVector::GetData<T>`, `ConstantVector::GetData<T>` and `UnifiedVectorFormat::GetData<T>` (duckdb/duckdb#18256) Slightly higher memory limit for test (duckdb/duckdb#18235) Improve descriptions of thresholds vars affecting join algorithm selection (duckdb/duckdb#17377) Add support for geoarrow encoded geometries in geoparquet files. (duckdb/duckdb#17942) Dictionary functions (duckdb/duckdb#18127) Better `NULL` handling in `TupleDataLayout` (duckdb/duckdb#18069) Track `DataChunk` memory usage in various places (duckdb/duckdb#18191) [Parquet] Add read support for the `VARIANT` LogicalType (duckdb/duckdb#18187) Bugduckdb/duckdb#18163 Fix STDDEV_SAMP undeterminism (duckdb/duckdb#18210) Internal duckdb/duckdb#5264: NLJ Not Distinct (duckdb/duckdb#18216) Improve Parquet reader `NULL` statistics and compress all-`NULL` columns using `CompressedMaterialization` (duckdb/duckdb#18217) Get type of encoded `SortKey` from `TupleDataLayout` (duckdb/duckdb#18218) ci(pyodide): enable WASM exceptions on the latest pyodide build (duckdb/duckdb#18173) Temporary file encryption (duckdb/duckdb#18208) More internal-linkage (duckdb/duckdb#18177) Two-rowID-leaf support in the conflict manager and general refactoring (duckdb/duckdb#18194) [Parquet][Dev] Update the vendored `parquet.thrift` to `3ce0760` (duckdb/duckdb#18195) Parquet reader logging (duckdb/duckdb#18172) Merge `v1.3-ossivalis` into `main` (duckdb/duckdb#18188) [Profiling] Move the client context into more write functions (duckdb/duckdb#17875) Check if `GetLastSegment` is not `nullptr` in `ColumnData::RevertAppend` (duckdb/duckdb#18171) Reduce lock contention for the instance cache (duckdb/duckdb#18079) fix bug with allowed_paths (duckdb/duckdb#18176) Avoid `realloc` in CSV writer (duckdb/duckdb#18174) fix typo (duckdb/duckdb#18165) Resolve some small build issues (duckdb/duckdb#18162) Implement `replace_type` function (duckdb/duckdb#18077) Issue duckdb/duckdb#17683: TIME_NS Compilation (duckdb/duckdb#18053) Add support for AdbcConnectionGetObjects(table_type) (duckdb/duckdb#18066) Detect when updates have no effect, and skip performing the actual updates if we encounter these nop updates (duckdb/duckdb#18144) Add support for `MERGE INTO` (duckdb/duckdb#18135) Improve sort key comparison performance (duckdb/duckdb#18131) set ::error:: annotations for test runners (duckdb/duckdb#18072) Internal duckdb/duckdb#3273: Window Task Generation (duckdb/duckdb#18113) Update description of 'arrow_lossless_conversion' (duckdb/duckdb#18046) [chore] Merge v1.3-ossivalis on main (duckdb/duckdb#18109) ci: build duckdb against the latest emscripten (duckdb/duckdb#18110) Don't throw `InternalException` in `Sort::Sink` (duckdb/duckdb#18105) TPC-DS: Use BIGINT fields (duckdb/duckdb#18098) [CI] don't run jobs on draft PRs (duckdb/duckdb#18016) Fix correlated subquery unnest fail (duckdb/duckdb#18092) [CSV Reader] Prohibit options delim and sep in same read_csv call (duckdb/duckdb#18096) Add start/end offset percentage options to Python test runner (duckdb/duckdb#18091) Avoid running DraftPR.yml until timeout if token is missing (duckdb/duckdb#18090) Unittest: Configure skip error messages (duckdb/duckdb#18087) Switch to Optional for type hints in polars lazy dataframe function (duckdb/duckdb#18078) Issue duckdb/duckdb#18071: Temporal inf -inf (duckdb/duckdb#18083) Fix some scaling issues (duckdb/duckdb#17985) Unittester: add `on_new_connection` + `on_load` + `skip_tests` options (duckdb/duckdb#18042) Use `timestamp_t` instead of `time_t` for file last modified time (duckdb/duckdb#18037) Add support for class-based expression iteration (duckdb/duckdb#18070) fix star expr exclude error (duckdb/duckdb#18063) Adding WAL encryption (duckdb/duckdb#17955) Avoid adding commands read from a file to the shell history (duckdb/duckdb#18057) Remove match-case statements from polars_io.py (duckdb/duckdb#18052) Merge ossivalis into main (duckdb/duckdb#18036) Add ppc64le spin-wait instruction (duckdb/duckdb#17837) Unittest: Add skip_compiled option that can be used to skip built-in C++ tests (duckdb/duckdb#18034) [Explain] Add the YAML format for EXPLAIN statements (duckdb/duckdb#17572) Remove Linux (32 Bit) job (duckdb/duckdb#18012) [Chore] Minor conflict manager refactoring (duckdb/duckdb#18015) Fix duckdb/duckdb#18007: correctly execute expressions with pivot operator (duckdb/duckdb#18020) c-api to copy vector with selection (duckdb/duckdb#17870) Add support to produce Polars Lazy Dataframes (duckdb/duckdb#17947) Implement consumption and production of Arrow Binary View (duckdb/duckdb#17975) Rework extension loading to go through thread-safe ExtensionManager (duckdb/duckdb#17994) Issue duckdb/duckdb#5123: make_timestamp_ms (duckdb/duckdb#17908) Flag to disable database invalidation (duckdb/duckdb#17938) [Fix] Reset profiling info before preparing a query (duckdb/duckdb#17940) Issue duckdb/duckdb#5144: AsOf Join Threshold (duckdb/duckdb#17979) [CI] Skip some workflows when updating out of tree extensions SHA (duckdb/duckdb#17949) Merge v1.3-ossivalis into main (duckdb/duckdb#17973) [nested] Allow fixed-size arrays to be unnested (duckdb/duckdb#17968) Unit Tester Configuration (duckdb/duckdb#17972) [Nested] Optimize structs in `LIST_VALUE` (duckdb/duckdb#17169) Enable building spatial and encodings extensions (duckdb/duckdb#17960) [Nested] Add `struct_position` and `struct_contains` functions (duckdb/duckdb#17819) Visual Studio 17 (2022) fixes (duckdb/duckdb#17948) [CI Nightly Fix] Skip logging test if not standard block size (duckdb/duckdb#17957) Add v1.3-ossivalis to Cross version workflow (duckdb/duckdb#17906) Unittester failures summary (duckdb/duckdb#16833) Block based encryption (duckdb/duckdb#17275) Do not dispatch JDBC/ODBC jobs in release CI runs (duckdb/duckdb#17937) fix use after free in adbc on invalid stmt (duckdb/duckdb#17927) Fix empty BP block when writing parquet (duckdb/duckdb#17929) Leverage `VectorType` in `ColumnDataCollection` (duckdb/duckdb#17881) Merge v1.3 into main (duckdb/duckdb#17897) Make CTE Materialization the Default Instead of Inlining (duckdb/duckdb#17459) Use an arena linked list for the physical operator children (duckdb/duckdb#17748) Reword GenAI policy (duckdb/duckdb#17895) Issue duckdb/duckdb#17861: FILL Argument Types (duckdb/duckdb#17888) Update function descriptions and examples for list, array, lambda functions (duckdb/duckdb#17886) Add GenAI policy (duckdb/duckdb#17882) Issue duckdb/duckdb#17849: Test FILL Duplicates (duckdb/duckdb#17869) Add STRUCT to MAP cast function (duckdb/duckdb#17799) Issue duckdb/duckdb#17040: FILL Secondary Sorts (duckdb/duckdb#17821) Issue duckdb/duckdb#17153: Window Order Columns (duckdb/duckdb#17835) julia: add missing methods from C-API (duckdb/duckdb#17733) Function Serialization: adapt to removal of overloads by explicitly casting if argument types have changed (duckdb/duckdb#17864) [Indexes] Buffer-managed indexes part 2: segment handle for base nodes (duckdb/duckdb#17828) duckdb/duckdb#17853 Enable flexible page sizes and update Android NDK to r27 in workflow. (duckdb/duckdb#17854) Internal duckdb/duckdb#4991: Remove Epoch_MS(MS) (duckdb/duckdb#17816) Add `duckdb_type` column to parquet_schema (duckdb/duckdb#17852) Merge v1.3 into main (duckdb/duckdb#17851) Fix ICE with Windows ARM64 (duckdb/duckdb#17844) fix: escape using_columns on JoinRef::ToString (duckdb/duckdb#17839) Merge130 (duckdb/duckdb#17833) Replace string for const data ptr in encryption api (duckdb/duckdb#17825) Pushdown pivot filter (duckdb/duckdb#17801) Merge v1.3 into main (duckdb/duckdb#17806) Add qualified parameter to Python GetTableNames API (duckdb/duckdb#17797) Fix propagatesNullValues for case expr (duckdb/duckdb#17796) [Profiling] Propagate the ClientContext into file handle writes (duckdb/duckdb#17754) Ensure we use the same layout in `RadixPartitionedHashTable` and `GroupedAggregateHashTable` (duckdb/duckdb#17790) [Julia] api docs improvements (duckdb/duckdb#15645) [Indexes] Buffer-managed indexes part 1: segment handles (duckdb/duckdb#17758) Mark Upper/LowerComparisonType as const (duckdb/duckdb#17773) Support glibc 2.28 environments (duckdb/duckdb#17776) Pass `ExtensionLoader` when loading extensions, change extension entry function (duckdb/duckdb#17772) Expose file_size_bytes and footer_size in parquet_file_metadata (duckdb/duckdb#17750) [CAPI] Expose ErrorData (duckdb/duckdb#17722) Rename decorator from test_nulls to null_test_parameters (duckdb/duckdb#17760) re-add httpfs apply_patches (duckdb/duckdb#17755) Deprecate windows-2019 runners (duckdb/duckdb#17745) csv_scanner: correct code comment (duckdb/duckdb#17735) Adding additional authenticated data for encryption (duckdb/duckdb#17508) [SQLLogicTester] Introduce `reset label <query label>` in the tester (duckdb/duckdb#17729) Fix windows-2025 build errors (duckdb/duckdb#17726) Aggregation performance (duckdb/duckdb#17718) fix linux extension ci (duckdb/duckdb#17720) Correctly setting the delim offset (duckdb/duckdb#17716) Sorting followup (duckdb/duckdb#17717) Revert "set default for MAIN_BRANCH_VERSIONING to false" (duckdb/duckdb#17708) ClientBufferManager wrapper to access the client context in the buffer manager (duckdb/duckdb#17699) Issue duckdb/duckdb#17040: FILL Window Function (duckdb/duckdb#17686) Merge v1.3-ossivalis into main (duckdb/duckdb#17690) New Sorting Implementation (duckdb/duckdb#17584) Output hashes in unittest and fix order (duckdb/duckdb#17664) Enable profiling output for all operator types (duckdb/duckdb#17665) [C API] Expose duckdb_scalar_function_bind_get_extra_info (duckdb/duckdb#17666) Add rowsort in generate_series test duckdb/duckdb#43 (duckdb/duckdb#17675) bump DuckDB_jll to v1.3.0 (duckdb/duckdb#17677) C API tidying (duckdb/duckdb#17623) fix extension troubleshooting link (duckdb/duckdb#17616) Move query profiler's EndQuery after commit/rollback (duckdb/duckdb#17595) Update function descriptions and examples (duckdb/duckdb#17132) Add support for ToSqlString for union types (duckdb/duckdb#17513) Remove redundant code path in the ConflictManager (duckdb/duckdb#17562) change exception type to not be an internal exception (duckdb/duckdb#17551) Python package devexp improvements (duckdb/duckdb#17483)
Hello there,
This is my first non-trivial PR to DuckDB. I hope that I have followed the procedure correctly ... please tell me if not and I can fix things. :)
I noticed that for a query of the form
select <things> from <somewhere> where date_trunc(part, some_column) > 'some constant date')
would keep thedate_trunc()
in the filters that were pushed down. However, if you have adate_trunc()
on one side and a constant on the other side, you can actually do a rewrite here to get rid of thedate_trunc()
by adjusting the constant accordingly.As a really simple example, these two are equivalent:
Sometimes the situation is a little more complex:
In any case, whenever you have
date_trunc(part, column) <comparison_op> constant_rhs
, with a little algebraic rewrite you can producecolumn <possibly_new_op> possibly_new_constant_rhs
. This in turn allows these filters to be pushed down directly into the table. For instance, before this PR, here is a query plan for the queries above:And afterwards, much simpler:
Timing-wise, for these same queries:
Run Time (s): real 0.166 user 3.729545 sys 0.127580
Run Time (s): real 0.036 user 0.545248 sys 0.142857
So, a significant speedup for this (best-case) query. To work around any caching effects, I ran those in a completely new DuckDB session where I loaded a pre-built TPCH dataset with scale factor 10 that I had computed earlier.
I wrote test cases to ensure the correctness of the optimization and also that the optimization applies.
As I mentioned I have never contributed to DuckDB before although I have certainly been deeply involved with the development of other database query optimizers, so I know that there are going to be assumptions that the query optimizer has that I don't currently have the context to know. It is possible that I have violated these in places... just let me know if so and I will get it fixed (but it seems to work fine for me in my test cases at least!).