Merge v1.2-histrionicus into main #16284

Merged

Conversation
We had two users crash with the following backtrace:

```
frame #0: 0x0000ffffab2571ec
frame #1: 0x0000aaaaac00c5fc duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:328:2
frame #2: 0x0000aaaaac1ee418 duckling`duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::CheckValid(this=<unavailable>) const at optional_ptr.hpp:34:11
frame #3: 0x0000aaaaac1eea8c duckling`duckdb::MergeCollectionTask::Execute(duckdb::PhysicalBatchInsert const&, duckdb::ClientContext&, duckdb::GlobalSinkState&, duckdb::LocalSinkState&) [inlined] duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::operator*(this=<unavailable>) at optional_ptr.hpp:43:3
frame #4: 0x0000aaaaac1eea84 duckling`duckdb::MergeCollectionTask::Execute(this=0x0000aaaaf1b06150, op=<unavailable>, context=0x0000aaaba820d8d0, gstate_p=0x0000aaab06880f00, lstate_p=<unavailable>) at physical_batch_insert.cpp:219:90
frame #5: 0x0000aaaaac1d2e10 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTask(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:425:8
frame #6: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTasks(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:431:9
frame #7: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(this=0x0000aaaafa62ab40, context=0x0000aab2fffd7cb0, chunk=<unavailable>, input=<unavailable>) const at physical_batch_insert.cpp:494:4
frame #8: 0x0000aaaaac353158 duckling`duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, duckdb::ExecutionBudget&, unsigned long) [inlined] duckdb::PipelineExecutor::Sink(this=0x0000aab2fffd7c00, chunk=0x0000aab2fffd7d30, input=0x0000fffec0aba8d8) at pipeline_executor.cpp:521:24
frame #9: 0x0000aaaaac353130 duckling`duckdb::PipelineExecutor::ExecutePushInternal(this=0x0000aab2fffd7c00, input=0x0000aab2fffd7d30, chunk_budget=0x0000fffec0aba980, initial_idx=0) at pipeline_executor.cpp:332:23
frame #10: 0x0000aaaaac34f7b4 duckling`duckdb::PipelineExecutor::Execute(this=0x0000aab2fffd7c00, max_chunks=<unavailable>) at pipeline_executor.cpp:201:13
frame #11: 0x0000aaaaac34f258 duckling`duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) [inlined] duckdb::PipelineExecutor::Execute(this=<unavailable>) at pipeline_executor.cpp:278:9
frame #12: 0x0000aaaaac34f250 duckling`duckdb::PipelineTask::ExecuteTask(this=0x0000aab16dafd630, mode=<unavailable>) at pipeline.cpp:51:33
frame #13: 0x0000aaaaac348298 duckling`duckdb::ExecutorTask::Execute(this=0x0000aab16dafd630, mode=<unavailable>) at executor_task.cpp:49:11
frame #14: 0x0000aaaaac356600 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaaf0105560, marker=0x0000aaaaf00ee578) at task_scheduler.cpp:189:32
frame #15: 0x0000ffffab0a31fc
frame #16: 0x0000ffffab2ad5c8
```

Core dump analysis showed that the assertion `D_ASSERT(lstate.writer);` in `MergeCollectionTask::Execute` was not satisfied (i.e. it crashed because `lstate.writer` was `nullptr`) while `PhysicalBatchInsert::Sink` was processing merge tasks from other pipeline executors. My suspicion is that this is only likely to happen for heavily concurrent workloads, which applies to the two users who crashed. The patch submitted as part of this PR has addressed the issue for these users.
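For context on why the null `lstate.writer` surfaces as an `InternalException` rather than undefined behavior: duckdb's `optional_ptr` checks validity on dereference. The following is a simplified stand-in sketch (not the actual duckdb source; it substitutes `std::runtime_error` for `InternalException`) illustrating the guard seen in frames #2 and #3:

```cpp
#include <stdexcept>

// Simplified stand-in for duckdb::optional_ptr: dereferencing a null
// pointer throws instead of invoking undefined behavior.
template <class T>
class optional_ptr {
public:
	optional_ptr() : ptr(nullptr) {}
	explicit optional_ptr(T *p) : ptr(p) {}

	void CheckValid() const {
		if (!ptr) {
			// duckdb throws InternalException here; a stdlib type is used in this sketch.
			throw std::runtime_error("Attempting to dereference an optional pointer that is not set");
		}
	}
	T &operator*() {
		CheckValid();
		return *ptr;
	}
	explicit operator bool() const {
		return ptr != nullptr;
	}

private:
	T *ptr;
};
```

The dereference in `MergeCollectionTask::Execute` hits exactly this path when the local sink state has no writer set.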
Having a non-deleted copy constructor allows me to write the following code, which crashes:

```cpp
DuckDB duckdb(nullptr);
Connection con(duckdb);
auto pending = con.PendingQuery("FROM range(1000)");
auto pending2 = *pending;
pending2.Execute(); // Crashes -> Attempting to execute an unsuccessful or closed pending query result
```
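The general fix pattern for this class of bug is to delete the copy operations on types that own single-use execution state, turning the runtime crash into a compile-time error. A minimal sketch with a hypothetical `PendingResult` class (not the actual duckdb API):

```cpp
#include <memory>
#include <string>

// Hypothetical sketch: a result type owning single-use execution state.
// Deleting the copy operations makes the crashing pattern from the report
// ("auto pending2 = *pending;") fail to compile instead of crashing at runtime.
class PendingResult {
public:
	explicit PendingResult(std::string query) : state(std::make_unique<std::string>(std::move(query))) {}

	// Non-copyable: copying would leave two objects sharing one execution state.
	PendingResult(const PendingResult &) = delete;
	PendingResult &operator=(const PendingResult &) = delete;

	// Still movable: ownership can be transferred explicitly.
	PendingResult(PendingResult &&) = default;
	PendingResult &operator=(PendingResult &&) = default;

	bool IsOpen() const { return state != nullptr; }

private:
	std::unique_ptr<std::string> state;
};
```

With the copy constructor deleted, `auto pending2 = *pending;` is rejected by the compiler, while move-based transfer remains available for legitimate uses.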
…b#16253) Errors are only reported if `has_error` is true, which is only set for `get_api` but not for `get_database`. https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/main/extension/extension_load.cpp#L570-L574
* Use two cursors for range searches
* Reduces benchmark time from 35s to 25s
…b#16223) Fixes duckdblabs/duckdb-internal#4203, fixes duckdb#16175. For smaller data sizes, a parallel Bernoulli sample could produce skewed results because the same seed was used for all threads. If you have 10 threads and a data size of 1000, every thread gets ~100 rows. In the example, the sample rate was 1%; it is possible that the random engine never produces a value below 0.01 within its first 100 draws, in which case none of the threads returns a single row for the result. The fix is to assign every thread its own random seed. If `repeatable` is set, we instead set `ParallelSink` to false and use the given seed, guaranteeing a repeatable result. This is similar to how reservoir sampling currently behaves. Related PR: https://github.com/duckdb/duckdb/pull/14797/files
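The correlation problem described above can be demonstrated with a small self-contained sketch (illustrative only, not the duckdb sampler code): when every thread seeds its engine identically, all threads draw the same sequence, so the per-thread results are perfectly correlated; either every thread misses or every thread keeps the same skewed count.

```cpp
#include <cstddef>
#include <random>

// One "thread" worth of Bernoulli sampling: draw `rows` uniform values,
// keep rows whose draw falls below probability p, and return the kept count.
static std::size_t BernoulliSampleCount(unsigned seed, std::size_t rows, double p) {
	std::mt19937 engine(seed);
	std::uniform_real_distribution<double> dist(0.0, 1.0);
	std::size_t kept = 0;
	for (std::size_t i = 0; i < rows; i++) {
		if (dist(engine) < p) {
			kept++;
		}
	}
	return kept;
}

// Buggy setup: all threads share one seed, so the total is just
// threads * (one thread's count) - perfectly correlated, potentially all-zero.
static std::size_t SampleSharedSeed(std::size_t threads, std::size_t rows_per_thread, double p, unsigned seed) {
	std::size_t total = 0;
	for (std::size_t t = 0; t < threads; t++) {
		total += BernoulliSampleCount(seed, rows_per_thread, p);
	}
	return total;
}

// Fixed setup: each thread gets an independent seed and hence an independent stream.
static std::size_t SamplePerThreadSeeds(std::size_t threads, std::size_t rows_per_thread, double p) {
	std::random_device rd;
	std::size_t total = 0;
	for (std::size_t t = 0; t < threads; t++) {
		total += BernoulliSampleCount(rd(), rows_per_thread, p);
	}
	return total;
}
```

With the shared seed, the total is always an exact multiple of a single thread's count, which is the skew the PR removes; per-thread seeds restore independent draws.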
…lias instead of by index
…ximum line size and buffer size
…er (duckdb#16263) Currently we keep the compressed buffer around, when we really only need it for a short amount of time (when we read data that we then decompress). Since we keep the compressed buffer around per column, these resizable buffers can grow to a rather substantial amount of memory, which can lead to us using 50-100% extra memory when reading Parquet files. Destroying and re-allocating the buffer does not seem to greatly affect performance - below are ClickBench timings run on my laptop on a single big file (which is when caching matters most, given that caches are re-instantiated between files anyway). We could perhaps do something more clever where we keep around a single cache that is shared across readers, but not for the bug-fix release.

| Query | Old (s) | New (s) | Ratio |
|-------|--------:|--------:|------:|
| Q00 | 0.165 | 0.156 | 0.95 |
| Q01 | 0.156 | 0.155 | 0.99 |
| Q02 | 0.192 | 0.191 | 0.99 |
| Q03 | 0.179 | 0.188 | 1.05 |
| Q04 | 0.292 | 0.287 | 0.98 |
| Q05 | 0.364 | 0.349 | 0.96 |
| Q06 | 0.154 | 0.157 | 1.02 |
| Q07 | 0.158 | 0.159 | 1.01 |
| Q08 | 0.361 | 0.348 | 0.96 |
| Q09 | 0.449 | 0.440 | 0.98 |
| Q10 | 0.225 | 0.220 | 0.98 |
| Q11 | 0.251 | 0.249 | 0.99 |
| Q12 | 0.379 | 0.355 | 0.94 |
| Q13 | 0.552 | 0.539 | 0.98 |
| Q14 | 0.426 | 0.412 | 0.97 |
| Q15 | 0.351 | 0.325 | 0.93 |
| Q16 | 0.665 | 0.629 | 0.95 |
| Q17 | 0.617 | 0.596 | 0.97 |
| Q18 | 0.971 | 0.947 | 0.98 |
| Q19 | 0.215 | 0.196 | 0.91 |
| Q20 | 0.929 | 0.892 | 0.96 |
| Q21 | 0.730 | 0.719 | 0.98 |
| Q22 | 1.213 | 1.226 | 1.01 |
| Q23 | 3.062 | 2.905 | 0.95 |
| Q24 | 0.189 | 0.189 | 1.00 |
| Q25 | 0.178 | 0.172 | 0.97 |
| Q26 | 0.201 | 0.186 | 0.93 |
| Q27 | 0.771 | 0.729 | 0.95 |
| Q28 | 7.475 | 7.246 | 0.97 |
| Q29 | 0.254 | 0.236 | 0.93 |
| Q30 | 0.483 | 0.477 | 0.99 |
| Q31 | 0.508 | 0.549 | 1.08 |
| Q32 | 1.168 | 1.254 | 1.07 |
| Q33 | 1.431 | 1.493 | 1.04 |
| Q34 | 1.453 | 1.511 | 1.04 |
| Q35 | 0.359 | 0.369 | 1.03 |
| Q36 | 0.165 | 0.184 | 1.12 |
| Q37 | 0.161 | 0.166 | 1.03 |
| Q38 | 0.173 | 0.172 | 0.99 |
| Q39 | 0.177 | 0.182 | 1.03 |
| Q40 | 0.165 | 0.169 | 1.02 |
| Q41 | 0.162 | 0.172 | 1.06 |
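The structural change described above can be sketched as follows (names are hypothetical, not the actual Parquet reader code): instead of a long-lived, per-column resizable compressed buffer, the compressed bytes live only for the duration of one read-and-decompress step.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch: the compressed buffer is scoped to a single page read,
// so its allocation is released immediately after decompression instead of
// being cached per column for the lifetime of the reader.
struct ColumnReaderSketch {
	std::vector<uint8_t> decompressed;

	void ReadPage(const std::vector<uint8_t> &page_on_disk) {
		// Short-lived compressed buffer: destroyed when this scope ends.
		std::vector<uint8_t> compressed(page_on_disk);
		// Stand-in for real decompression: here we just reverse the bytes.
		decompressed.assign(compressed.rbegin(), compressed.rend());
	} // `compressed` is freed here; no cached allocation survives the call
};
```

The trade-off measured in the table above is a fresh allocation per page against holding a grown buffer per column for the reader's lifetime.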
…nemap (duckdb#16269) During row group checks, we keep track of which filters are always true (and hence do not need to be checked during the actual scan). Due to a missing `else` statement, we could call this method twice for the same filter. That does not look problematic by itself, but `SetFilterAlwaysTrue` *also* keeps track of how many filters have been marked as always true - and if all filters are marked as always true, the filters are skipped entirely. This can cause problems when optional filters (e.g. constructed for a Top-N) are combined with non-optional filters (e.g. constructed from a `WHERE` clause): the filter would not be correctly executed, allowing incorrect rows to sneak into the result.
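A minimal sketch of the bookkeeping described above (hypothetical names, not the actual duckdb code) shows why a duplicate call is dangerous and how making the counter idempotent prevents it:

```cpp
#include <cstddef>
#include <vector>

// Sketch: track which filters are known to be always true. When the count
// reaches the total number of filters, the scan skips filtering entirely.
// Counting the SAME filter twice (the missing `else` in the original code)
// would inflate the count and skip filters that still need to run.
struct FilterState {
	std::vector<bool> always_true;
	std::size_t always_true_count = 0;

	explicit FilterState(std::size_t filter_count) : always_true(filter_count, false) {}

	void SetFilterAlwaysTrue(std::size_t filter_idx) {
		// Guarding on the per-filter flag makes the call idempotent,
		// so a duplicate invocation cannot inflate the count.
		if (!always_true[filter_idx]) {
			always_true[filter_idx] = true;
			always_true_count++;
		}
	}

	bool CanSkipAllFilters() const {
		return always_true_count == always_true.size();
	}
};
```

Without the idempotence guard, calling `SetFilterAlwaysTrue(0)` twice on two filters would report "all filters always true" while filter 1 was never actually checked - exactly the incorrect-rows scenario the PR fixes.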
…lias instead of by index (duckdb#16272) Fix duckdb#16231
* Check hints for equality and skip search

Fixes duckdb#16250, fixes duckdblabs/duckdb-internal#4229
Fix duckdb#16257 I've attached the diff that needs to happen to `main` when we eventually need to merge this branch into it, as the Parquet writer has been refactored significantly in the meantime. [page_fix.txt](https://github.com/user-attachments/files/18827335/page_fix.txt)
This basically cherry-picks the commits of duckdb#16154 onto the `v1.2` branch and reduces the number of tests to a minimal set.
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request on Mar 4, 2025: Merge v1.2-histrionicus into main (duckdb/duckdb#16284)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request on Mar 5, 2025: Merge v1.2-histrionicus into main (duckdb/duckdb#16284)
krlmlr added commits to duckdb/duckdb-r that referenced this pull request on May 15, 17, and 18, 2025: Merge v1.2-histrionicus into main (duckdb/duckdb#16284)