Conversation

@pdet pdet commented Jun 22, 2021

No description provided.

@Mytherin Mytherin merged commit 6bc1e42 into Mytherin:listalwayshasentry Jun 22, 2021
@Mytherin (Owner) commented:

Thanks!

Mytherin pushed a commit that referenced this pull request Mar 14, 2022
Mytherin pushed a commit that referenced this pull request Mar 18, 2022
restore original code
Mytherin pushed a commit that referenced this pull request Aug 12, 2022
Trivil function zversion
Mytherin pushed a commit that referenced this pull request Apr 13, 2023
merged with latest duckdb
Mytherin pushed a commit that referenced this pull request Sep 21, 2023
Mytherin pushed a commit that referenced this pull request Jan 9, 2024
commit 13fb9e2
Author: Tmonster <tom@ebergen.com>
Date:   Mon Dec 18 11:37:06 2023 -0800

    PR cleanup #2

commit 066f3cc
Author: Tmonster <tom@ebergen.com>
Date:   Mon Dec 18 11:21:07 2023 -0800

    fix dereference nullptr

commit 094db53
Author: Tmonster <tom@ebergen.com>
Date:   Mon Dec 18 10:43:15 2023 -0800

    PR cleanup

commit c9a1ecd
Merge: 2893c0c 6258996
Author: Tmonster <tom@ebergen.com>
Date:   Mon Dec 18 10:22:20 2023 -0800

    Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors

commit 2893c0c
Author: Tmonster <tom@ebergen.com>
Date:   Thu Dec 14 13:10:25 2023 +0100

    make format fix. Get compiler ready

commit 80b5f13
Merge: e30b726 c29eb0c
Author: Tmonster <tom@ebergen.com>
Date:   Thu Dec 14 12:34:18 2023 +0100

    Merge branch 'main' into reservoir_sampler_Vectors

commit e30b726
Author: Tmonster <tom@ebergen.com>
Date:   Thu Dec 14 12:33:03 2023 +0100

    remove all parallelism. will do it in the next iteration

commit e8e088d
Author: Tmonster <tom@ebergen.com>
Date:   Thu Dec 14 11:52:27 2023 +0100

    still failing a test. Merging samples collected in parallel is difficult, and probably doesnt provide much benefit. Going to leave it for later

commit 96bfa1c
Merge: 45fa9a5 3237244
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 13 17:02:31 2023 +0100

    Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors

commit 45fa9a5
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 13 14:36:50 2023 +0100

    make format-fix

commit 049327b
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 13 14:31:22 2023 +0100

    try to fix this parallel issue

commit a5b290d
Merge: 21d4120 8849f97
Author: Tom Ebergen <tom@ebergen.com>
Date:   Tue Dec 12 11:18:52 2023 +0100

    Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors

commit 21d4120
Merge: 795c454 e117c34
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 11 12:43:43 2023 +0100

    Merge branch 'main' into reservoir_sampler_Vectors

commit c29eb0c
Merge: 6bf31e1 25906f3
Author: Tmonster <tom@ebergen.com>
Date:   Thu Dec 7 15:23:06 2023 +0100

    Merge remote-tracking branch 'upstream/main'

commit 6bf31e1
Author: Elliana May <me@mause.me>
Date:   Mon Dec 4 22:21:30 2023 +0800

    fix warning

commit a521081
Author: Elliana May <me@mause.me>
Date:   Mon Dec 4 21:58:50 2023 +0800

    add test for streaming extracted statements

commit 5ee902a
Author: Elliana May <me@mause.me>
Date:   Mon Dec 4 21:15:30 2023 +0800

    add some tests of duckdb_execute_prepared_streaming

commit 58b6664
Author: Elliana May <me@mause.me>
Date:   Mon Dec 4 21:02:48 2023 +0800

    chore(docs): update docs for duckdb_execute_prepared_streaming

commit a8e49b1
Author: Hannes Mühleisen <hannes@duckdblabs.com>
Date:   Tue Dec 5 11:31:21 2023 +0100

    add test case, apparently from snowflake

commit a7ee1dd
Author: Hannes Mühleisen <hannes@duckdblabs.com>
Date:   Tue Dec 5 11:25:51 2023 +0100

    enable implicit fallthrough warning for /src and fixed a few instances

commit c6bf4c6
Author: Hannes Mühleisen <hannes@duckdblabs.com>
Date:   Tue Dec 5 11:02:54 2023 +0100

    supporting more physical types of parquet time columns with time zone info

commit baf670f
Author: Jacob <535707+jkub@users.noreply.github.com>
Date:   Mon Dec 4 09:05:56 2023 -0800

    make BufferPool members protected

commit 878e7d2
Author: Yves <yves@motherduck.com>
Date:   Mon Dec 4 12:00:49 2023 -0500

    Mark BufferPool getters const

commit a7ddb87
Author: Gabor Szarnyas <gabor@duckdblabs.com>
Date:   Mon Dec 4 16:22:44 2023 +0100

    Capitalize URL in httpfs extension flags

commit 795c454
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 6 13:23:29 2023 +0100

    removing reservoir type checks

commit 6e0e431
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 6 11:25:50 2023 +0100

    make format fix

commit 236825b
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Dec 6 10:23:57 2023 +0100

    remove unused code

commit 34902e9
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 5 21:20:09 2023 +0100

    should pass make format fix

commit 42d3fb8
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 5 18:04:36 2023 +0100

    percentage is still global, but rows is local

commit d378cc7
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 5 15:37:40 2023 +0100

    some debugging statements

commit 4ad877c
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 5 14:16:25 2023 +0100

    some changes. Have a lot of bugs solved. but still not great

commit ad79d30
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 4 17:41:37 2023 +0100

    have figured out why percentage wasnt working. but it requires a big rework

commit 04d4c0d
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 4 14:10:26 2023 +0100

    reservoir sample works. but for large cardinalities and high percentages no

commit ddcea54
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 4 12:33:16 2023 +0100

    remove std::couts

commit 4e12d15
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Nov 29 17:47:16 2023 +0100

    ok, have the proper output for reservoir sampling. need to understand when to add local sample or global sample

commit 43e72a4
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Nov 29 15:23:55 2023 +0100

    compiles. Now I want to figure out where I left off last time

commit 450655c
Merge: 3639e4c 3f96a90
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Nov 29 15:00:08 2023 +0100

    Merge branch 'main' into reservoir_sampler_Vectors

commit 3639e4c
Merge: c10b3a4 5bc0773
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Nov 29 14:56:39 2023 +0100

    Merge branch 'main' into reservoir_sampler_Vectors

commit c10b3a4
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Jan 23 10:04:20 2023 +0100

    this should work now for sampling a set amount of rows. Still need to work on percentage sampling

commit 7147e2a
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Jan 18 16:38:02 2023 +0100

    it is starting to work, but need to look into why it is still slow

commit 2255424
Author: Tom Ebergen <tom@ebergen.com>
Date:   Wed Jan 18 11:21:15 2023 +0100

    working for normal blocking sample, but not for percentage

commit 8a01b32
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Jan 16 17:02:34 2023 +0100

    intermediate commit, will fix other spots later

commit 3676737
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Jan 16 09:03:52 2023 +0100

    intermediate work, will be fixing later

commit 904d220
Author: Tom Ebergen <tom@ebergen.com>
Date:   Fri Jan 13 13:50:43 2023 +0100

    collecting samples in parallel now, now I need to figure out how to combine them in a proper uniform and weighted manner
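
    For context on the "combine them in a proper uniform and weighted manner" problem this commit mentions: a standard way to merge two uniform reservoir samples is to draw from each in proportion to the size of the stream it summarizes. The sketch below is a minimal Python illustration of that idea, not DuckDB's implementation; `merge_reservoirs` and its parameters are hypothetical names.

    ```python
    import random

    def merge_reservoirs(sample_a, n_a, sample_b, n_b, k):
        """Merge two uniform reservoir samples into one uniform sample of size k.

        sample_a / sample_b are uniform samples (size >= k) drawn from streams
        that contained n_a and n_b items respectively.
        """
        assert len(sample_a) >= k and len(sample_b) >= k
        a, b = list(sample_a), list(sample_b)
        random.shuffle(a)  # after shuffling, popping from the end is a uniform draw
        random.shuffle(b)
        merged = []
        for _ in range(k):
            # Take the next item from A with probability n_a / (n_a + n_b),
            # mirroring how many of the combined stream's items came from A.
            if random.random() * (n_a + n_b) < n_a:
                merged.append(a.pop())
                n_a -= 1
            else:
                merged.append(b.pop())
                n_b -= 1
        return merged
    ```

    Merging weighted reservoirs (as used for percentage sampling) is harder, which matches the difficulty described in the commits above.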

commit 01c4b89
Author: Tom Ebergen <tom@ebergen.com>
Date:   Tue Jan 10 15:22:28 2023 +0100

    minor code cleanup

commit b5c6d61
Author: Tom Ebergen <tom@ebergen.com>
Date:   Tue Jan 10 11:51:32 2023 +0100

    get rid of 4 spaces

commit 1dc807f
Merge: 605520f 7e1a307
Author: Tom Ebergen <tom@ebergen.com>
Date:   Tue Jan 10 11:50:14 2023 +0100

    Merge branch 'reservoir_sampler_Vectors' of github.com:Tmonster/duckdb into reservoir_sampler_Vectors

commit 7e1a307
Author: Tmonster <tom@ebergen.com>
Date:   Wed Jan 4 11:39:30 2023 -0800

    make format-fix

commit 750c1e3
Author: Tmonster <tom@ebergen.com>
Date:   Wed Jan 4 11:37:27 2023 -0800

    small syntax updates

commit fa2ac9c
Author: Tmonster <tom@ebergen.com>
Date:   Wed Jan 4 11:36:33 2023 -0800

    small syntax updates

commit cd232c6
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 27 14:54:08 2022 -0800

    Revert "mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not"

    This reverts commit 0f08574.

commit 0f08574
Author: Tmonster <tom@ebergen.com>
Date:   Wed Dec 21 15:08:52 2022 -0800

    mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not

commit f4f5834
Author: Tmonster <tom@ebergen.com>
Date:   Wed Dec 21 13:43:00 2022 -0800

    remove iostream

commit bee57ae
Author: Tmonster <tom@ebergen.com>
Date:   Wed Dec 21 12:37:02 2022 -0800

    make format fix

commit ce950a1
Author: Tmonster <tom@ebergen.com>
Date:   Wed Dec 21 09:10:23 2022 -0800

    ok added test over reservoir threshold

commit 29fb39a
Author: Tmonster <tom@ebergen.com>
Date:   Wed Dec 21 09:01:21 2022 -0800

    ok it's all in a datachunk, now I can try and parallelize it

commit 8f05514
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 20 17:03:25 2022 +0100

    remove pragma threads

commit 98f9897
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 20 17:02:58 2022 +0100

    no more memory errors

commit da234b7
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 20 12:24:37 2022 +0100

    no more errors when running count(*) on samples greater than the basic vector size

commit 3fc7214
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 20 10:04:13 2022 +0100

    fix error

commit e302946
Author: Tmonster <tom@ebergen.com>
Date:   Tue Dec 20 10:03:30 2022 +0100

    still errors

commit 7ea6405
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 19 21:15:38 2022 +0100

    its getting better but still getting memory errors

commit 8f3c597
Author: Tom Ebergen <tom@ebergen.com>
Date:   Fri Dec 16 16:48:12 2022 +0100

    add some functionality, but mostly making reservoir sampler use datachunk chunkcollection

commit 605520f
Author: Tom Ebergen <tom@ebergen.com>
Date:   Mon Dec 19 21:15:38 2022 +0100

    its getting better but still getting memory errors

commit 97a491e
Author: Tom Ebergen <tom@ebergen.com>
Date:   Fri Dec 16 16:48:12 2022 +0100

    add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
Mytherin added a commit that referenced this pull request Oct 16, 2024
* Run formatter also on src/include/duckdb/core_functions/...

* fix numpy issues with the 'string' dtype changes

* Use numeric_limits

* Format fix

* Fix duckdb#12467 changes to stddev calculation

* Format fix

* Update min/max to cache allocations and prevent unnecessary re-allocation

* missed some constants in FixedSizeBuffer

* first step ci run for android

* baby steps

* typo

* explicit platform

* extension static build

* more env and ninja

* add arm64

* wrong flag

* extensions, fingers crossed

* container

* using default containers

* removing it in more places

* patch vss

* port changes of 'python_datetime_and_deltatime_missing_stride' from 'feature' to 'main'

* Switch arg_min/arg_max to use sort key instead of vectors

* Clean up unused functions

* AddStringOrBlob

* Skip only built-in optimizers

* Add support for arg_min(ANY, ANY)

* revert extension patch with optional_idx

* Correct count

* Format fix

* Format fix

* Switch fallback implementation of FIRST to use sort keys instead of vectors

* WIP - clean up histogram function, avoid using ListVector::Push

* Move many tests to slow

* Add support for all types to the histogram function

* dont WaitOnTask if there are no tasks available

* Rework list_distinct/list_unique to support arbitrary types and to no longer use values and ListVector::Push

* Format fix

* fix compilation

* Avoid overriding types in PrepareTypeForCast when not required

* Use string_t and the arena allocator to allocate strings in the histogram function, instead of std::string

* this is soooo much better

* forgot to add new file

* apply feedback

* prevent the undefined behaviour of std::tolower() by casting the input to uint_8

* Binned histograms WIP

* format

* fix up test

* More tests

* Format fix + use string_map_t here

* Detect duplicate bounds, sort bounds, and allow empty bounds

* Binned histograms working for all types

* Add binned histogram test to test all types

* Unify/clean up histogram and binned histogram

* RequiresExtract

* Add TPC-H tests

* Improve error message

* Format

* Add equi-width binning method for integers

* More clean-up and testing, add support for doubles

* lets start with this

* Update .github/workflows/Android.yml

Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>

* add missing headers

* Make equi-width-bins always return the input type

* treat NOTHING different from REPLACE/UPDATE, the filtered tuples should not be added to the returning chunk

* format

* nits, use ExtensionHelper to check for spatial extension

* np.NaN -> np.nan

* remove pyarrow version lock

* cxxheaderparser

* commit generated code

* add reqs

* Update CodeQuality.yml

* name capi enums

* rewrite verify_enum_integrity.py to use cxxheaderparser

* fix dependency installation

* Add equi-width binning support for dates/timestamps

* update messages

* remove dead code

* format

* Make typeof a constant function (removing its argument) so that the optimizer can optimize branches away

* Format

* Binning test

* Revert "Make typeof a constant function (removing its argument) so that the optimizer can optimize branches away"

This reverts commit 4455c46.

* Fix test

* Remove duplicate test

* Re-generate functions

* Use /

* Add bind_expression function, and make typeof return a constant expression when possible

* Set function pointer

* Add function can_cast_implicitly that, given a source and target type, states whether or not the source can be implicitly cast to the target

* This is optimized away

* This is moved

* change np.NaN -> np.nan

* Fixes for equi_width_bins with timestamps - cap bins at bin_count, propagate fractional months/days downwards (i.e. 0.2 months becomes 6 days) and handle bin_count > time_diff in micros

* Add histogram and histogram_values table macros + infrastructure for creating default table macros

* Feature duckdb#1272: Window Executor State

PR feedback.

* More testing for histogram function

* Allow min as boundary (when there are few values this might occur)

* Format

* Add missing include

* Fix tests

* Move mode function to owning string map

* Format fix

* Use correct map type in list aggregate for histogram

* Make mode work for all types by using sort keys

* Remove N from this test as there is a duplicate

* "benchmark/micro/*" >> "benchmark/micro/*.benchmark"

* add copy_to_select hook, rework unsupported types in copy_to and export

* optimize

* remove unused variable

* Modify test to be deterministic

* pass along options to select

* nit

* allow ducktyping of lists/dicts

* add enum for copy to info

* move type visit functions into TypeVisitor helper

* When dropping a table - clear the local storage of any appended data to that table

* Add include

* Set flags again in test

* Also call OnDropEntry for CREATE OR REPLACE

* Add HTTP error code to extension install failures

* Issue duckdb#12600: Streaming Positive LAG

Use buffering to support streaming computation of constant positive LAGs
and negative LEADs that are at most one vector away.
This doesn't fix the "look ahead" problem, but the benchmark shows
it is about 5x faster than the non-streaming version.

* Issue duckdb#12600: Streaming Positive LAG

Add new benchmark.

* Feature duckdb#1272: Window Group Preparation

Move the construction of the row data collections and masks
to the Finalize phase. These are relatively fast
and will use data that is still hot (e.g., the sort keys).
This will make it easier parallelise the remaining two passes
over the data (build and execute).

* Feature duckdb#1272: Window Group Preparation

Code cleanup and cast fixes.

* Turn window_start and window_end into idx_t

* Initialize payload collection DataChunk with payload_count to prevent resizing

* Format

* VectorOperations::Copy - fast path when copying an aligned flat validity mask into a flat vector

* Set is_dropped flag instead of actually dropping data, as other scans might be depending on the local storage (in particular when running CREATE OR REPLACE tbl AS SELECT * FROM tbl)

* Add missing include

* Create sort key helper class and use it in min/max

* Simplify histogram combine

* StateCombine for binned histogram

* Rework mode to also use sort key helpers

* Remove specialized decimal implementation of minmax

* Disable android CI on pushes/PRs

* Quantile clean-up part 1: move bind data and sort tree to separate files

* Move quantile window state into separate struct

* Move MAD and quantile states/operations to separate files

* Rework median - allow median(ANY) instead of having different function overloads

* Avoid usage of std::string in quantiles, and switch to arena allocated strings as well

* Add fallback option to discrete quantile

* Issue duckdb#12600: Streaming Positive LAG

Incorporate various PR feedback improvements:

* Cached Executors
* Validity Mask reset
* Small Buffers

Along with the mask copy improvement in VectorOperations::Copy
these reduce the benchmark runtime by another 3x.

* quantile_disc scalar and lists working for arbitrary types

* Remove pre-allocation for now

* Feature 1272: Window Payload Preallocation

Only preallocate payloads for the value window functions
(LEAD, LAG, FIRST, LAST, VALUE) instad of all of them.

* Quantile binding clean-up, add test_all_types for quantile_disc and median

* Test + CI fixes

* Format

* Clean up old deserialization code

* Set destructor, remove some more dead code

---------

Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
Co-authored-by: Tishj <t_b@live.nl>
Co-authored-by: Mark Raasveldt <mark.raasveldt@gmail.com>
Co-authored-by: taniabogatsch <44262898+taniabogatsch@users.noreply.github.com>
Co-authored-by: Hannes Mühleisen <hannes@duckdblabs.com>
Co-authored-by: Christina Sioula <chrisiou.myk@gmail.com>
Co-authored-by: Hannes Mühleisen <hannes@muehleisen.org>
Co-authored-by: Richard Wesley <13156216+hawkfish@users.noreply.github.com>
Co-authored-by: Max Gabrielsson <max@gabrielsson.com>
Co-authored-by: Elliana May <me@mause.me>
Co-authored-by: Richard Wesley <hawkfish@electricfish.com>
Co-authored-by: Maia <maia@duckdblabs.com>
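
The "Streaming Positive LAG" commits above describe using buffering to compute a constant positive LAG (or negative LEAD) streamingly when the offset is at most one vector away. A toy Python sketch of that buffering idea follows; the names are hypothetical and DuckDB operates on fixed-size vectors rather than plain lists, but the shape is the same: keep only the previous chunk and reach back into it.

```python
def streaming_lag(chunks, offset, default=None):
    """Compute LAG(value, offset) over a stream of chunks, buffering only the
    previous chunk. Valid while offset <= the chunk size."""
    prev = []
    for chunk in chunks:
        out = []
        for i in range(len(chunk)):
            j = i - offset
            if j >= 0:
                out.append(chunk[j])          # lagged value is in this chunk
            elif len(prev) + j >= 0:
                out.append(prev[len(prev) + j])  # reach into the buffered chunk
            else:
                out.append(default)           # before the start of the stream
        yield out
        prev = chunk
```

Because only one previous chunk is retained, no full materialization of the partition is needed, which is where the reported speedup over the non-streaming version comes from.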
Mytherin added a commit that referenced this pull request Dec 3, 2024
I was investigating the following crash, where a checkpoint task had its
underlying resources destroyed while it was still running. The two
interesting threads are the following:

```
thread #1, name = 'duckling', stop reason = signal SIGTRAP
    frame #0: 0x0000ffff91bb71ec
    frame #1: 0x0000aaaad73a38e8 duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:336:2
    frame #2: 0x0000aaaad786eb68 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*() const [inlined] duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::AssertNotNull(null=<unavailable>) at unique_ptr.hpp:25:10
    frame #3: 0x0000aaaad786eaf4 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*(this=0x0000aaacbb73e008) const at unique_ptr.hpp:34:4
    frame #4: 0x0000aaaad787abbc duckling`duckdb::CheckpointTask::ExecuteTask(this=0x0000aaabec92be60) at row_group_collection.cpp:732:21
    frame #5: 0x0000aaaad7756ea4 duckling`duckdb::BaseExecutorTask::Execute(this=0x0000aaabec92be60, mode=<unavailable>) at task_executor.cpp:72:3
    frame #6: 0x0000aaaad7757e28 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaafda30e10, marker=0x0000aaaafda164a8) at task_scheduler.cpp:189:32
    frame #7: 0x0000ffff91a031fc
    frame #8: 0x0000ffff91c0d5c8

thread #120, stop reason = signal 0
    frame #0: 0x0000ffff91c71c24
    frame #1: 0x0000ffff91e1264c
    frame #2: 0x0000ffff91e01888
    frame #3: 0x0000ffff91e018f8
    frame #4: 0x0000ffff91e01c10
    frame #5: 0x0000ffff91e05afc
    frame #6: 0x0000ffff91e05e70
    frame #7: 0x0000aaaad784b63c duckling`duckdb::RowGroup::~RowGroup() [inlined] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(this=<unavailable>) at shared_ptr_base.h:184:10
    frame #8: 0x0000aaaad784b5b4 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count(this=<unavailable>) at shared_ptr_base.h:705:11
    frame #9: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_ptr<duckdb::ColumnData, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr(this=<unavailable>) at shared_ptr_base.h:1154:31
    frame #10: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] duckdb::shared_ptr<duckdb::ColumnData, true>::~shared_ptr(this=<unavailable>) at shared_ptr_ipp.hpp:115:24
    frame #11: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>>(__pointer=<unavailable>) at stl_construct.h:151:19
    frame #12: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:163:6
    frame #13: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:195:7
    frame #14: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*, duckdb::shared_ptr<duckdb::ColumnData, true>>(__first=<unavailable>, __last=<unavailable>, (null)=<unavailable>) at alloc_traits.h:848:7
    frame #15: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::vector<duckdb::shared_ptr<duckdb::ColumnData, true>, std::allocator<duckdb::shared_ptr<duckdb::ColumnData, true>>>::~vector(this=<unavailable>) at stl_vector.h:680:2
    frame #16: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup(this=<unavailable>) at row_group.cpp:83:1
    frame #17: 0x0000aaaad786ee18 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] std::default_delete<duckdb::RowGroup>::operator()(this=0x0000aaacbb73e1a8, __ptr=0x0000aaab75ae7860) const at unique_ptr.h:85:2
    frame #18: 0x0000aaaad786ee10 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() at unique_ptr.h:361:4
    frame #19: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] duckdb::SegmentNode<duckdb::RowGroup>::~SegmentNode(this=0x0000aaacbb73e1a0) at segment_tree.hpp:21:8
    frame #20: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>>(__pointer=0x0000aaacbb73e1a0) at stl_construct.h:151:19
    frame #21: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=0x0000aaacbb73e1a0, __last=0x0000aaacbb751130) at stl_construct.h:163:6
    frame #22: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=<unavailable>, __last=0x0000aaacbb751130) at stl_construct.h:195:7
    frame #23: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*, duckdb::SegmentNode<duckdb::RowGroup>>(__first=<unavailable>, __last=0x0000aaacbb751130, (null)=0x0000fffefc81a908) at alloc_traits.h:848:7
    frame #24: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector(this=size=4883) at stl_vector.h:680:2
    frame #25: 0x0000aaaad7857f74 duckling`duckdb::RowGroupCollection::Checkpoint(this=<unavailable>, writer=<unavailable>, global_stats=0x0000fffefc81a9c0) at row_group_collection.cpp:1017:1
    frame #26: 0x0000aaaad788f02c duckling`duckdb::DataTable::Checkpoint(this=0x0000aaab04649e70, writer=0x0000aaab6584ea80, serializer=0x0000fffefc81ab38) at data_table.cpp:1427:14
    frame #27: 0x0000aaaad7881394 duckling`duckdb::SingleFileCheckpointWriter::WriteTable(this=0x0000fffefc81b128, table=0x0000aaab023b78c0, serializer=0x0000fffefc81ab38) at checkpoint_manager.cpp:528:11
    frame #28: 0x0000aaaad787ece4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=<unavailable>, obj=0x0000fffefc81ab38)::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)::operator()(duckdb::Serializer&) const at checkpoint_manager.cpp:181:43
    frame #29: 0x0000aaaad787ecd8 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] void duckdb::Serializer::List::WriteObject<duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)>(this=<unavailable>, f=(unnamed class) @ 0x0000600002cbd2b0) at serializer.hpp:385:2
    frame #30: 0x0000aaaad787ecc4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(this=<unavailable>, list=<unavailable>, i=2) const at checkpoint_manager.cpp:181:8
    frame #31: 0x0000aaaad787ecb0 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() at serializer.hpp:151:4
    frame #32: 0x0000aaaad787ec94 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=0x0000fffefc81b128) at checkpoint_manager.cpp:179:13
    frame #33: 0x0000aaaad78954a8 duckling`duckdb::SingleFileStorageManager::CreateCheckpoint(this=0x0000aaaafe1de140, options=(wal_action = DONT_DELETE_WAL, action = CHECKPOINT_IF_REQUIRED, type = FULL_CHECKPOINT)) at storage_manager.cpp:365:17
    frame #34: 0x0000aaaad78baac0 duckling`duckdb::DuckTransactionManager::Checkpoint(this=0x0000aaaafe167e00, context=<unavailable>, force=<unavailable>) at duck_transaction_manager.cpp:198:18
    frame #35: 0x0000aaaad69d02c0 duckling`md::Server::BackgroundCheckpointIfNeeded(this=0x0000aaaafdbfe900) at server.cpp:1983:31
    frame #36: 0x0000aaaadac5d3f0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::function<void ()>::operator()(this=<unavailable>) const at std_function.h:590:9
    frame #37: 0x0000aaaadac5d3e0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] md::BackgroundCronTask::Start(unsigned long)::$_0::operator()(this=0x0000aaaafdf169a8) const at background_cron_task.cpp:25:4
    frame #38: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::__invoke_impl<void, md::BackgroundCronTask::Start(unsigned long)::$_0>((null)=<unavailable>, __f=0x0000aaaafdf169a8) at invoke.h:61:14
    frame #39: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::__invoke_result<md::BackgroundCronTask::Start(unsigned long)::$_0>::type std::__invoke<md::BackgroundCronTask::Start(unsigned long)::$_0>(__fn=0x0000aaaafdf169a8) at invoke.h:96:14
    frame #40: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::_M_invoke<0ul>(this=0x0000aaaafdf169a8, (null)=<unavailable>) at std_thread.h:259:13
    frame #41: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::operator()(this=0x0000aaaafdf169a8) at std_thread.h:266:11
    frame #42: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run(this=0x0000aaaafdf169a0) at std_thread.h:211:13
    frame #43: 0x0000ffff91a031fc
    frame #44: 0x0000ffff91c0d5c8
```

The problem is that if an IO exception is thrown in
`RowGroupCollection::Checkpoint` after some (but not all) checkpoint
tasks have been scheduled, but before
`checkpoint_state.executor.WorkOnTasks();` is called, the result is an
InternalException / DuckDB crash: the `Checkpoint` method does not
wait for the already-scheduled tasks to complete before destroying the
resources they reference.
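
The failure mode above can be sketched with a toy executor: tasks are scheduled, an exception interrupts scheduling, and the state the tasks still reference goes out of scope. This is a minimal Python illustration, not DuckDB's actual `TaskExecutor` API; the names (`schedule`, `wait_for_all`, `fail_at`) are hypothetical. The `finally` block plays the role of the fix: wait for already-scheduled tasks on the error path, too.

```python
import queue
import threading

class TaskExecutor:
    """Toy executor: scheduled tasks run on a background worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._cv = threading.Condition()
        self._in_flight = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def schedule(self, fn):
        with self._cv:
            self._in_flight += 1
        self._tasks.put(fn)

    def _worker(self):
        while True:
            fn = self._tasks.get()
            fn()
            with self._cv:
                self._in_flight -= 1
                self._cv.notify_all()

    def wait_for_all(self):
        # Block until every scheduled task has finished running.
        with self._cv:
            self._cv.wait_for(lambda: self._in_flight == 0)

def checkpoint(row_groups, executor, out, fail_at=None):
    """Schedule one task per row group; fail_at simulates an IO error thrown
    mid-scheduling. The finally-block is the fix: it waits for the tasks that
    were already scheduled before `out` (the shared state) leaves scope."""
    try:
        for i, rg in enumerate(row_groups):
            if i == fail_at:
                raise IOError("simulated IO failure during checkpoint")
            executor.schedule(lambda rg=rg: out.append(rg * 2))
    finally:
        executor.wait_for_all()
```

Without the `finally`, the exception would propagate while worker tasks still touch `out`, which is the shape of the use-after-free seen in the backtrace.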
Mytherin added a commit that referenced this pull request Jan 20, 2025
Fixes duckdblabs/duckdb-internal#3922

The failing query
```SQL
SET order_by_non_integer_literal=true;
SELECT DISTINCT ON ( 'string' ) 'string', GROUP BY CUBE ( 'string', ), 'string' IN ( SELECT 'string' ), HAVING 'string' IN ( SELECT 'string');
```

The plan generated before optimization is shown below. During optimization
there is an attempt to convert the mark join into a semi join. Before this
conversion takes place, we normally check that the mark join is not
referenced by any operator above it, to prevent plan verification errors.
Until now, only logical projections were checked for mark join references.
It turns out this query is planned in such a way that the mark join
reference sits in one of the expressions of the aggregate operator. Since
aggregates were not checked, the mark-to-semi conversion would take place
incorrectly. The fix is to modify the filter pushdown optimization so that
it also stores table indexes from logical aggregate operators.

```
┌───────────────────────────┐
│       PROJECTION #1       │
│    ────────────────────   │
│    Expressions: #[2.0]    │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│    Expressions: #[2.1]    │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    AGGREGATE #2, #3, #4   │
│    ────────────────────   │
│          Groups:          │
│          'string'         │
│          #[14.0]          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      COMPARISON_JOIN      │
│    ────────────────────   │
│      Join Type: MARK      │
│                           ├──────────────┐
│        Conditions:        │              │
│    ('string' = #[8.0])    │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│       DUMMY_SCAN #0       ││       PROJECTION #8       │
│    ────────────────────   ││    ────────────────────   │
│                           ││   Expressions: 'string'   │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │       DUMMY_SCAN #7       │
                             │    ────────────────────   │
                             └───────────────────────────┘
```
Mytherin added a commit that referenced this pull request Feb 18, 2025
We had two users crash with the following backtrace:

```
    frame #0: 0x0000ffffab2571ec
    frame #1: 0x0000aaaaac00c5fc duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:328:2
    frame #2: 0x0000aaaaac1ee418 duckling`duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::CheckValid(this=<unavailable>) const at optional_ptr.hpp:34:11
    frame #3: 0x0000aaaaac1eea8c duckling`duckdb::MergeCollectionTask::Execute(duckdb::PhysicalBatchInsert const&, duckdb::ClientContext&, duckdb::GlobalSinkState&, duckdb::LocalSinkState&) [inlined] duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::operator*(this=<unavailable>) at optional_ptr.hpp:43:3
    frame #4: 0x0000aaaaac1eea84 duckling`duckdb::MergeCollectionTask::Execute(this=0x0000aaaaf1b06150, op=<unavailable>, context=0x0000aaaba820d8d0, gstate_p=0x0000aaab06880f00, lstate_p=<unavailable>) at physical_batch_insert.cpp:219:90
    frame #5: 0x0000aaaaac1d2e10 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTask(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:425:8
    frame #6: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTasks(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:431:9
    frame #7: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(this=0x0000aaaafa62ab40, context=0x0000aab2fffd7cb0, chunk=<unavailable>, input=<unavailable>) const at physical_batch_insert.cpp:494:4
    frame #8: 0x0000aaaaac353158 duckling`duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, duckdb::ExecutionBudget&, unsigned long) [inlined] duckdb::PipelineExecutor::Sink(this=0x0000aab2fffd7c00, chunk=0x0000aab2fffd7d30, input=0x0000fffec0aba8d8) at pipeline_executor.cpp:521:24
    frame #9: 0x0000aaaaac353130 duckling`duckdb::PipelineExecutor::ExecutePushInternal(this=0x0000aab2fffd7c00, input=0x0000aab2fffd7d30, chunk_budget=0x0000fffec0aba980, initial_idx=0) at pipeline_executor.cpp:332:23
    frame #10: 0x0000aaaaac34f7b4 duckling`duckdb::PipelineExecutor::Execute(this=0x0000aab2fffd7c00, max_chunks=<unavailable>) at pipeline_executor.cpp:201:13
    frame #11: 0x0000aaaaac34f258 duckling`duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) [inlined] duckdb::PipelineExecutor::Execute(this=<unavailable>) at pipeline_executor.cpp:278:9
    frame #12: 0x0000aaaaac34f250 duckling`duckdb::PipelineTask::ExecuteTask(this=0x0000aab16dafd630, mode=<unavailable>) at pipeline.cpp:51:33
    frame #13: 0x0000aaaaac348298 duckling`duckdb::ExecutorTask::Execute(this=0x0000aab16dafd630, mode=<unavailable>) at executor_task.cpp:49:11
    frame #14: 0x0000aaaaac356600 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaaf0105560, marker=0x0000aaaaf00ee578) at task_scheduler.cpp:189:32
    frame #15: 0x0000ffffab0a31fc
    frame #16: 0x0000ffffab2ad5c8
```

Core dump analysis showed that the assertion `D_ASSERT(lstate.writer);`
in `MergeCollectionTask::Execute` was not satisfied (i.e. it crashed
because `lstate.writer` was a null pointer) while
`PhysicalBatchInsert::Sink` was processing merge tasks from other
pipeline executors.

My suspicion is that this is only likely to happen for heavily
concurrent workloads (applicable to the two users which crashed). The
patch submitted as part of this PR has addressed the issue for these
users.
Mytherin pushed a commit that referenced this pull request Feb 26, 2025
Mytherin pushed a commit that referenced this pull request Aug 19, 2025
I have seen this crash due to an invalid pointer on which a destructor is called, on last night's `main` (`2ed9bf887f`), using a
unittester compiled from sources (clang 17) and extensions installed from the default extension repository.

Basically:
```
DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/17/lib/darwin/libclang_rt.asan_osx_dynamic.dylib LOCAL_EXTENSION_REPO=http://extensions.duckdb.org ./build/release/test/unittest --autoloading all --skip-compiled  --order rand test/parquet/test_parquet_schema.test
```
which produces runtime sanitizer reports such as
```
==56046==ERROR: AddressSanitizer: container-overflow on address 0x6100000d4dcf at pc 0x000116c7f450 bp 0x00016fc1d170 sp 0x00016fc1d168
READ of size 1 at 0x6100000d4dcf thread T0
    #0 0x000116c7f44c in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>* std::__1::__uninitialized_allocator_copy_impl[abi:ne190102]<std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+0x318 (parquet.duckdb_extension:arm64+0xab44c)
    #1 0x000116c7ec90 in void std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__construct_at_end<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, unsigned long)+0x160 (parquet.duckdb_extension:arm64+0xaac90)
    #2 0x000116c7e7d8 in void std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__assign_with_size[abi:ne190102]<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, long)+0x1e0 (parquet.duckdb_extension:arm64+0xaa7d8)
    #3 0x000116e8cd48 in duckdb::ParquetMultiFileInfo::BindReader(duckdb::ClientContext&, duckdb::vector<duckdb::LogicalType, true>&, duckdb::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, true>&, duckdb::MultiFileBindData&)+0xf18 (parquet.duckdb_extension:arm64+0x2b8d48)
    #4 0x000116e6e5fc in duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::MultiFileBindInternal(duckdb::ClientContext&, duckdb::unique_ptr<duckdb::MultiFileReader, std::__1::default_delete<duckdb::MultiFileReader>, true>, duckdb::shared_ptr<duckdb::MultiFileList, true>, duckdb::vector<duckdb::LogicalType, true>&, duckdb::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, true>&, duckdb::MultiFileOptions, duckdb::unique_ptr<duckdb::BaseFileReaderOptions, std::__1::default_delete<duckdb::BaseFileReaderOptions>, true>, duckdb::unique_ptr<duckdb::MultiFileReaderInterface, std::__1::default_delete<duckdb::MultiFileReaderInterface>, true>)+0x1210 (parquet.duckdb_extension:arm64+0x29a5fc)
```

or these failures while using ducklake
```
==56079==ERROR: AddressSanitizer: container-overflow on address 0x616000091a78 at pc 0x0001323fc81c bp 0x00016bd0e890 sp 0x00016bd0e888
READ of size 8 at 0x616000091a78 thread T2049
    #0 0x0001323fc818 in duckdb::MultiFileColumnDefinition::~MultiFileColumnDefinition()+0x258 (parquet.duckdb_extension:arm64+0x2a4818)
    #1 0x0001323fb588 in std::__1::vector<duckdb::MultiFileColumnDefinition, std::__1::allocator<duckdb::MultiFileColumnDefinition>>::__destroy_vector::operator()[abi:ne190102]()+0x98 (parquet.duckdb_extension:arm64+0x2a3588)
    #2 0x0001324a09e4 in duckdb::BaseFileReader::~BaseFileReader()+0x2bc (parquet.duckdb_extension:arm64+0x3489e4)
    #3 0x0001324a23ec in duckdb::ParquetReader::~ParquetReader()+0x22c (parquet.duckdb_extension:arm64+0x34a3ec)
```
Mytherin pushed a commit that referenced this pull request Aug 19, 2025

With these changes, once the `parquet` extension is built by CI,
this works as expected.

I am not sure whether the fix could / should be elsewhere.