forked from duckdb/duckdb
Shazam #2
Merged
Conversation
List always has entry …on build from make

Thanks!
Mytherin pushed a commit that referenced this pull request on Jan 9, 2024
```
commit 13fb9e2 Author: Tmonster <tom@ebergen.com> Date: Mon Dec 18 11:37:06 2023 -0800 PR cleanup #2
commit 066f3cc Author: Tmonster <tom@ebergen.com> Date: Mon Dec 18 11:21:07 2023 -0800 fix dereference nullptr
commit 094db53 Author: Tmonster <tom@ebergen.com> Date: Mon Dec 18 10:43:15 2023 -0800 PR cleanup
commit c9a1ecd Merge: 2893c0c 6258996 Author: Tmonster <tom@ebergen.com> Date: Mon Dec 18 10:22:20 2023 -0800 Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
commit 2893c0c Author: Tmonster <tom@ebergen.com> Date: Thu Dec 14 13:10:25 2023 +0100 make format fix. Get compiler ready
commit 80b5f13 Merge: e30b726 c29eb0c Author: Tmonster <tom@ebergen.com> Date: Thu Dec 14 12:34:18 2023 +0100 Merge branch 'main' into reservoir_sampler_Vectors
commit e30b726 Author: Tmonster <tom@ebergen.com> Date: Thu Dec 14 12:33:03 2023 +0100 remove all parallelism. will do it in the next iteration
commit e8e088d Author: Tmonster <tom@ebergen.com> Date: Thu Dec 14 11:52:27 2023 +0100 still failing a test. Merging samples collected in parallel is difficult, and probably doesnt provide much benefit. Going to leave it for later
commit 96bfa1c Merge: 45fa9a5 3237244 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 13 17:02:31 2023 +0100 Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
commit 45fa9a5 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 13 14:36:50 2023 +0100 make format-fix
commit 049327b Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 13 14:31:22 2023 +0100 try to fix this parallel issue
commit a5b290d Merge: 21d4120 8849f97 Author: Tom Ebergen <tom@ebergen.com> Date: Tue Dec 12 11:18:52 2023 +0100 Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
commit 21d4120 Merge: 795c454 e117c34 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 11 12:43:43 2023 +0100 Merge branch 'main' into reservoir_sampler_Vectors
commit c29eb0c Merge: 6bf31e1 25906f3 Author: Tmonster <tom@ebergen.com> Date: Thu Dec 7 15:23:06 2023 +0100 Merge remote-tracking branch 'upstream/main'
commit 6bf31e1 Author: Elliana May <me@mause.me> Date: Mon Dec 4 22:21:30 2023 +0800 fix warning
commit a521081 Author: Elliana May <me@mause.me> Date: Mon Dec 4 21:58:50 2023 +0800 add test for streaming extracted statements
commit 5ee902a Author: Elliana May <me@mause.me> Date: Mon Dec 4 21:15:30 2023 +0800 add some tests of duckdb_execute_prepared_streaming
commit 58b6664 Author: Elliana May <me@mause.me> Date: Mon Dec 4 21:02:48 2023 +0800 chore(docs): update docs for duckdb_execute_prepared_streaming
commit a8e49b1 Author: Hannes Mühleisen <hannes@duckdblabs.com> Date: Tue Dec 5 11:31:21 2023 +0100 add test case, apparently from snowflake
commit a7ee1dd Author: Hannes Mühleisen <hannes@duckdblabs.com> Date: Tue Dec 5 11:25:51 2023 +0100 enable implicit fallthrough warning for /src and fixed a few instances
commit c6bf4c6 Author: Hannes Mühleisen <hannes@duckdblabs.com> Date: Tue Dec 5 11:02:54 2023 +0100 supporting more physical types of parquet time columns with time zone info
commit baf670f Author: Jacob <535707+jkub@users.noreply.github.com> Date: Mon Dec 4 09:05:56 2023 -0800 make BufferPool members protected
commit 878e7d2 Author: Yves <yves@motherduck.com> Date: Mon Dec 4 12:00:49 2023 -0500 Mark BufferPool getters const
commit a7ddb87 Author: Gabor Szarnyas <gabor@duckdblabs.com> Date: Mon Dec 4 16:22:44 2023 +0100 Capitalize URL in httpfs extension flags
commit 795c454 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 6 13:23:29 2023 +0100 removing reservoir type checks
commit 6e0e431 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 6 11:25:50 2023 +0100 make format fix
commit 236825b Author: Tom Ebergen <tom@ebergen.com> Date: Wed Dec 6 10:23:57 2023 +0100 remove unused code
commit 34902e9 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 5 21:20:09 2023 +0100 should pass make format fix
commit 42d3fb8 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 5 18:04:36 2023 +0100 percentage is still global, but rows is local
commit d378cc7 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 5 15:37:40 2023 +0100 some debugging statements
commit 4ad877c Author: Tmonster <tom@ebergen.com> Date: Tue Dec 5 14:16:25 2023 +0100 some changes. Have a lot of bugs solved. but still not great
commit ad79d30 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 4 17:41:37 2023 +0100 have figured out why percentage wasnt working. but it requires a big rework
commit 04d4c0d Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 4 14:10:26 2023 +0100 reservoir sample works. but for large cardinalities and high percentages no
commit ddcea54 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 4 12:33:16 2023 +0100 remove std::couts
commit 4e12d15 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Nov 29 17:47:16 2023 +0100 ok, have the proper output for reservoir sampling. need to understand when to add local sample or global sample
commit 43e72a4 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Nov 29 15:23:55 2023 +0100 compiles. Now I want to figure out where I left off last time
commit 450655c Merge: 3639e4c 3f96a90 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Nov 29 15:00:08 2023 +0100 Merge branch 'main' into reservoir_sampler_Vectors
commit 3639e4c Merge: c10b3a4 5bc0773 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Nov 29 14:56:39 2023 +0100 Merge branch 'main' into reservoir_sampler_Vectors
commit c10b3a4 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Jan 23 10:04:20 2023 +0100 this should work now for sampling a set amount of rows. Still need to work on percentage sampling
commit 7147e2a Author: Tom Ebergen <tom@ebergen.com> Date: Wed Jan 18 16:38:02 2023 +0100 it is starting to work, but need to look into why it is still slow
commit 2255424 Author: Tom Ebergen <tom@ebergen.com> Date: Wed Jan 18 11:21:15 2023 +0100 working for normal blocking sample, but not for percentage
commit 8a01b32 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Jan 16 17:02:34 2023 +0100 intermediate commit, will fix other spots later
commit 3676737 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Jan 16 09:03:52 2023 +0100 intermediate work, will be fixing later
commit 904d220 Author: Tom Ebergen <tom@ebergen.com> Date: Fri Jan 13 13:50:43 2023 +0100 collecting samples in parallel now, now I need to figure out how to combine them in a proper uniform and weighted manner
commit 01c4b89 Author: Tom Ebergen <tom@ebergen.com> Date: Tue Jan 10 15:22:28 2023 +0100 minor code cleanup
commit b5c6d61 Author: Tom Ebergen <tom@ebergen.com> Date: Tue Jan 10 11:51:32 2023 +0100 get rid of 4 spaces
commit 1dc807f Merge: 605520f 7e1a307 Author: Tom Ebergen <tom@ebergen.com> Date: Tue Jan 10 11:50:14 2023 +0100 Merge branch 'reservoir_sampler_Vectors' of github.com:Tmonster/duckdb into reservoir_sampler_Vectors
commit 7e1a307 Author: Tmonster <tom@ebergen.com> Date: Wed Jan 4 11:39:30 2023 -0800 make format-fix
commit 750c1e3 Author: Tmonster <tom@ebergen.com> Date: Wed Jan 4 11:37:27 2023 -0800 small syntax updates
commit fa2ac9c Author: Tmonster <tom@ebergen.com> Date: Wed Jan 4 11:36:33 2023 -0800 small syntax updates
commit cd232c6 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 27 14:54:08 2022 -0800 Revert "mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not" This reverts commit 0f08574.
commit 0f08574 Author: Tmonster <tom@ebergen.com> Date: Wed Dec 21 15:08:52 2022 -0800 mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not
commit f4f5834 Author: Tmonster <tom@ebergen.com> Date: Wed Dec 21 13:43:00 2022 -0800 remove iostream
commit bee57ae Author: Tmonster <tom@ebergen.com> Date: Wed Dec 21 12:37:02 2022 -0800 make format fix
commit ce950a1 Author: Tmonster <tom@ebergen.com> Date: Wed Dec 21 09:10:23 2022 -0800 ok added test over reservoir threshold
commit 29fb39a Author: Tmonster <tom@ebergen.com> Date: Wed Dec 21 09:01:21 2022 -0800 ok it's all in a datachunk, now I can try and parallelize it
commit 8f05514 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 20 17:03:25 2022 +0100 remove pragma threads
commit 98f9897 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 20 17:02:58 2022 +0100 no more memory errors
commit da234b7 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 20 12:24:37 2022 +0100 no more errors when running count(*) on samples greater than the basic vector size
commit 3fc7214 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 20 10:04:13 2022 +0100 fix error
commit e302946 Author: Tmonster <tom@ebergen.com> Date: Tue Dec 20 10:03:30 2022 +0100 still errors
commit 7ea6405 Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 19 21:15:38 2022 +0100 its getting better but still getting memory errors
commit 8f3c597 Author: Tom Ebergen <tom@ebergen.com> Date: Fri Dec 16 16:48:12 2022 +0100 add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
commit 605520f Author: Tom Ebergen <tom@ebergen.com> Date: Mon Dec 19 21:15:38 2022 +0100 its getting better but still getting memory errors
commit 97a491e Author: Tom Ebergen <tom@ebergen.com> Date: Fri Dec 16 16:48:12 2022 +0100 add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
```
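The commit history above tracks a rework of DuckDB's reservoir sampler to operate on vectors/DataChunks, including attempts to merge samples collected in parallel. As background for what is being reworked, here is a minimal Python sketch of classic reservoir sampling (Algorithm R); this is an illustration only, not DuckDB's implementation, which also supports percentage-based and weighted sampling:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Merging two such reservoirs uniformly (the parallel case the commits wrestle with) is the hard part: each reservoir summarizes a different number of input rows, so slots cannot simply be concatenated.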
Mytherin added a commit that referenced this pull request on Oct 16, 2024
* Run formatter also on src/include/duckdb/core_functions/...
* fix numpy issues with the 'string' dtype changes
* Use numeric_limits
* Format fix
* Fix duckdb#12467 changes to stddev calculation
* Format fix
* Update min/max to cache allocations and prevent unnecessary re-allocation
* missed some constants in FixedSizeBuffer
* first step ci run for android
* baby steps
* typo
* explicit platform
* extension static build
* more env and ninja
* add arm64
* wrong flag
* extensions, fingers crossed
* container
* using default containers
* removing it in more places
* patch vss
* port changes of 'python_datetime_and_deltatime_missing_stride' from 'feature' to 'main'
* Switch arg_min/arg_max to use sort key instead of vectors
* Clean up unused functions
* AddStringOrBlob
* Skip only built-in optimizers
* Add support for arg_min(ANY, ANY)
* revert extension patch with optional_idx
* Correct count
* Format fix
* Format fix
* Switch fallback implementation of FIRST to use sort keys instead of vectors
* WIP - clean up histogram function, avoid using ListVector::Push
* Move many tests to slow
* Add support for all types to the histogram function
* dont WaitOnTask if there are no tasks available
* Rework list_distinct/list_unique to support arbitrary types and to no longer use values and ListVector::Push
* Format fix
* fix compilation
* Avoid overriding types in PrepareTypeForCast when not required
* Use string_t and the arena allocator to allocate strings in the histogram function, instead of std::string
* this is soooo much better
* forgot to add new file
* apply feedback
* prevent the undefined behaviour of std::tolower() by casting the input to uint_8
* Binned histograms WIP
* format
* fix up test
* More tests
* Format fix + use string_map_t here
* Detect duplicate bounds, sort bounds, and allow empty bounds
* Binned histograms working for all types
* Add binned histogram test to test all types
* Unify/clean up histogram and binned histogram
* RequiresExtract
* Add TPC-H tests
* Improve error message
* Format
* Add equi-width binning method for integers
* More clean-up and testing, add support for doubles
* lets start with this
* Update .github/workflows/Android.yml Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
* add missing headers
* Make equi-width-bins always return the input type
* treat NOTHING different from REPLACE/UPDATE, the filtered tuples should not be added to the returning chunk
* format
* nits, use ExtensionHelper to check for spatial extension
* np.NaN -> np.nan
* remove pyarrow version lock
* cxxheaderparser
* commit generated code
* add reqs
* Update CodeQuality.yml
* name capi enums
* rewrite verify_enum_integrity.py to use cxxheaderparser
* fix dependency installation
* Add equi-width binning support for dates/timestamps
* update messages
* remove dead code
* format
* Make typeof a constant function (removing its argument) so that the optimizer can optimize branches away
* Format
* Binning test
* Revert "Make typeof a constant function (removing its argument) so that the optimizer can optimize branches away" This reverts commit 4455c46.
* Fix test
* Remove duplicate test
* Re-generate functions
* Use /
* Add bind_expression function, and make typeof return a constant expression when possible
* Set function pointer
* Add function can_cast_implicitly that, given a source and target type, states whether or not the source can be implicitly cast to the target
* This is optimized away
* This is moved
* change np.NaN -> np.nan
* Fixes for equi_width_bins with timestamps - cap bins at bin_count, propagate fractional months/days downwards (i.e. 0.2 months becomes 6 days) and handle bin_count > time_diff in micros
* Add histogram and histogram_values table macros + infrastructure for creating default table macros
* Feature duckdb#1272: Window Executor State PR feedback.
* More testing for histogram function
* Allow min as boundary (when there are few values this might occur)
* Format
* Add missing include
* Fix tests
* Move mode function to owning string map
* Format fix
* Use correct map type in list aggregate for histogram
* Make mode work for all types by using sort keys
* Remove N from this test as there is a duplicate
* "benchmark/micro/*" >> "benchmark/micro/*.benchmark"
* add copy_to_select hook, rework unsupported types in copy_to and export
* optimize
* remove unused variable
* Modify test to be deterministic
* pass along options to select
* nit
* allow ducktyping of lists/dicts
* add enum for copy to info
* move type visit functions into TypeVisitor helper
* When dropping a table - clear the local storage of any appended data to that table
* Add include
* Set flags again in test
* Also call OnDropEntry for CREATE OR REPLACE
* Add HTTP error code to extension install failures
* Issue duckdb#12600: Streaming Positive LAG Use buffering to support streaming computation of constant positive LAGs and negative LEADs that are at most one vector away. This doesn't fix the "look ahead" problem, but the benchmark shows it is about 5x faster than the non-streaming version.
* Issue duckdb#12600: Streaming Positive LAG Add new benchmark.
* Feature duckdb#1272: Window Group Preparation Move the construction of the row data collections and masks to the Finalize phase. These are relatively fast and will use data that is still hot (e.g., the sort keys). This will make it easier to parallelise the remaining two passes over the data (build and execute).
* Feature duckdb#1272: Window Group Preparation Code cleanup and cast fixes.
* Turn window_start and window_end into idx_t
* Initialize payload collection DataChunk with payload_count to prevent resizing
* Format
* VectorOperations::Copy - fast path when copying an aligned flat validity mask into a flat vector
* Set is_dropped flag instead of actually dropping data, as other scans might be depending on the local storage (in particular when running CREATE OR REPLACE tbl AS SELECT * FROM tbl)
* Add missing include
* Create sort key helper class and use it in min/max
* Simplify histogram combine
* StateCombine for binned histogram
* Rework mode to also use sort key helpers
* Remove specialized decimal implementation of minmax
* Disable android CI on pushes/PRs
* Quantile clean-up part 1: move bind data and sort tree to separate files
* Move quantile window state into separate struct
* Move MAD and quantile states/operations to separate files
* Rework median - allow median(ANY) instead of having different function overloads
* Avoid usage of std::string in quantiles, and switch to arena allocated strings as well
* Add fallback option to discrete quantile
* Issue duckdb#12600: Streaming Positive LAG Incorporate various PR feedback improvements: * Cached Executors * Validity Mask reset * Small Buffers Along with the mask copy improvement in VectorOperations::Copy these reduce the benchmark runtime by another 3x.
* quantile_disc scalar and lists working for arbitrary types
* Remove pre-allocation for now
* Feature 1272: Window Payload Preallocation Only preallocate payloads for the value window functions (LEAD, LAG, FIRST, LAST, VALUE) instead of all of them.
* Quantile binding clean-up, add test_all_types for quantile_disc and median
* Test + CI fixes
* Format
* Clean up old deserialization code
* Set destructor, remove some more dead code

---------

Co-authored-by: Carlo Piovesan <piovesan.carlo@gmail.com>
Co-authored-by: Tishj <t_b@live.nl>
Co-authored-by: Mark Raasveldt <mark.raasveldt@gmail.com>
Co-authored-by: taniabogatsch <44262898+taniabogatsch@users.noreply.github.com>
Co-authored-by: Hannes Mühleisen <hannes@duckdblabs.com>
Co-authored-by: Christina Sioula <chrisiou.myk@gmail.com>
Co-authored-by: Hannes Mühleisen <hannes@muehleisen.org>
Co-authored-by: Richard Wesley <13156216+hawkfish@users.noreply.github.com>
Co-authored-by: Max Gabrielsson <max@gabrielsson.com>
Co-authored-by: Elliana May <me@mause.me>
Co-authored-by: Richard Wesley <hawkfish@electricfish.com>
Co-authored-by: Maia <maia@duckdblabs.com>
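Several of the changes above add equi-width binning (for integers, doubles, dates, and timestamps), with bins capped at `bin_count` and boundaries kept in the input type. As a rough illustration of the integer case only, here is a hypothetical helper sketching that behavior; it is not the actual `equi_width_bins` implementation and ignores the date/timestamp handling of fractional months and days:

```python
def equi_width_bins_integer(lo, hi, bin_count):
    """Return the upper boundaries of at most bin_count equal-width bins over [lo, hi]."""
    if hi <= lo or bin_count <= 0:
        raise ValueError("require hi > lo and bin_count > 0")
    span = hi - lo
    # Cap the number of bins, mirroring the 'bin_count > diff' handling above.
    bin_count = min(bin_count, span)
    step = span / bin_count
    # Round so boundaries stay integers (the input type); the last boundary is hi.
    return [lo + round(step * (i + 1)) for i in range(bin_count)]
```

For example, 4 bins over [0, 100] yield boundaries 25, 50, 75, 100, while requesting 10 bins over [0, 3] is capped to 3 bins.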
Mytherin added a commit that referenced this pull request on Dec 3, 2024
I was investigating the following crash, where a checkpoint task had its underlying resources destroyed while it was still running. The two interesting threads are the following:

```
thread #1, name = 'duckling', stop reason = signal SIGTRAP
frame #0: 0x0000ffff91bb71ec
frame #1: 0x0000aaaad73a38e8 duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:336:2
frame #2: 0x0000aaaad786eb68 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*() const [inlined] duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::AssertNotNull(null=<unavailable>) at unique_ptr.hpp:25:10
frame #3: 0x0000aaaad786eaf4 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*(this=0x0000aaacbb73e008) const at unique_ptr.hpp:34:4
frame #4: 0x0000aaaad787abbc duckling`duckdb::CheckpointTask::ExecuteTask(this=0x0000aaabec92be60) at row_group_collection.cpp:732:21
frame #5: 0x0000aaaad7756ea4 duckling`duckdb::BaseExecutorTask::Execute(this=0x0000aaabec92be60, mode=<unavailable>) at task_executor.cpp:72:3
frame #6: 0x0000aaaad7757e28 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaafda30e10, marker=0x0000aaaafda164a8) at task_scheduler.cpp:189:32
frame #7: 0x0000ffff91a031fc
frame #8: 0x0000ffff91c0d5c8

thread #120, stop reason = signal 0
frame #0: 0x0000ffff91c71c24
frame #1: 0x0000ffff91e1264c
frame #2: 0x0000ffff91e01888
frame #3: 0x0000ffff91e018f8
frame #4: 0x0000ffff91e01c10
frame #5: 0x0000ffff91e05afc
frame #6: 0x0000ffff91e05e70
frame #7: 0x0000aaaad784b63c duckling`duckdb::RowGroup::~RowGroup() [inlined] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(this=<unavailable>) at shared_ptr_base.h:184:10
frame #8: 0x0000aaaad784b5b4 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count(this=<unavailable>) at shared_ptr_base.h:705:11
frame #9: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_ptr<duckdb::ColumnData, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr(this=<unavailable>) at shared_ptr_base.h:1154:31
frame #10: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] duckdb::shared_ptr<duckdb::ColumnData, true>::~shared_ptr(this=<unavailable>) at shared_ptr_ipp.hpp:115:24
frame #11: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>>(__pointer=<unavailable>) at stl_construct.h:151:19
frame #12: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:163:6
frame #13: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:195:7
frame #14: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*, duckdb::shared_ptr<duckdb::ColumnData, true>>(__first=<unavailable>, __last=<unavailable>, (null)=<unavailable>) at alloc_traits.h:848:7
frame #15: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::vector<duckdb::shared_ptr<duckdb::ColumnData, true>, std::allocator<duckdb::shared_ptr<duckdb::ColumnData, true>>>::~vector(this=<unavailable>) at stl_vector.h:680:2
frame #16: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup(this=<unavailable>) at row_group.cpp:83:1
frame #17: 0x0000aaaad786ee18 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] std::default_delete<duckdb::RowGroup>::operator()(this=0x0000aaacbb73e1a8, __ptr=0x0000aaab75ae7860) const at unique_ptr.h:85:2
frame #18: 0x0000aaaad786ee10 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() at unique_ptr.h:361:4
frame #19: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] duckdb::SegmentNode<duckdb::RowGroup>::~SegmentNode(this=0x0000aaacbb73e1a0) at segment_tree.hpp:21:8
frame #20: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>>(__pointer=0x0000aaacbb73e1a0) at stl_construct.h:151:19
frame #21: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=0x0000aaacbb73e1a0, __last=0x0000aaacbb751130) at stl_construct.h:163:6
frame #22: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=<unavailable>, __last=0x0000aaacbb751130) at stl_construct.h:195:7
frame #23: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*, duckdb::SegmentNode<duckdb::RowGroup>>(__first=<unavailable>, __last=0x0000aaacbb751130, (null)=0x0000fffefc81a908) at alloc_traits.h:848:7
frame #24: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector(this=size=4883) at stl_vector.h:680:2
frame #25: 0x0000aaaad7857f74 duckling`duckdb::RowGroupCollection::Checkpoint(this=<unavailable>, writer=<unavailable>, global_stats=0x0000fffefc81a9c0) at row_group_collection.cpp:1017:1
frame #26: 0x0000aaaad788f02c duckling`duckdb::DataTable::Checkpoint(this=0x0000aaab04649e70, writer=0x0000aaab6584ea80, serializer=0x0000fffefc81ab38) at data_table.cpp:1427:14
frame #27: 0x0000aaaad7881394 duckling`duckdb::SingleFileCheckpointWriter::WriteTable(this=0x0000fffefc81b128, table=0x0000aaab023b78c0, serializer=0x0000fffefc81ab38) at checkpoint_manager.cpp:528:11
frame #28: 0x0000aaaad787ece4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=<unavailable>, obj=0x0000fffefc81ab38)::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)::operator()(duckdb::Serializer&) const at checkpoint_manager.cpp:181:43
frame #29: 0x0000aaaad787ecd8 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] void duckdb::Serializer::List::WriteObject<duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)>(this=<unavailable>, f=(unnamed class) @ 0x0000600002cbd2b0) at serializer.hpp:385:2
frame #30: 0x0000aaaad787ecc4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(this=<unavailable>, list=<unavailable>, i=2) const at checkpoint_manager.cpp:181:8
frame #31: 0x0000aaaad787ecb0 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() at serializer.hpp:151:4
frame #32: 0x0000aaaad787ec94 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=0x0000fffefc81b128) at checkpoint_manager.cpp:179:13
frame #33: 0x0000aaaad78954a8 duckling`duckdb::SingleFileStorageManager::CreateCheckpoint(this=0x0000aaaafe1de140, options=(wal_action = DONT_DELETE_WAL, action = CHECKPOINT_IF_REQUIRED, type = FULL_CHECKPOINT)) at storage_manager.cpp:365:17
frame #34: 0x0000aaaad78baac0 duckling`duckdb::DuckTransactionManager::Checkpoint(this=0x0000aaaafe167e00, context=<unavailable>, force=<unavailable>) at duck_transaction_manager.cpp:198:18
frame #35: 0x0000aaaad69d02c0 duckling`md::Server::BackgroundCheckpointIfNeeded(this=0x0000aaaafdbfe900) at server.cpp:1983:31
frame #36: 0x0000aaaadac5d3f0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::function<void ()>::operator()(this=<unavailable>) const at std_function.h:590:9
frame #37: 0x0000aaaadac5d3e0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] md::BackgroundCronTask::Start(unsigned long)::$_0::operator()(this=0x0000aaaafdf169a8) const at background_cron_task.cpp:25:4
frame #38: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::__invoke_impl<void, md::BackgroundCronTask::Start(unsigned long)::$_0>((null)=<unavailable>, __f=0x0000aaaafdf169a8) at invoke.h:61:14
frame #39: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::__invoke_result<md::BackgroundCronTask::Start(unsigned long)::$_0>::type std::__invoke<md::BackgroundCronTask::Start(unsigned long)::$_0>(__fn=0x0000aaaafdf169a8) at invoke.h:96:14
frame #40: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::_M_invoke<0ul>(this=0x0000aaaafdf169a8, (null)=<unavailable>) at std_thread.h:259:13
frame #41: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::operator()(this=0x0000aaaafdf169a8) at std_thread.h:266:11
frame #42: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run(this=0x0000aaaafdf169a0) at std_thread.h:211:13
frame #43: 0x0000ffff91a031fc
frame #44: 0x0000ffff91c0d5c8
```

The problem is that if an IO exception is thrown in `RowGroupCollection::Checkpoint` after some (but not all) checkpoint tasks have been scheduled, but before `checkpoint_state.executor.WorkOnTasks();` is called, the result is an InternalException / DuckDB crash, because the `Checkpoint` method does not wait for the scheduled tasks to complete before destroying the resources they reference.
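The invariant behind the fix can be stated simply: once tasks referencing shared state have been scheduled, they must be awaited on every exit path, including the exceptional one. A minimal Python sketch of that idea using `concurrent.futures` (illustrative only; `fail_after` is a hypothetical knob to simulate the IO error, and this is not DuckDB's TaskExecutor API):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def checkpoint(row_groups, fail_after=None):
    """Schedule one task per row group. If scheduling raises partway through
    (simulated via fail_after), the finally block still waits for the
    already-scheduled tasks before the shared state goes out of scope."""
    results = []
    futures = []
    pool = ThreadPoolExecutor(max_workers=4)
    try:
        for i, rg in enumerate(row_groups):
            if fail_after is not None and i == fail_after:
                raise IOError("simulated IO error during scheduling")
            futures.append(pool.submit(results.append, rg))
    finally:
        # Scheduled tasks may still be touching 'results'; wait before it is torn down.
        wait(futures)
        pool.shutdown()
    return sorted(results)
```

Without the `finally` block, the exception path would let `results` (standing in for the row groups) be destroyed while worker tasks still reference it, which is exactly the race in the backtrace above.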
Mytherin added a commit that referenced this pull request on Jan 20, 2025
Fixes duckdblabs/duckdb-internal#3922.

The failing query:

```sql
SET order_by_non_integer_literal=true;
SELECT DISTINCT ON ( 'string' ) 'string',
GROUP BY CUBE ( 'string', ), 'string' IN ( SELECT 'string' ),
HAVING 'string' IN ( SELECT 'string');
```

The plan generated before optimization is below. During optimization there is an attempt to convert the mark join into a semi join. Before this conversion takes place, we usually check that the mark join is not used in any operators above it, to prevent plan verification errors. Up until this point, only logical projections were checked for mark joins. It turns out this query is planned in such a way that the mark join appears in one of the expressions of the aggregate operator. Since that was not checked, the mark-to-semi conversion would take place. The fix is to modify the filter pushdown optimization so that it also stores table indexes from logical aggregate operators.

```
PROJECTION #1              Expressions: #[2.0]
└─ FILTER                  Expressions: #[2.1]
   └─ AGGREGATE #2, #3, #4     Groups: 'string', #[14.0]
      └─ COMPARISON_JOIN       Join Type: MARK, Conditions: ('string' = #[8.0])
         ├─ DUMMY_SCAN #0
         └─ PROJECTION #8      Expressions: 'string'
            └─ DUMMY_SCAN #7
```
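The core of the fix, collecting table indexes referenced by expressions in aggregates as well as projections before allowing the mark-to-semi conversion, can be sketched over a toy plan representation (plain dicts and binding strings here, not DuckDB's LogicalOperator classes):

```python
def referenced_table_indexes(plan):
    """Collect table indexes referenced by column bindings like '#[14.0]'
    in expressions anywhere in the plan, including aggregate expressions."""
    indexes = set()

    def walk(node):
        for expr in node.get("expressions", []):
            if expr.startswith("#["):
                # '#[14.0]' -> table index 14
                indexes.add(int(expr[2:].split(".")[0]))
        for child in node.get("children", []):
            walk(child)

    walk(plan)
    return indexes

# Toy version of the plan above: the mark join's output binding (#[14.0])
# is used by the AGGREGATE, so table index 14 must count as "used".
plan = {"name": "PROJECTION", "expressions": ["#[2.0]"], "children": [
    {"name": "AGGREGATE", "expressions": ["'string'", "#[14.0]"], "children": []}]}
```

If the walk skipped aggregate expressions, index 14 would appear unused and the conversion would wrongly proceed, which mirrors the bug described here.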
Mytherin added a commit that referenced this pull request on Feb 18, 2025
We had two users crash with the following backtrace:

```
frame #0: 0x0000ffffab2571ec
frame #1: 0x0000aaaaac00c5fc duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:328:2
frame #2: 0x0000aaaaac1ee418 duckling`duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::CheckValid(this=<unavailable>) const at optional_ptr.hpp:34:11
frame #3: 0x0000aaaaac1eea8c duckling`duckdb::MergeCollectionTask::Execute(duckdb::PhysicalBatchInsert const&, duckdb::ClientContext&, duckdb::GlobalSinkState&, duckdb::LocalSinkState&) [inlined] duckdb::optional_ptr<duckdb::OptimisticDataWriter, true>::operator*(this=<unavailable>) at optional_ptr.hpp:43:3
frame #4: 0x0000aaaaac1eea84 duckling`duckdb::MergeCollectionTask::Execute(this=0x0000aaaaf1b06150, op=<unavailable>, context=0x0000aaaba820d8d0, gstate_p=0x0000aaab06880f00, lstate_p=<unavailable>) at physical_batch_insert.cpp:219:90
frame #5: 0x0000aaaaac1d2e10 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTask(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:425:8
frame #6: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const [inlined] duckdb::PhysicalBatchInsert::ExecuteTasks(this=0x0000aaaafa62ab40, context=<unavailable>, gstate_p=0x0000aaab06880f00, lstate_p=0x0000aab12d442960) const at physical_batch_insert.cpp:431:9
frame #7: 0x0000aaaaac1d2dd8 duckling`duckdb::PhysicalBatchInsert::Sink(this=0x0000aaaafa62ab40, context=0x0000aab2fffd7cb0, chunk=<unavailable>, input=<unavailable>) const at physical_batch_insert.cpp:494:4
frame #8: 0x0000aaaaac353158 duckling`duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, duckdb::ExecutionBudget&, unsigned long) [inlined] duckdb::PipelineExecutor::Sink(this=0x0000aab2fffd7c00, chunk=0x0000aab2fffd7d30, input=0x0000fffec0aba8d8) at pipeline_executor.cpp:521:24
frame #9: 0x0000aaaaac353130 duckling`duckdb::PipelineExecutor::ExecutePushInternal(this=0x0000aab2fffd7c00, input=0x0000aab2fffd7d30, chunk_budget=0x0000fffec0aba980, initial_idx=0) at pipeline_executor.cpp:332:23
frame #10: 0x0000aaaaac34f7b4 duckling`duckdb::PipelineExecutor::Execute(this=0x0000aab2fffd7c00, max_chunks=<unavailable>) at pipeline_executor.cpp:201:13
frame #11: 0x0000aaaaac34f258 duckling`duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) [inlined] duckdb::PipelineExecutor::Execute(this=<unavailable>) at pipeline_executor.cpp:278:9
frame #12: 0x0000aaaaac34f250 duckling`duckdb::PipelineTask::ExecuteTask(this=0x0000aab16dafd630, mode=<unavailable>) at pipeline.cpp:51:33
frame #13: 0x0000aaaaac348298 duckling`duckdb::ExecutorTask::Execute(this=0x0000aab16dafd630, mode=<unavailable>) at executor_task.cpp:49:11
frame #14: 0x0000aaaaac356600 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaaf0105560, marker=0x0000aaaaf00ee578) at task_scheduler.cpp:189:32
frame #15: 0x0000ffffab0a31fc
frame #16: 0x0000ffffab2ad5c8
```

Core dump analysis showed that the assertion `D_ASSERT(lstate.writer);` in `MergeCollectionTask::Execute` was not satisfied (i.e. it crashed because `lstate.writer` was NULLPTR) when `PhysicalBatchInsert::Sink` was processing merge tasks from (other) pipeline executors. My suspicion is that this is only likely to happen for heavily concurrent workloads, which applies to the two users who crashed. The patch submitted as part of this PR has addressed the issue for these users.
Mytherin pushed a commit that referenced this pull request on Feb 26, 2025.
Mytherin pushed a commit that referenced this pull request on Aug 19, 2025.
Mytherin pushed a commit that referenced this pull request on Aug 19, 2025.
I have seen this crash due to an invalid pointer on which a destructor is called, on last night's `main` (`2ed9bf887f`), using a unittester compiled from sources (clang 17) and extensions installed from the default extension repository. Basically:

```
DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/17/lib/darwin/libclang_rt.asan_osx_dynamic.dylib LOCAL_EXTENSION_REPO=http://extensions.duckdb.org ./build/release/test/unittest --autoloading all --skip-compiled --order rand test/parquet/test_parquet_schema.test
```

and seeing runtime sanitizer assertions such as:

```
==56046==ERROR: AddressSanitizer: container-overflow on address 0x6100000d4dcf at pc 0x000116c7f450 bp 0x00016fc1d170 sp 0x00016fc1d168
READ of size 1 at 0x6100000d4dcf thread T0
    #0 0x000116c7f44c in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>* std::__1::__uninitialized_allocator_copy_impl[abi:ne190102]<std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+0x318 (parquet.duckdb_extension:arm64+0xab44c)
    #1 0x000116c7ec90 in void std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__construct_at_end<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, unsigned long)+0x160 (parquet.duckdb_extension:arm64+0xaac90)
    #2 0x000116c7e7d8 in void std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>::__assign_with_size[abi:ne190102]<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, long)+0x1e0 (parquet.duckdb_extension:arm64+0xaa7d8)
    #3 0x000116e8cd48 in duckdb::ParquetMultiFileInfo::BindReader(duckdb::ClientContext&, duckdb::vector<duckdb::LogicalType, true>&, duckdb::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, true>&, duckdb::MultiFileBindData&)+0xf18 (parquet.duckdb_extension:arm64+0x2b8d48)
    #4 0x000116e6e5fc in duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::MultiFileBindInternal(duckdb::ClientContext&, duckdb::unique_ptr<duckdb::MultiFileReader, std::__1::default_delete<duckdb::MultiFileReader>, true>, duckdb::shared_ptr<duckdb::MultiFileList, true>, duckdb::vector<duckdb::LogicalType, true>&, duckdb::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, true>&, duckdb::MultiFileOptions, duckdb::unique_ptr<duckdb::BaseFileReaderOptions, std::__1::default_delete<duckdb::BaseFileReaderOptions>, true>, duckdb::unique_ptr<duckdb::MultiFileReaderInterface, std::__1::default_delete<duckdb::MultiFileReaderInterface>, true>)+0x1210 (parquet.duckdb_extension:arm64+0x29a5fc)
```

or these failures while using ducklake:

```
==56079==ERROR: AddressSanitizer: container-overflow on address 0x616000091a78 at pc 0x0001323fc81c bp 0x00016bd0e890 sp 0x00016bd0e888
READ of size 8 at 0x616000091a78 thread T2049
    #0 0x0001323fc818 in duckdb::MultiFileColumnDefinition::~MultiFileColumnDefinition()+0x258 (parquet.duckdb_extension:arm64+0x2a4818)
    #1 0x0001323fb588 in std::__1::vector<duckdb::MultiFileColumnDefinition, std::__1::allocator<duckdb::MultiFileColumnDefinition>>::__destroy_vector::operator()[abi:ne190102]()+0x98 (parquet.duckdb_extension:arm64+0x2a3588)
    #2 0x0001324a09e4 in duckdb::BaseFileReader::~BaseFileReader()+0x2bc (parquet.duckdb_extension:arm64+0x3489e4)
    #3 0x0001324a23ec in duckdb::ParquetReader::~ParquetReader()+0x22c (parquet.duckdb_extension:arm64+0x34a3ec)
```

With these changes, once the `parquet` extension is built by CI, this works as expected. I am not sure whether the fix could / should be elsewhere.