ASD #30

carlopi · 2023-06-21T08:51:00Z

No description provided.

I was investigating the following crash where a checkpoint task had its underlying resources being destroyed while it was still running. The two interesting threads are the following: ``` thread #1, name = 'duckling', stop reason = signal SIGTRAP frame #0: 0x0000ffff91bb71ec frame #1: 0x0000aaaad73a38e8 duckling`duckdb::InternalException::InternalException(this=<unavailable>, msg=<unavailable>) at exception.cpp:336:2 frame #2: 0x0000aaaad786eb68 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*() const [inlined] duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::AssertNotNull(null=<unavailable>) at unique_ptr.hpp:25:10 frame #3: 0x0000aaaad786eaf4 duckling`duckdb::unique_ptr<duckdb::RowGroup, std::default_delete<duckdb::RowGroup>, true>::operator*(this=0x0000aaacbb73e008) const at unique_ptr.hpp:34:4 frame #4: 0x0000aaaad787abbc duckling`duckdb::CheckpointTask::ExecuteTask(this=0x0000aaabec92be60) at row_group_collection.cpp:732:21 frame #5: 0x0000aaaad7756ea4 duckling`duckdb::BaseExecutorTask::Execute(this=0x0000aaabec92be60, mode=<unavailable>) at task_executor.cpp:72:3 frame #6: 0x0000aaaad7757e28 duckling`duckdb::TaskScheduler::ExecuteForever(this=0x0000aaaafda30e10, marker=0x0000aaaafda164a8) at task_scheduler.cpp:189:32 frame #7: 0x0000ffff91a031fc frame #8: 0x0000ffff91c0d5c8 thread duckdb#120, stop reason = signal 0 frame #0: 0x0000ffff91c71c24 frame #1: 0x0000ffff91e1264c frame #2: 0x0000ffff91e01888 frame #3: 0x0000ffff91e018f8 frame #4: 0x0000ffff91e01c10 frame #5: 0x0000ffff91e05afc frame #6: 0x0000ffff91e05e70 frame #7: 0x0000aaaad784b63c duckling`duckdb::RowGroup::~RowGroup() [inlined] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(this=<unavailable>) at shared_ptr_base.h:184:10 frame #8: 0x0000aaaad784b5b4 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count(this=<unavailable>) at shared_ptr_base.h:705:11 frame #9: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] std::__shared_ptr<duckdb::ColumnData, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr(this=<unavailable>) at shared_ptr_base.h:1154:31 frame #10: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] duckdb::shared_ptr<duckdb::ColumnData, true>::~shared_ptr(this=<unavailable>) at shared_ptr_ipp.hpp:115:24 frame #11: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>>(__pointer=<unavailable>) at stl_construct.h:151:19 frame #12: 0x0000aaaad784b5ac duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:163:6 frame #13: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*>(__first=<unavailable>, __last=<unavailable>) at stl_construct.h:195:7 frame #14: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] void std::_Destroy<duckdb::shared_ptr<duckdb::ColumnData, true>*, duckdb::shared_ptr<duckdb::ColumnData, true>>(__first=<unavailable>, __last=<unavailable>, (null)=<unavailable>) at alloc_traits.h:848:7 frame #15: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup() [inlined] std::vector<duckdb::shared_ptr<duckdb::ColumnData, true>, std::allocator<duckdb::shared_ptr<duckdb::ColumnData, true>>>::~vector(this=<unavailable>) at stl_vector.h:680:2 frame #16: 0x0000aaaad784b5a0 duckling`duckdb::RowGroup::~RowGroup(this=<unavailable>) at row_group.cpp:83:1 frame #17: 0x0000aaaad786ee18 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] std::default_delete<duckdb::RowGroup>::operator()(this=0x0000aaacbb73e1a8, __ptr=0x0000aaab75ae7860) const at unique_ptr.h:85:2 frame #18: 0x0000aaaad786ee10 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() at unique_ptr.h:361:4 frame #19: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] duckdb::SegmentNode<duckdb::RowGroup>::~SegmentNode(this=0x0000aaacbb73e1a0) at segment_tree.hpp:21:8 frame #20: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>>(__pointer=0x0000aaacbb73e1a0) at stl_construct.h:151:19 frame #21: 0x0000aaaad786ee08 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy_aux<false>::__destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=0x0000aaacbb73e1a0, __last=0x0000aaacbb751130) at stl_construct.h:163:6 frame #22: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*>(__first=<unavailable>, __last=0x0000aaacbb751130) at stl_construct.h:195:7 frame #23: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector() [inlined] void std::_Destroy<duckdb::SegmentNode<duckdb::RowGroup>*, duckdb::SegmentNode<duckdb::RowGroup>>(__first=<unavailable>, __last=0x0000aaacbb751130, (null)=0x0000fffefc81a908) at alloc_traits.h:848:7 frame #24: 0x0000aaaad786ede8 duckling`std::vector<duckdb::SegmentNode<duckdb::RowGroup>, std::allocator<duckdb::SegmentNode<duckdb::RowGroup>>>::~vector(this=size=4883) at stl_vector.h:680:2 frame #25: 0x0000aaaad7857f74 duckling`duckdb::RowGroupCollection::Checkpoint(this=<unavailable>, writer=<unavailable>, global_stats=0x0000fffefc81a9c0) at row_group_collection.cpp:1017:1 frame #26: 0x0000aaaad788f02c duckling`duckdb::DataTable::Checkpoint(this=0x0000aaab04649e70, writer=0x0000aaab6584ea80, serializer=0x0000fffefc81ab38) at data_table.cpp:1427:14 frame #27: 0x0000aaaad7881394 duckling`duckdb::SingleFileCheckpointWriter::WriteTable(this=0x0000fffefc81b128, table=0x0000aaab023b78c0, serializer=0x0000fffefc81ab38) at checkpoint_manager.cpp:528:11 frame #28: 0x0000aaaad787ece4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=<unavailable>, obj=0x0000fffefc81ab38)::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)::operator()(duckdb::Serializer&) const at checkpoint_manager.cpp:181:43 frame #29: 0x0000aaaad787ecd8 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] void duckdb::Serializer::List::WriteObject<duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(duckdb::Serializer::List&, unsigned long) const::'lambda'(duckdb::Serializer&)>(this=<unavailable>, f=(unnamed class) @ 0x0000600002cbd2b0) at serializer.hpp:385:2 frame #30: 0x0000aaaad787ecc4 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() [inlined] duckdb::SingleFileCheckpointWriter::CreateCheckpoint()::$_7::operator()(this=<unavailable>, list=<unavailable>, i=2) const at checkpoint_manager.cpp:181:8 frame #31: 0x0000aaaad787ecb0 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint() at serializer.hpp:151:4 frame #32: 0x0000aaaad787ec94 duckling`duckdb::SingleFileCheckpointWriter::CreateCheckpoint(this=0x0000fffefc81b128) at checkpoint_manager.cpp:179:13 frame #33: 0x0000aaaad78954a8 duckling`duckdb::SingleFileStorageManager::CreateCheckpoint(this=0x0000aaaafe1de140, options=(wal_action = DONT_DELETE_WAL, action = CHECKPOINT_IF_REQUIRED, type = FULL_CHECKPOINT)) at storage_manager.cpp:365:17 frame #34: 0x0000aaaad78baac0 duckling`duckdb::DuckTransactionManager::Checkpoint(this=0x0000aaaafe167e00, context=<unavailable>, force=<unavailable>) at duck_transaction_manager.cpp:198:18 frame #35: 0x0000aaaad69d02c0 duckling`md::Server::BackgroundCheckpointIfNeeded(this=0x0000aaaafdbfe900) at server.cpp:1983:31 frame #36: 0x0000aaaadac5d3f0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::function<void ()>::operator()(this=<unavailable>) const at std_function.h:590:9 frame #37: 0x0000aaaadac5d3e0 duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] md::BackgroundCronTask::Start(unsigned long)::$_0::operator()(this=0x0000aaaafdf169a8) const at background_cron_task.cpp:25:4 frame #38: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::__invoke_impl<void, md::BackgroundCronTask::Start(unsigned long)::$_0>((null)=<unavailable>, __f=0x0000aaaafdf169a8) at invoke.h:61:14 frame #39: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::__invoke_result<md::BackgroundCronTask::Start(unsigned long)::$_0>::type std::__invoke<md::BackgroundCronTask::Start(unsigned long)::$_0>(__fn=0x0000aaaafdf169a8) at invoke.h:96:14 frame #40: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] void std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::_M_invoke<0ul>(this=0x0000aaaafdf169a8, (null)=<unavailable>) at std_thread.h:259:13 frame #41: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run() [inlined] std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>::operator()(this=0x0000aaaafdf169a8) at std_thread.h:266:11 frame #42: 0x0000aaaadac5d30c duckling`std::thread::_State_impl<std::thread::_Invoker<std::tuple<md::BackgroundCronTask::Start(unsigned long)::$_0>>>::_M_run(this=0x0000aaaafdf169a0) at std_thread.h:211:13 frame #43: 0x0000ffff91a031fc frame #44: 0x0000ffff91c0d5c8 ``` The problem is that if there's an IO exception being thrown in `RowGroupCollection::Checkpoint` after some (but not all) checkpoint tasks have been scheduled but before `checkpoint_state.executor.WorkOnTasks();` is called, it results in an InternalException / DuckDB crash as the `Checkpoint ` method does not wait for the scheduled tasks to have completed before destroying the referenced resources.

…groups, and limiting vacuum operations for the last number of row groups (duckdb#18829) When performing a checkpoint, we rewrite columns of row groups that have had modifications. When performing insertions, all columns are modified, thus all columns of a row group must be rewritten. As a result, any insertion followed by a checkpoint triggers a full rewrite of the last row group. When dealing with wide tables or string columns with large strings, rewriting a row group can be relatively expensive. This is particularly the case when only small insertions have been made. For example, inserting a single row followed by checkpointing can become expensive as a result of triggering a rewrite of the last row group: ```sql create table z as select repeat(sha256(i::varchar), 10) as a, repeat(sha256((i**2)::varchar), 10) as b, repeat(sha256((i**3)::varchar), 10) as c from range(100000) r(i); CHECKPOINT; .timer on insert into z values ('a','b','c'); Run Time (s): real 0.006 user 0.004249 sys 0.001186 CHECKPOINT; Run Time (s): real 0.611 user 1.163009 sys 0.054814 ``` The checkpoint takes 0.6s, despite us inserting only a single row. #### Appending a new row group This PR solves this problem by appending a new row group when writing to a table that has been persisted/checkpointed. As a result, we will no longer modify the (already checkpointed) data. As a trade-off, we end up with more (smaller) row groups. #### Vacuuming As part of vacuuming, we merge adjacent row groups if they fit within the row group size. As part of this PR - we restrict merging for the last few row groups. Otherwise, we would end up with the same problem. We would create row groups like so: ``` 100K rows 1 row ``` Then merge them back into a single row group with size `100K+1`, effectively performing the same rewrite as before. Instead, for the last row groups in a file, we need a *minimum threshold* to merge. This is either the `row group size` itself, or 2X the size of the first row group we are considering. We also always merge row groups if the total size is `< 2048` (the vector size). This exponential merging ensures we don't merge on every single insert, while also ensuring we never have too many row groups. We can see this process in action when starting with 100K rows, and inserting 100 rows at a time - effectively repeating the below script: ```sql insert into z select 'a','b','c' from range(100); checkpoint; select row_group_id, max(count) from pragma_storage_info('z') group by all order by all; ``` ```sql -- insert batch #1 ┌──────────────┬────────────┐ │ row_group_id │ max(count) │ │ int64 │ int64 │ ├──────────────┼────────────┤ │ 0 │ 100000 │ │ 1 │ 100 │ └──────────────┴────────────┘ -- insert batch #30 ┌──────────────┬────────────┐ │ row_group_id │ max(count) │ │ int64 │ int64 │ ├──────────────┼────────────┤ │ 0 │ 100000 │ │ 1 │ 2000 │ │ 2 │ 1000 │ └──────────────┴────────────┘ -- insert batch duckdb#150 ┌──────────────┬────────────┐ │ row_group_id │ max(count) │ │ int64 │ int64 │ ├──────────────┼────────────┤ │ 0 │ 100000 │ │ 1 │ 8000 │ │ 2 │ 4000 │ │ 3 │ 2000 │ │ 4 │ 1000 │ └──────────────┴────────────┘ -- insert batch duckdb#228 ┌──────────────┬────────────┐ │ row_group_id │ max(count) │ │ int64 │ int64 │ ├──────────────┼────────────┤ │ 0 │ 100000 │ │ 1 │ 16000 │ │ 2 │ 4000 │ │ 3 │ 2000 │ │ 4 │ 800 │ └──────────────┴────────────┘ -- insert batch duckdb#229 ┌──────────────┬────────────┐ │ row_group_id │ max(count) │ │ int64 │ int64 │ ├──────────────┼────────────┤ │ 0 │ 122800 │ │ 1 │ 100 │ └──────────────┴────────────┘ ``` ### Performance Running the above example again, the checkpoint now completes in `0.005` seconds, instead of `0.6` seconds. Note that we still do the compaction at some point, so while most checkpoints will have gotten faster, not all of them will have gotten faster. Running the above example 300X, here are the timings: ```sql create table z as select repeat(sha256(i::varchar), 10) as a, repeat(sha256((i**2)::varchar), 10) as b, repeat(sha256((i**3)::varchar), 10) as c from range(100000) r(i); -- 300X the below insertion insert into z select 'a','b','c' from range(100); CHECKPOINT; ``` | Summary | v1.3.2 | New | |-----------------------------|---------|--------| | Total Time | 128.82s | 1.00s | | Average Time Per Checkpoint | 0.56s | 0.01s | | Max Checkpoint Time | 0.631s | 0.526s | In the above example, there is still a single checkpoint that does the full on compaction, and thus has the same timing as the original script had for every checkpoint.

carlopi added 2 commits June 21, 2023 10:47

CI auto draft: pass token via environment

6251987

fix

5c94bc5

carlopi marked this pull request as draft June 21, 2023 08:51

carlopi marked this pull request as ready for review June 21, 2023 08:52

DRAFT ME

8856bf9

carlopi force-pushed the test3 branch from 66cf5c1 to 4492e79 Compare June 21, 2023 08:55

carlopi added 4 commits June 21, 2023 10:57

dr

50e2e59

ASD

e4f60ba

ASD

1aa98a0

ASD

18a1e28

carlopi force-pushed the test3 branch from 4492e79 to 18a1e28 Compare June 21, 2023 08:57

carlopi added 2 commits June 21, 2023 11:00

ASD

b2167f3

fix?

ec3b348

github-actions bot marked this pull request as draft June 21, 2023 09:08

ASD

4f64af9

carlopi marked this pull request as ready for review June 21, 2023 09:10

github-actions bot marked this pull request as draft June 21, 2023 09:10

carlopi marked this pull request as ready for review June 21, 2023 09:13

ADF

1ada979

github-actions bot marked this pull request as draft June 21, 2023 09:13

carlopi marked this pull request as ready for review June 21, 2023 09:14

github-actions bot marked this pull request as draft June 21, 2023 09:14

carlopi marked this pull request as ready for review June 21, 2023 09:15

github-actions bot marked this pull request as draft June 21, 2023 09:15

draftme

46642d8

carlopi marked this pull request as ready for review June 21, 2023 09:17

github-actions bot marked this pull request as draft June 21, 2023 09:17

ASDASD

6fae713

carlopi force-pushed the master branch from 50e2e59 to b261f06 Compare June 21, 2023 10:51

carlopi closed this Jun 21, 2023

carlopi deleted the test3 branch June 21, 2023 10:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ASD #30

ASD #30

Uh oh!

carlopi commented Jun 21, 2023

Uh oh!

Uh oh!

ASD #30

ASD #30

Uh oh!

Conversation

carlopi commented Jun 21, 2023

Uh oh!

Uh oh!