This PR adds support for parallel checkpointing of individual tables during the `CHECKPOINT` process. Two types of tasks are scheduled: `VacuumTask`s, which merge 2+ row groups into fewer row groups (see #9931), and `CheckpointTask`s, which take a row group and write it to disk. Tasks are executed in parallel for individual tables only (i.e. we checkpoint the row groups of a single table in parallel, but we do not checkpoint multiple tables at the same time in parallel).
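A rough sketch of this scheduling model is shown below. The class and function names here are hypothetical placeholders, not the actual DuckDB internals; the real tasks are driven by DuckDB's task executor rather than `std::async`.

```cpp
#include <future>
#include <memory>
#include <vector>

struct RowGroup;

// Base class for the two task types scheduled during a checkpoint.
struct CheckpointBaseTask {
	virtual ~CheckpointBaseTask() = default;
	virtual void Execute() = 0;
};

struct VacuumTask : CheckpointBaseTask {
	std::vector<RowGroup *> row_groups; // 2+ row groups to merge into fewer row groups
	void Execute() override {
		// merge the row groups into fewer, fuller row groups
	}
};

struct CheckpointTask : CheckpointBaseTask {
	RowGroup *row_group = nullptr; // a single row group to write to disk
	void Execute() override {
		// write the row group's data to disk
	}
};

// Checkpoint one table: run all of its tasks in parallel, then wait for them.
// Tables themselves are still checkpointed one after the other.
void CheckpointTable(std::vector<std::unique_ptr<CheckpointBaseTask>> &tasks) {
	std::vector<std::future<void>> pending;
	for (auto &task : tasks) {
		pending.push_back(std::async(std::launch::async, [t = task.get()] { t->Execute(); }));
	}
	for (auto &f : pending) {
		f.get(); // all row groups of this table finish before the next table starts
	}
}
```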
In general, row groups are independent from one another, so operating on them in parallel is not a problem. The only cause of contention is the `PartialBlockManager`, which colocates blocks on the same page, also across row groups. For this reason, this PR adds fine-grained locking to the checkpointing process around the usage of the `PartialBlockManager`.
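As a minimal sketch of what this fine-grained locking looks like (simplified, hypothetical types; the real `PartialBlockManager` interface differs): compressing and writing each row group happens without synchronization, and only the short section that registers a partial block with the shared manager takes a lock.

```cpp
#include <mutex>

struct PartialBlock { /* ... */ };

class PartialBlockManagerSketch {
public:
	// Only registering a partial block is serialized; compressing and writing
	// the row group data itself happens outside the lock, in parallel.
	void RegisterPartialBlock(PartialBlock block) {
		std::lock_guard<std::mutex> guard(lock);
		// colocate the block with other partial blocks on the same page,
		// possibly flushing a completed page to disk
	}

private:
	std::mutex lock;
};
```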
In addition, there was one gnarly change required to the checkpointing of `StandardColumnData`: we need to checkpoint the validity after checkpointing the regular column. That is because when checkpointing the main data column we scan both the validity and the data itself for better compression. If the validity is already checkpointed, this can cause a data race when run in parallel with the `PartialBlockManager`, as a different thread can at any point flush a partial block, which then triggers an update for every column stored on that partial block. If a validity column were flushed while it was being scanned, this would cause problems.
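A minimal sketch of the resulting checkpoint order (hypothetical names; the real `StandardColumnData` code is more involved):

```cpp
// Sketch of the changed checkpoint order for a standard (nullable) column.
struct ValidityColumnSketch {
	void Checkpoint() {
		// write the validity mask to its own (possibly partial) blocks
	}
};

struct StandardColumnSketch {
	ValidityColumnSketch validity;

	void Checkpoint() {
		// 1. Checkpoint the main data column first. This scans BOTH the data and
		//    the validity mask, so the compressor can exploit NULL information.
		CheckpointMainData();
		// 2. Only afterwards checkpoint the validity column. If the validity were
		//    checkpointed first, another thread could flush the partial block it
		//    lives on while this thread is still scanning it.
		validity.Checkpoint();
	}

	void CheckpointMainData() {
		// scan data + validity and write compressed segments to disk
	}
};
```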
### Performance

Running the same benchmarks as in #9931 provides the following performance.

Note that we are still significantly slower than before, as we are rewriting half of the table. In the future we could consider limiting the vacuuming performed during a checkpoint to e.g. ~20% of all row groups (configurable), which would limit the performance impact of the vacuuming even further.
When running the mix of deletes and additions, performance is much closer to before, while keeping all of the size and performance benefits of flushing the deletes: