Mytherin (Collaborator) commented on Dec 14, 2023

This PR adds support for parallel checkpointing of individual tables during the CHECKPOINT process. Two types of tasks are scheduled: VacuumTasks, which merge 2+ row groups into fewer row groups (see #9931), and CheckpointTasks, which take a row group and write it to disk. Tasks are executed in parallel within individual tables only (i.e. we checkpoint the row groups of a single table in parallel, but we do not checkpoint multiple tables at the same time).
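To make the task split concrete, below is a minimal standalone C++ sketch of the idea: one table's checkpoint work is broken into vacuum and checkpoint tasks that worker threads pull from a shared queue. The class names mirror the PR's terminology, but the scheduler, queue and task bodies are simplified stand-ins and not DuckDB's actual implementation.

```cpp
#include <cstdio>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct RowGroupTask {
	virtual ~RowGroupTask() = default;
	virtual void Execute() = 0;
};

// Merges 2+ row groups into fewer row groups (placeholder body).
struct VacuumTask : RowGroupTask {
	std::vector<int> row_groups;
	explicit VacuumTask(std::vector<int> groups) : row_groups(std::move(groups)) {
	}
	void Execute() override {
		std::printf("vacuum: merging %zu row groups\n", row_groups.size());
	}
};

// Writes a single row group to disk (placeholder body).
struct CheckpointTask : RowGroupTask {
	int row_group;
	explicit CheckpointTask(int rg) : row_group(rg) {
	}
	void Execute() override {
		std::printf("checkpoint: writing row group %d\n", row_group);
	}
};

// Runs all tasks for ONE table in parallel; tables themselves would still be
// checkpointed one after another, matching the per-table parallelism above.
void ExecuteTableTasks(std::vector<std::unique_ptr<RowGroupTask>> &tasks, unsigned thread_count) {
	std::mutex queue_lock;
	size_t next_task = 0;
	auto worker = [&]() {
		while (true) {
			size_t task_index;
			{
				std::lock_guard<std::mutex> guard(queue_lock);
				if (next_task >= tasks.size()) {
					return;
				}
				task_index = next_task++;
			}
			tasks[task_index]->Execute();
		}
	};
	std::vector<std::thread> pool;
	for (unsigned i = 0; i < thread_count; i++) {
		pool.emplace_back(worker);
	}
	for (auto &t : pool) {
		t.join();
	}
}

int main() {
	std::vector<std::unique_ptr<RowGroupTask>> tasks;
	tasks.push_back(std::make_unique<VacuumTask>(std::vector<int> {0, 1, 2}));
	for (int rg = 3; rg < 8; rg++) {
		tasks.push_back(std::make_unique<CheckpointTask>(rg));
	}
	ExecuteTableTasks(tasks, 4);
	return 0;
}
```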

In general, row groups are independent from one another, so operating on them in parallel is not a problem. The only source of contention is the PartialBlockManager, which co-locates small writes on the same block, including across row groups. For this reason, this PR adds fine-grained locking around the usage of the PartialBlockManager during checkpointing.
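As a rough illustration of that locking, the toy class below packs small allocations from different row groups onto shared blocks and guards its state with a single mutex. The name, interface and packing logic are made up for the example and do not reflect DuckDB's real PartialBlockManager API.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

struct PartialBlockAllocation {
	uint64_t block_id;
	uint64_t offset;
};

class ToyPartialBlockManager {
public:
	explicit ToyPartialBlockManager(uint64_t block_size_p) : block_size(block_size_p) {
	}

	// Called concurrently by threads that are checkpointing different row groups.
	PartialBlockAllocation Allocate(uint64_t size) {
		std::lock_guard<std::mutex> guard(lock); // fine-grained lock around the shared state
		if (blocks.empty() || blocks.back().used + size > block_size) {
			blocks.push_back({next_block_id++, 0});
		}
		auto &block = blocks.back();
		PartialBlockAllocation result {block.id, block.used};
		block.used += size;
		return result;
	}

private:
	struct PartialBlock {
		uint64_t id;
		uint64_t used;
	};
	std::mutex lock;
	uint64_t block_size;
	uint64_t next_block_id = 0;
	std::vector<PartialBlock> blocks;
};

int main() {
	ToyPartialBlockManager manager(1ULL << 18); // 256 KiB blocks
	auto a = manager.Allocate(4096);
	auto b = manager.Allocate(4096); // co-located on the same block as `a`
	return a.block_id == b.block_id ? 0 : 1;
}
```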

In addition, one gnarly change was required in the checkpointing of StandardColumnData: we now need to checkpoint the validity column after checkpointing the main data column. That is because, when checkpointing the main data column, we scan both the validity and the data itself for better compression. If the validity is already checkpointed, this can cause a data race when run in parallel through the PartialBlockManager, as a different thread can at any point flush a partial block, which then triggers an update for every column stored on that partial block. If a validity column were flushed while it was being scanned, this would cause problems.
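The ordering constraint can be sketched as follows, using made-up toy types rather than DuckDB's actual StandardColumnData: the main data column is checkpointed first (its compression pass may still scan the validity mask), and only afterwards is the validity column checkpointed, so no concurrent partial-block flush can touch validity data that is still being read.

```cpp
#include <cstdio>

struct ToyColumnData {
	virtual ~ToyColumnData() = default;
	virtual void Checkpoint() = 0;
};

struct ToyValidityColumn : ToyColumnData {
	void Checkpoint() override {
		std::printf("checkpointing validity column\n");
	}
};

struct ToyStandardColumn : ToyColumnData {
	ToyValidityColumn validity;
	void Checkpoint() override {
		// The compression pass for the main data scans BOTH the data and the
		// validity mask, so the validity must not have been flushed yet here.
		CheckpointMainData();
		// Only after the main data is done do we checkpoint the validity.
		validity.Checkpoint();
	}
	void CheckpointMainData() {
		std::printf("checkpointing main data column\n");
	}
};

int main() {
	ToyStandardColumn column;
	column.Checkpoint();
	return 0;
}
```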

Performance

Running the same benchmark as in #9931 gives the following performance.

DELETE FROM lineitem WHERE l_orderkey%2=0;
v0.9.2    Vacuum    Parallel Checkpoint
0.12s     1.83s     0.44s

Note that we are still significantly slower than v0.9.2, as we are rewriting half of the table. In the future we could consider limiting the vacuuming performed during a checkpoint to e.g. ~20% of all row groups (configurable), which would further limit the performance impact of the vacuuming; a sketch of such a cap is shown below.
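If such a limit were added, it could look roughly like this. The option name, structure and the 20% default are purely illustrative, taken from the suggestion above, and are not an existing DuckDB setting.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical setting: vacuum at most this fraction of a table's row groups
// per checkpoint. Not an existing DuckDB option.
struct CheckpointOptions {
	double max_vacuum_fraction = 0.2;
};

std::size_t MaxRowGroupsToVacuum(std::size_t total_row_groups, const CheckpointOptions &options) {
	return static_cast<std::size_t>(total_row_groups * options.max_vacuum_fraction);
}

int main() {
	CheckpointOptions options;
	std::printf("vacuum at most %zu of 100 row groups\n", MaxRowGroupsToVacuum(100, options));
	return 0;
}
```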

Running the mix of deletes and additions, performance is much closer to v0.9.2, while keeping all of the size and performance benefits of flushing the deletes:

CALL dbgen(sf=0);
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
COPY lineitem FROM 'lineitem.parquet';
DELETE FROM lineitem WHERE l_orderkey%2=0;
             v0.9.2    Vacuum    Parallel Checkpoint
Load         6.41s     25.18s    8.06s
Q01 (Cold)   0.172s    0.095s    0.089s
Q01 (Hot)    0.125s    0.091s    0.090s
Size         977MB     686MB     662MB

Mytherin merged commit bddc561 into duckdb:main on Dec 15, 2023
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 15, 2023
Merge pull request duckdb/duckdb#10005 from Tishj/python3_7_optional_issue
Merge pull request duckdb/duckdb#9974 from taniabogatsch/duplicate-lambda-parameters
Merge pull request duckdb/duckdb#9999 from Mytherin/parallelcheckpoint
Merge pull request duckdb/duckdb#9973 from xuke-hat/refine-iejoin
Merge pull request duckdb/duckdb#9984 from hawkfish/tsns-time
Merge pull request duckdb/duckdb#9977 from atacan/swift-readme
Mytherin deleted the parallelcheckpoint branch on February 14, 2024