Skip to content

Conversation

Mytherin
Copy link
Collaborator

This PR enables DuckDB to optimistically compress and write data to disk when bulk loading.

Previously, when loading data from e.g. a large CSV or Parquet file in a single transaction, data was first loaded entirely into the transaction local storage. When committing, the data would then be flushed out of transaction local storage and written to the actual database file on disk.

While this has the advantage that in case of a ROLLBACK or similar error during loading discarding of the transaction-local state is free, it comes with a big disadvantage. Namely, when loading data that is bigger than memory we have to make multiple round-trips to disk.

Previously, this is what would happen when loading more data than fits in memory in a single transaction:

  • First we load into transaction-local storage - writing data that does not fit into memory to temporary files
  • Then we read from transaction-local storage, reading data back in from those temporary files
  • Finally we compress the data and write it to the database file

Optimistic Streaming to Disk

In this PR, we instead optimistically write transaction-local data to the database file as it is appended to tables. For every row group that is appended (120K~ rows) we immediately compress the data and write it out to the database file. This allows for a full streaming load into the database file, and greatly speeds up loading speed when loading more data than fits in memory in a single transaction.

If the transaction is rolled back or aborted, the blocks that were pre-emptively written to disk are marked as unused and reclaimed by the system for use in subsequent writes. This might still cause the database file to grow temporarily, however, and may create gaps in the database file if there are multiple transactions writing at the same time with a subset of those transactions aborting. That space is not lost - however. It will be re-used by the system when new data is ingested.

The actual performance gain depends mostly on the speed of the storage. On my Macbook (with a very fast SSD) the performance difference is small (on the order of 10%~ faster). On a machine with a hard disk or slower SSD, the performance difference will be far larger.

Another benefit is that required disk space drops heavily, as we will no longer have to write uncompressed data to disk. Instead, the data will be directly compressed as it is written to the table.

SQLLogicTest: concurrentloop

As part of this PR, we expand the sqllogictest framework with two new loops: concurrentloop and concurrentforeach. These operate similarly to the existing loop and foreach - with one big difference: every iteration of the loop is executed in parallel using separate threads that each have their own connection to the database. Example:

statement ok
CREATE TABLE integers(i INTEGER)

concurrentloop threadid 0 20

statement ok
INSERT INTO integers SELECT * FROM range(100);

endloop

query II
SELECT COUNT(*), SUM(i) FROM integers
----
2000	99000

We have several tests that test this type of behavior written in C++, but adding this functionality to the sqllogictest makes it significantly easier to write tests for the multiple-connection scenario, which should also enable us to write many more of them.

…ng tasks, otherwise if all tasks finish before the event is scheduled weird things happen
…ger store a pointer to a specific row group as it might become invalidated if we run e.g. an alter type
@Mytherin Mytherin merged commit 91ed0af into duckdb:master Oct 15, 2022
This was referenced Nov 10, 2022
@Mytherin Mytherin deleted the flushlocaltodisk2 branch January 7, 2023 14:58
sacundim pushed a commit to sacundim/covid-19-puerto-rico that referenced this pull request May 21, 2023
…vague hope that this will exploit some logic to better spill out to disk

duckdb/duckdb#4996
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant