Conversation

Mytherin (Collaborator)

Follow-up from #15702
Supersedes/builds on top of #14981

This PR changes the storage_compatibility_version from a setting that is set per session to a property that is written into the database file.

Previously this setting was applied at run-time, and it was shared across all database instances:

ATTACH 'file1.db';
-- write something, to be serialized targeting version v0.10.0
SET storage_compatibility_version = 'v1.0.0';
ATTACH 'file2.db';
-- write something, to be serialized targeting v1.0.0

This has a number of issues:

  • The storage compatibility version is shared across all attached databases
  • When restarting the system, the storage_compatibility_version reverts to the default setting (currently v0.10.0)
  • When reading a database, we did not know which storage compatibility version was used, which could lead to hard-to-understand errors when reading databases written with an older version

STORAGE_VERSION parameter

This PR reworks this so that the storage version is instead specified on ATTACH. When none is specified:

  • The version set in the storage_compatibility_version is used when creating a new database
  • The version stored within the database is used when loading an existing database

As a result, we can target the desired storage version when creating a new database. When opening an existing database, we keep writing data targeting the same DuckDB version (i.e. we never automatically "upgrade" the file to a newer DuckDB version). The user can manually upgrade a file by opening an older file while targeting a later storage version.

For example:

-- use default `storage_compatibility_version`
ATTACH 'new_file.db';
-- explicitly target versions >= v1.2.0
ATTACH 'new_file.db' (STORAGE_VERSION 'v1.2.0');

-- use the storage version stored within the file
ATTACH 'existing_file.db';
-- use storage version v1.2.0 - if the file uses an older storage version, this upgrades the file
ATTACH 'existing_file.db' (STORAGE_VERSION 'v1.2.0');

Note that we cannot downgrade a file. If we try to open a file that targets e.g. version v1.2.0 with an explicit storage version of v1.0.0, we get an error:

ATTACH 'database_file.db' (STORAGE_VERSION 'v1.2.0');
DETACH database_file;

ATTACH 'database_file.db' (STORAGE_VERSION 'v1.0.0');
-- Error opening "database_file.db": cannot initialize database with storage version 2 - which is lower than what the database itself uses (4). The storage version of an existing database cannot be lowered.

Opening with DuckDB < v1.1.3

When opening a file that targets v1.2.0 in an older DuckDB version, we now get a storage incompatibility error:

duckdb database_file.db
Error: unable to open database "database_file.db": IO Error: Trying to read a database file with version number 65, but we can only read version 64.
The database file was created with an newer version of DuckDB.

The storage of DuckDB is not yet stable; newer versions of DuckDB cannot read old database files and vice versa.
The storage will be stabilized when version 1.0 releases.

For now, we recommend that you load the database file in a supported version of DuckDB, and use the EXPORT DATABASE command followed by IMPORT DATABASE on the current version of DuckDB.

See the storage page for more information: https://duckdb.org/internals/storage
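
The EXPORT/IMPORT migration path recommended by the error message can be sketched as follows (the directory name is illustrative, and this assumes the file is opened as the current database by a DuckDB version that can read it):

```sql
-- In a DuckDB version that can read the file, export its contents to disk:
EXPORT DATABASE 'migration_dir';

-- Then, running the target DuckDB version against a fresh database file:
IMPORT DATABASE 'migration_dir';
```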

The description in the error is not entirely accurate, but it is a lot more descriptive than the error previously thrown in this scenario (which was INTERNAL Error: Unsupported compression function type).

The error message has also been improved in #15702 already.

Mytherin requested a review from carlopi on January 20, 2025.
Mytherin merged commit c5b6b1a into duckdb:v1.2-histrionicus on Jan 20, 2025 (47 of 48 checks passed).
carlopi (Contributor) commented Jan 20, 2025:

Thanks! It was indeed ready to go in.

Mytherin deleted the serializationcompat branch on February 3, 2025.
Tishj (Contributor) commented Mar 5, 2025:

While working on the related #15637, I am extremely confused by some of the decisions made here.
If I recall correctly, we used the compatibility version as a way to avoid increasing the storage version, because bumping the storage version would break backwards compatibility outright: with a version mismatch, anything would be up in the air.

I think of it kind of like the "patch" part of semantic versioning (major.minor.patch).
But looking at this now, it seems that what we serialize to the db as the "storage version" is in fact the compatibility version.

Can we discontinue the compatibility versioning entirely and just bump the next serialized storage version (the 'serialization' map in version_map.json) to 65?
It is confusing to me why we have both storage_version (64/65) and compatibility_version (4/5), but then write only the compatibility version to the db and call it storage_version.

Relevant PR for context on the serialization_compatibility: #12110

Mytherin added a commit that referenced this pull request Mar 6, 2025
…torage_version` (#16533)

This PR relates to #15794

With that PR, we introduced an upgrade to the previously established
`serialize_compatibility` flag.
The `storage_version` now lives in the db, and dictates which
compression algorithms are used.
This storage_version defaults to 0.10.2 for the sake of backwards
compatibility.

The ZSTD and Roaring compression algorithms were added as part of 1.2.0,
which is not covered by this default.
Because of this, the benchmarks for these compression algorithms were
silently using `Uncompressed`, as the forced compression algorithm is
not available.
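
To illustrate the problem described above (a sketch: `force_compression` is a pragma used in DuckDB's testing setup, and the file names are placeholders), forcing one of these algorithms only takes effect when the file targets a storage version that includes it:

```sql
-- Default storage version (0.10.2): ZSTD is not part of this format,
-- so a forced ZSTD compression silently falls back to uncompressed storage.
ATTACH 'bench_old.db';
PRAGMA force_compression = 'zstd';

-- Targeting v1.2.0 makes ZSTD available, so the benchmark actually measures it.
ATTACH 'bench_new.db' (STORAGE_VERSION 'v1.2.0');
PRAGMA force_compression = 'zstd';
```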

This PR adds support for `assert` blocks in the benchmark runner. Similar to `result_query`, any number of `assert` blocks can be added to verify the state of the db before running the benchmark.

`storage persistent` can currently be used to make the db used in the
benchmark a persistent one that does not live in-memory, so that it can
be checkpointed.
This PR introduces a second optional parameter to `storage`: `storage
persistent <version>`, which is similar to the SQL version:
`ATTACH 'db' (STORAGE_VERSION <version>)`
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation