
Conversation

Mytherin
Collaborator

Previously, attached databases were tracked in a transactional manner using a CatalogSet tied to the system catalog. This made all operations on them transactional in relation to the system catalog. When attaching, an attached database would only become visible to other transactions after the attach had been committed. Similarly, detaching would not instantly detach the database, but would postpone the actual detach until all transactions on the system catalog had terminated.

This seems elegant, but it can cause issues and hard-to-understand behavior:

  • Databases would only be detached on cleanup
  • Since every transaction uses the system transaction, effectively any open transaction could defer the clean-up of a database
  • We can only run the final checkpoint (checkpoint on shutdown) after the database has fully detached/shut down, which would be deferred from the actual detach

In particular, this could cause issues when detaching a database file and then re-attaching the same database file. If the detach had not fully completed yet, we would allow the attach, but could then have a checkpoint running in the background while other operations were happening on the attached database file. This could lead to data corruption and similar errors, as we would have multiple writers writing to the same database file.

This data corruption could instead be prevented by detecting this situation and throwing an error on the attach - but that would create even more confusion from a user perspective. The database would be detached, but not yet cleaned up in the background because other transactions are still active, even if those transactions are not using that specific database. As a result, it would be hard to tell when a detach actually happens and when we can re-attach the same database again.

Direct Detach

To solve these issues, this PR moves away from using a CatalogSet for attached databases. Instead, we use a map of shared pointers. When a database is referenced by a transaction, that transaction grabs a shared pointer to the database and stores it, preventing the database from being destroyed for as long as that transaction is active.

When a detach happens, the database is removed from the set of databases in the database manager. At this point, it will be directly cleaned up if no other transaction is actively using that database. If another transaction is still using the database, the database will be cleaned up after that transaction is finished. As a result, detaching while other transactions are using the database is still safe.

This does mean that Detach is no longer transactional - i.e., a rollback can no longer roll back a detach. From the perspective of a transaction, a database can therefore disappear mid-transaction: if another transaction detaches it, subsequent attempts to use it will fail, even if it existed earlier in the transaction.

Attaching works slightly differently. It is also no longer transactional - if a transaction attaches a database, other transactions can immediately start using that database. However, attaching can still be rolled back: if we attach and then roll back, we detach the database during the rollback.

@Tishj
Contributor

Tishj commented Aug 22, 2025

> we will detach the database during rollback.

Only for the transaction that started the attach right?
If another connection starts using the attached db before the original transaction has rolled back, the database would remain "attached", because the other connection is still using it?

The question is purely informational - just trying to see if I understand this behavior correctly.

@taniabogatsch
Contributor

> the database would remain "attached", because the other connection is using it still?

Only until that other connection / query releases the shared pointer, I think. So, in a very theoretical case, we could keep it alive by always having someone use it?

@Mytherin
Collaborator Author

Detach is not transactional anymore after this change, it will be detached globally even if another transaction is using it. Subsequent attempts to refer to the catalog will fail, even if they succeeded previously within the same transaction, e.g.:

statement ok
SET immediate_transaction_mode=true;

statement ok con1
BEGIN

statement ok con2
BEGIN

statement ok con1
ATTACH ':memory:' AS newly_attached;

# succeeds, we can immediately refer to newly_attached
statement ok con2
CREATE TABLE newly_attached.db(i INTEGER);

statement ok con1
ROLLBACK

# fails, newly_attached has been detached - even though we used it previously in the same transaction by con2
statement error con2
CREATE TABLE newly_attached.db(i INTEGER);
----
Catalog Error: Schema with name newly_attached does not exist!

However, the actual detach / shutdown of the database will not happen until "con2" has finished its transaction - so in the above example "newly_attached" is still being kept alive / attached by the existence of "con2", as "con2" has outstanding changes made to that database. It just can no longer be referenced.

@Mytherin
Collaborator Author

Old behavior:

  • Attach / Detach follow transactional semantics, we can only refer to databases that have been "committed". If we can refer to a database during a transaction, it will always be available later on in that same transaction.
  • Actual destruction of a database post-detach will happen when all transactions that were open prior to detaching have finished, even if they never actually used / referred to that database

New behavior:

  • Attach / Detach happen immediately: we can immediately refer to databases that have been attached, and a detach immediately removes our access to the detached database.
  • Actual destruction of a database post-detach happens after all transactions that have explicitly used that database have finished.

@Mytherin
Collaborator Author

For reference, in SQLite attaching / detaching is completely non-transactional:

sqlite> BEGIN;
sqlite> ATTACH ':memory:' as newly_attached;
sqlite> ROLLBACK;
sqlite> CREATE TABLE newly_attached.tbl(i INTEGER);
sqlite> INSERT INTO newly_attached.tbl VALUES (42);
sqlite> SELECT * FROM newly_attached.tbl;
┌────┐
│ i  │
├────┤
│ 42 │
└────┘

@taniabogatsch
Contributor
Hi - left some comments, mostly questions :)

	createLookupTbl(*conn, dbId);
	break;
default:
	throw runtime_error("invalid scenario");
Contributor

nit: maybe this can be an internal error or so?

Comment on lines +133 to +134
// execQuery(initConn, "SET default_block_size = '16384'");
// execQuery(initConn, "SET storage_compatibility_version = 'v1.3.2'");
Contributor

Do we want to add these back in? Or fully remove them?

}

void workUnit(std::unique_ptr<Connection> conn) {
	for (int i = 0; i < iterationCount; i++) {
Contributor

Since this test will probably be useful for simulating more complex scenarios in the future: do we want to use the success atomic to stop iteration here? So that all threads early-out, if one fails?

Unique file handle conflict

statement maybe
CREATE TABLE attach_mix_${k}_${i} AS SELECT * FROM range(${k} * ${i}) t(i)
Contributor

Is there a dot missing in this test? attach_mix_${k}._${i}?

Collaborator Author

Yes, great point - this actually found some issues.

Comment on lines +46 to +51
case UndoFlags::ATTACHED_DATABASE: {
	auto db = Load<AttachedDatabase *>(data);
	auto &db_manager = DatabaseManager::Get(db->GetDatabase());
	db_manager.DetachInternal(db->name);
	break;
}
Contributor

What is the benefit of being able to rollback an ATTACH? Is it mostly error-prevention of failing a query/transaction but having a lingering attached database?

Collaborator Author

Yes, making sure we clean up after errors. This is mostly relevant for single-statement ATTACH queries: when encountering an error in ATTACH '...', we will now always clean up the attached database (if the error happened after adding it to the database manager).

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 09:25
@Mytherin Mytherin marked this pull request as ready for review August 22, 2025 09:28
…he database manager after it is fully loaded
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 11:39
@Mytherin Mytherin marked this pull request as ready for review August 22, 2025 13:24
@Mytherin Mytherin merged commit df0a3de into duckdb:v1.3-ossivalis Aug 22, 2025
49 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Aug 22, 2025
Move attached databases from a CatalogSet to a dedicated map of shared pointers (duckdb/duckdb#18693)
Unplug python (in ossivalis) (duckdb/duckdb#18699)
Using a different workflow to release the python package (duckdb/duckdb#18685)
Add WAL test config run (duckdb/duckdb#18683)
[CI] Temporarily skip triggering `R Package Windows (Extensions)` job (duckdb/duckdb#18628)
Load pandas in import cache before binding (duckdb/duckdb#18658)
Remove `PRAGMA enable_verification` in more tests (duckdb/duckdb#18670)
Remove more `PRAGMA enable_verification` (duckdb/duckdb#18664)
Remove `PRAGMA enable_verification` (duckdb/duckdb#18645)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@Mytherin Mytherin deleted the attachnottransactional branch September 3, 2025 07:05
Mytherin added a commit that referenced this pull request Sep 4, 2025
…nsaction (#18850)

Follow-up fixes to #18693

* Make `ATTACH OR REPLACE` atomic
* Keep a list of used databases in the MetaTransaction, so that if we
use a database with a given name in a transaction, we can keep on using
that database with that name
Labels: Needs Documentation