-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Move attached databases from a CatalogSet to a dedicated map of shared pointers #18693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move attached databases from a CatalogSet to a dedicated map of shared pointers #18693
Conversation
… manage the databases
…abases that were attached in the transaction again
Only for the transaction that started the attach right? The question is purely informational, just trying to see if I understand this behavior correctly |
only until that other connection / query releases the shared pointer, I think. So , in a very theoretical case, we could keep it alive by always having someone use it? |
Detach is not transactional anymore after this change, it will be detached globally even if another transaction is using it. Subsequent attempts to refer to the catalog will fail, even if they succeeded previously within the same transaction, e.g.: statement ok
SET immediate_transaction_mode=true;
statement ok con1
BEGIN
statement ok con2
BEGIN
statement ok con1
ATTACH ':memory:' AS newly_attached;
# succeeds, we can immediately refer to newly_attached
statement ok con2
CREATE TABLE newly_attached.db(i INTEGER);
statement ok con1
ROLLBACK
# fails, newly_attached has been detached - even though we used it previously in the same transaction by con2
statement error con2
CREATE TABLE newly_attached.db(i INTEGER);
----
Catalog Error: Schema with name newly_attached does not exist! However, the actual detach / shutdown of the database will not happen until "con2" has finished its transaction - so in the above example "newly_attached_db" is still being kept alive / attached by the existence of "con2" as "con2" has outstanding changes made to that database. It can just not be referenced. |
Old behavior:
New behavior:
|
For reference, in SQLite attaching / detaching is completely non-transactional: sqlite> BEGIN;
sqlite> ATTACH ':memory:' as newly_attached;
sqlite> ROLLBACK;
sqlite> CREATE TABLE newly_attached.tbl(i INTEGER);
sqlite> INSERT INTO newly_attached.tbl VALUES (42);
sqlite> SELECT * FROM newly_attached.tbl;
┌────┐
│ i │
├────┤
│ 42 │
└────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - left some comments, mostly questions :)
createLookupTbl(*conn, dbId); | ||
break; | ||
default: | ||
throw runtime_error("invalid scenario"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this can be an internal error or so?
// execQuery(initConn, "SET default_block_size = '16384'"); | ||
// execQuery(initConn, "SET storage_compatibility_version = 'v1.3.2'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add these back in? Or fully remove them?
} | ||
|
||
void workUnit(std::unique_ptr<Connection> conn) { | ||
for (int i = 0; i < iterationCount; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this test will probably be useful for simulating more complex scenarios in the future: do we want to use the success
atomic to stop iteration here? So that all threads early-out, if one fails?
Unique file handle conflict | ||
|
||
statement maybe | ||
CREATE TABLE attach_mix_${k}_${i} AS SELECT * FROM range(${k} * ${i}) t(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a dot missing in this test? attach_mix_${k}._${i}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes great point, this actually found some issues
case UndoFlags::ATTACHED_DATABASE: { | ||
auto db = Load<AttachedDatabase *>(data); | ||
auto &db_manager = DatabaseManager::Get(db->GetDatabase()); | ||
db_manager.DetachInternal(db->name); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of being able to rollback an ATTACH
? Is it mostly error-prevention of failing a query/transaction but having a lingering attached database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, making sure we clean-up after errors. This is mostly relevant for single-statement ATTACH
queries, e.g. when encountering an error in ATTACH '...'
we will now always clean up the attached database (if the error happened post adding it to the database manager).
…he database manager after it is fully loaded
…o the referenced_databases
Move attached databases from a CatalogSet to a dedicated map of shared pointers (duckdb/duckdb#18693) Unplug python (in ossivalis) (duckdb/duckdb#18699) Using a different workflow to release the python package (duckdb/duckdb#18685) Add WAL test config run (duckdb/duckdb#18683) [CI] Temporarily skip triggering `R Package Windows (Extensions)` job (duckdb/duckdb#18628) Load pandas in import cache before binding (duckdb/duckdb#18658) Remove `PRAGMA enable_verification` in more tests (duckdb/duckdb#18670) Remove more `PRAGMA enable_verification` (duckdb/duckdb#18664) Remove `PRAGMA enable_verification` (duckdb/duckdb#18645)
Move attached databases from a CatalogSet to a dedicated map of shared pointers (duckdb/duckdb#18693) Unplug python (in ossivalis) (duckdb/duckdb#18699) Using a different workflow to release the python package (duckdb/duckdb#18685) Add WAL test config run (duckdb/duckdb#18683) [CI] Temporarily skip triggering `R Package Windows (Extensions)` job (duckdb/duckdb#18628) Load pandas in import cache before binding (duckdb/duckdb#18658) Remove `PRAGMA enable_verification` in more tests (duckdb/duckdb#18670) Remove more `PRAGMA enable_verification` (duckdb/duckdb#18664) Remove `PRAGMA enable_verification` (duckdb/duckdb#18645) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Previously, attached databases were tracked in a transactional manner using a CatalogSet tied to the system catalog. This would make all operations on them transactional in relation to the system catalog. When attaching, attached databases would only be visible to transactions after the attach has been committed. Similarly, detaching would not instantly detach the database, but would postpone the actual detach until all transactions on the system catalog were terminated.
This seems elegant but can cause issues / hard-to-understand behavior:
In particular, this could cause issues when detaching a database file, followed by re-attaching the same database file. If the detach had not fully completed yet - we would allow the attach, but could then have a checkpoint running in the background while other operations were happening on the attached database file. This could lead to data corruption and similar errors as we would have multiple writers writing to the same database file.
This data corruption can be fixed differently by detecting this and throwing an error preventing the attach - but this would then create more confusion from a user perspective. The database would be detached, but not cleaned up yet in the background because other transactions are still active, even if those transactions are not using that specific database. As a result, it is hard to tell when a detach would actually happen and when we could re-attach the same database again.
Direct Detach
In order to solve these issues - this PR moves away from using a CatalogSet for attached databases. Instead, we use a map of shared pointers. When a database is referenced by a transaction, that transaction grabs a shared pointer to that database and stores it, preventing the database from being detached as long as that transaction is active.
When a detach happens, the database is removed from the set of databases in the database manager. At this point, it will be directly cleaned up if no other transaction is actively using that database. If another transaction is still using the database, the database will be cleaned up after that transaction is finished. As a result, detaching while other transactions are using the database is still safe.
This does mean that
Detach
is no longer transactional - i.e. a rollback can no longer roll back a detach. If a transaction tries to use a database after it has been detached by another transaction, that database can disappear even if it has existed in the past.Attaching works slightly differently. It is also no longer transactional - so if a transaction attaches a database, other transactions can immediately start using that database. However, attaching can still be rolled back. Effectively if we attach and then roll back, we will detach the database during rollback.