Fix checkpoint deadlock - re-use locks on the same table #13712
Merged
This PR fixes an issue where checkpoints together with concurrent queries that referenced the same table multiple times could lead to deadlocks.
The issue was as follows:
When doing a checkpoint of a table, we would grab an exclusive lock for a given table while writing the data for that table.
Grabbing an exclusive lock meant waiting for all existing readers to finish, while not allowing new readers to start. This allows existing readers to finish, while guaranteeing progress for the checkpoint thread (new readers cannot continuously block the checkpoint).
This causes problems when we have a query that refers to the same table multiple times - as we would try to grab the same lock multiple times within the same transaction. The deadlock would happen as follows:
1. T1 grabs a read lock on table
2. The checkpoint thread tries to grab an exclusive lock on table, preventing new readers from starting
3. T1 tries to grab another read lock on table
This would then result in a deadlock, as T1 would need to wait until the checkpoint thread finishes - but the checkpoint thread needs to wait until T1 finishes.
This PR resolves the issue by keeping track of all active locks grabbed by a transaction, and ensuring we never grab the same shared lock multiple times in the same transaction.
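The idea of tracking active locks per transaction can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `TableLock`, `SharedLockHandle`, `Transaction`, `SharedLockTable`, and `AcquisitionsForSelfJoin` are hypothetical names, a plain `std::shared_mutex` stands in for the real checkpoint lock, and C++17 is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for the per-table lock that a checkpoint takes exclusively.
struct TableLock {
    std::shared_mutex mutex;
    std::atomic<int> shared_acquisitions{0}; // counts real lock_shared() calls (illustration only)
};

// RAII handle: acquires the shared lock on construction, releases it on destruction.
class SharedLockHandle {
public:
    explicit SharedLockHandle(TableLock &lock) : lock_(lock) {
        lock_.mutex.lock_shared();
        ++lock_.shared_acquisitions;
    }
    ~SharedLockHandle() { lock_.mutex.unlock_shared(); }
    SharedLockHandle(const SharedLockHandle &) = delete;
    SharedLockHandle &operator=(const SharedLockHandle &) = delete;

private:
    TableLock &lock_;
};

// Hypothetical transaction that remembers which shared locks it already holds.
class Transaction {
public:
    std::shared_ptr<SharedLockHandle> SharedLockTable(TableLock &table) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = active_locks_.find(&table);
        if (it != active_locks_.end()) {
            if (auto existing = it->second.lock()) {
                // This transaction already holds the shared lock: re-use it
                // instead of queueing behind a waiting checkpoint.
                return existing;
            }
        }
        // Not held (or the previous handle was released): acquire for real.
        auto handle = std::make_shared<SharedLockHandle>(table);
        active_locks_[&table] = handle;
        return handle;
    }

private:
    std::mutex mutex_; // protects active_locks_
    std::unordered_map<TableLock *, std::weak_ptr<SharedLockHandle>> active_locks_;
};

// A query that references the same table twice takes the underlying lock once.
int AcquisitionsForSelfJoin() {
    TableLock table;
    Transaction t1;
    auto first = t1.SharedLockTable(table);
    auto second = t1.SharedLockTable(table); // re-uses the handle held by `first`
    assert(first == second);
    return table.shared_acquisitions.load();
}
```

Because the second request returns the handle that is already held, T1 never has to wait behind a pending exclusive lock for a table it is already reading. Storing `std::weak_ptr` entries in this sketch means the lock is released as soon as the last user drops its handle, without the map keeping it alive.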