Improve read request handling during partial-snapshot recovery #6725
Conversation
lib/collection/src/shards/replica_set/execute_read_operation.rs
Force-pushed from aba711b to cc013cd
@ffuugoo I have started a docker image build: https://github.com/qdrant/qdrant/actions/runs/15821075811 Should be deployed at
```diff
@@ -122,8 +134,6 @@ impl ShardReplicaSet {
             )));
         };

-        let _partial_snapshot_search_lock = self.try_take_search_read_lock()?;
```
This check was useless/harmful. Partial snapshot recovery takes `local.write()`. If we already managed to take `local.read()` during `execute_local_read_operation`, there's no need to `try_take_search_read_lock`; in the worst case it would make us reject a read request that we could have successfully served.
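The reasoning above can be sketched with a synchronous stand-in (the real code uses `tokio::sync::RwLock`; the `Shard` struct and the string results here are illustrative, not the PR's actual types):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, simplified shard: recovery takes `local.write()`,
// read operations take `local.read()`.
struct Shard {
    local: Arc<RwLock<()>>,
}

impl Shard {
    fn execute_local_read_operation(&self) -> Result<&'static str, &'static str> {
        // Holding the read guard already proves no recovery (writer)
        // is in progress, so a second "search lock" check on top of
        // this is redundant, and could only reject serviceable reads.
        match self.local.try_read() {
            Ok(_guard) => Ok("read served"),
            Err(_) => Err("recovery in progress"),
        }
    }
}

fn main() {
    let shard = Shard { local: Arc::new(RwLock::new(())) };
    assert_eq!(shard.execute_local_read_operation(), Ok("read served"));

    // While "recovery" holds the write half, reads are rejected.
    let _recovery = shard.local.write().unwrap();
    assert_eq!(shard.execute_local_read_operation(), Err("recovery in progress"));
}
```

The point of the sketch: acquiring `local.read()` and checking a separate search lock answer the same question, so the second check can only produce spurious rejections.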
```rust
        None => (None, false, None),
    };

    let local_is_active = self.peer_is_active(self.this_peer_id());

    let local_operation = if local_is_active {
        let local_operation = async {
            let _partial_snapshot_search_lock = self.try_take_search_read_lock()?;
```
This check was useless/harmful. Similar to `try_take_search_read_lock` above: if we already managed to take `local.read()`, this check may force us to reject a read request that we could have successfully served.
Force-pushed from 880f94a to f6bda1f
```rust
/// Limits parallel *recover* partial snapshot requests. We are using `RwLock`, so that multiple
/// read requests can check if recovery is in progress (by doing `try_read`) without blocking
/// each other.
recovery_lock: Arc<tokio::sync::RwLock<()>>,
```
We now (sometimes) check whether `recovery_lock` is locked during `execute_local_read_operation`, so I've swapped the `Mutex` for an `RwLock`, allowing concurrent read operations to do `try_read` without blocking each other.
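A minimal sketch of why the `RwLock` swap helps, using the synchronous `std::sync::RwLock` as a stand-in for the tokio lock (the variable name mirrors the field above, but the snippet is illustrative):

```rust
use std::sync::RwLock;

fn main() {
    // Stand-in for `PartialSnapshotMeta::recovery_lock` (the real field
    // is a `tokio::sync::RwLock`; this sketch uses the std sync version).
    let recovery_lock = RwLock::new(());

    // Concurrent read operations can each `try_read` without blocking
    // or failing because of one another (a `Mutex` would reject the
    // second `try_lock` while the first guard is held).
    let r1 = recovery_lock.try_read();
    let r2 = recovery_lock.try_read();
    assert!(r1.is_ok() && r2.is_ok());
    drop((r1, r2));

    // While recovery holds the write half, `try_read` fails, which is
    // how a read request detects an in-progress recovery without blocking.
    let recovering = recovery_lock.write().unwrap();
    assert!(recovery_lock.try_read().is_err());
    drop(recovering);
}
```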
lib/collection/src/shards/replica_set/execute_read_operation.rs
Force-pushed from e150f14 to 5d7ff74
Thanks!
This makes sense. I find the use of a watch channel to cancel a waiting read lock a bit convoluted, but I get why we're using it here; another lock would not give us the same flexibility.
IIRC you mentioned on a call that this has the potential for a race, but I don't see a concrete problem.
lib/collection/src/shards/replica_set/execute_read_operation.rs
Just use a second lock 😔
Remove accidental `\`

Co-authored-by: Tim Visée <tim+github@visee.me>

No more race, but I've made yet another update, because a simple
* Fix read request handling during partial snapshot recovery
* fixup! Fix read request handling during partial snapshot recovery (Just use a second lock 😔)
* fixup! Fix read request handling during partial snapshot recovery (Remove accidental `\`)

Co-authored-by: Tim Visée <tim+github@visee.me>
* Get rid of `PartialSnapshotMeta::search_lock`.
* Reimplemented read request handling during partial recovery based on `ShardReplicaSet::local` and `PartialSnapshotMeta::recovery_lock`.

All Submissions:
* Contributions should target the `dev` branch. Did you create your branch from `dev`?

New Feature Submissions:
* Have you formatted your code locally using the `cargo +nightly fmt --all` command prior to submission?
* Have you checked your code using the `cargo clippy --all --all-features` command?

Changes to Core Features: