check if we try to deactivate last initializing replica #6755

generall · 2025-06-24T22:26:41Z

There is a bug that I can't reproduce locally, but it has been observed in practice multiple times:

If a pod is killed during collection creation, or an error occurs while creating a collection (for example, a file descriptor issue), some shards of the collection may end up in an inconsistent state between initializing and dead.

The local shard thinks it is dead, while the other machines in the cluster consider it initializing.

Since the local shard's status is dead, it needs to recover from somewhere, but it is also the only shard in the cluster. So the cluster is stuck in this inconsistent state with no way to recover (except by deleting the collection).

This PR extends our check for is_last_active_replica and handles the case of no active replicas in more detail.
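To illustrate the idea, here is a minimal standalone sketch of such a check. It is not the code from this PR: the `PeerId` alias, the reduced `ReplicaState` enum, the helper `peers_in_state`, and the function name `is_last_recoverable_replica` are all hypothetical, and the rule used for the "no active replicas" case (refuse to deactivate the only initializing replica) is an assumption based on the description above.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; not Qdrant's real types.
type PeerId = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReplicaState {
    Active,
    Dead,
    Initializing,
}

/// Collects the ids of all peers whose replica is in the `wanted` state.
fn peers_in_state(peers: &HashMap<PeerId, ReplicaState>, wanted: ReplicaState) -> Vec<PeerId> {
    peers
        .iter()
        .filter(|(_, state)| **state == wanted)
        .map(|(id, _)| *id)
        .collect()
}

/// Returns `true` if deactivating `peer_id` would leave the shard with
/// nothing to recover from, so the deactivation should be refused.
fn is_last_recoverable_replica(peers: &HashMap<PeerId, ReplicaState>, peer_id: PeerId) -> bool {
    let active = peers_in_state(peers, ReplicaState::Active);
    if !active.is_empty() {
        // Usual case: only the single remaining active replica is protected.
        return active.len() == 1 && active[0] == peer_id;
    }

    // Assumed extension: with no active replicas at all, protect the only
    // initializing replica, since marking it dead could not be undone.
    let initializing = peers_in_state(peers, ReplicaState::Initializing);
    initializing.len() == 1 && initializing[0] == peer_id
}
```

With a check like this, a shard whose single replica is still initializing would not be marked dead, so the cluster would not end up waiting for a recovery source that does not exist.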

timvisee

I did not reproduce this either, but I've seen this problem as well. The implementation looks sound 👍

* check if we try to deactivate last initializing replica
* consider more cases
* Update lib/collection/src/shards/replica_set/mod.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

---------

Co-authored-by: Tim Visée <tim+github@visee.me>

generall added 2 commits June 25, 2025 00:05

check if we try to deactivate last initializing replica

d24e8c1

consider more cases

fe933ab

generall requested review from timvisee and ffuugoo June 24, 2025 22:26

timvisee approved these changes Jun 25, 2025

lib/collection/src/shards/replica_set/mod.rs (outdated, resolved)

Update lib/collection/src/shards/replica_set/mod.rs

7e329ea

Co-authored-by: Tim Visée <tim+github@visee.me>

ffuugoo approved these changes Jun 25, 2025

generall merged commit c29ab98 into dev Jun 25, 2025
18 checks passed

generall deleted the do-not-deactivate-last-initializing branch June 25, 2025 12:04

coderabbitai bot mentioned this pull request Jul 2, 2025

Create new custom shards in Initializing state instead of Active #6778

Merged

9 tasks

generall mentioned this pull request Jul 10, 2025

qdrant node error on search, nodes in wrong state #6846

Closed
