Conversation

@KShivendu (Member) commented Apr 1, 2025

Fixes the bug demonstrated in #6292

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

TODO:

  • Delete the shard initializing flag after initiating transfer to the dummy shard for recovery
  • Try to load shards even if they are dirty, instead of directly recreating them as empty shards. Especially important when we have 0 replicas? Not desirable anymore.

@KShivendu changed the title from "Heal dirty shards" to "Recover dirty shards using other replicas" Apr 1, 2025
@KShivendu marked this pull request as draft April 1, 2025 16:02
```diff
@@ -681,7 +681,7 @@ impl Collection {
                 continue;
             }

-            if this_peer_state != Some(Dead) || replica_set.is_dummy().await {
+            if !(this_peer_state == Some(Dead) || replica_set.is_dirty().await) {
```
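
The new condition is the De Morgan form of "the local peer is Dead or the shard is dirty", i.e. it holds only when the peer is alive and the shard is clean. A self-contained sketch with stand-in types (not Qdrant's real definitions) to make the change explicit:

```rust
// Stand-in for Qdrant's replica state; only the Dead variant matters here.
#[derive(PartialEq, Clone, Copy)]
enum ReplicaState {
    Active,
    Dead,
}
use ReplicaState::*;

// Old condition: peer is not Dead, or the local shard is a dummy.
fn old_condition(this_peer_state: Option<ReplicaState>, is_dummy: bool) -> bool {
    this_peer_state != Some(Dead) || is_dummy
}

// New condition: NOT (peer is Dead or shard is dirty),
// i.e. peer is alive AND shard is clean.
fn new_condition(this_peer_state: Option<ReplicaState>, is_dirty: bool) -> bool {
    !(this_peer_state == Some(Dead) || is_dirty)
}

fn main() {
    // Per the PR discussion, a dirty shard now fails this check even when
    // the local peer state alone would not, making it a recovery candidate.
    assert!(!new_condition(Some(Active), true));
    // The old check keyed off is_dummy() instead of the dirty state.
    assert!(old_condition(Some(Active), true));
}
```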
Member

I wonder whether we might always want to recover dummy shards here. In that case we can remove the dirty flag and just use is_dummy(). If indeed possible, I'd prefer to remove it, to keep our state simpler.

This might conflict with recovery mode though.

Let me have a bit of a thought about this.

KShivendu (Member, Author)

> This might conflict with recovery mode though.

Exactly why I avoided it.

Member

We should probably disable shard transfers altogether when in recovery mode.

If we do that, we can simply check is_dummy() here without problems.

KShivendu (Member, Author) commented Apr 3, 2025

Recovery mode means is_dummy() => true, is_dirty() => false.
A dirty shard means is_dummy() => true, is_dirty() => true.

> disable shard transfers altogether when in recovery mode

Okay, but we must init a transfer when a shard is dirty in order to fix it. That's why I'm using replica_set.is_dirty().await, not replica_set.is_dummy().await.
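
A tiny illustration of that state matrix (illustrative enum and free functions, not the actual DummyShard / ShardReplicaSet API):

```rust
// Illustrative only: both recovery mode and a dirty shard are represented
// by a dummy local shard; the dirty flag is what tells them apart.
#[derive(Clone, Copy, PartialEq)]
enum DummyReason {
    RecoveryMode, // is_dummy() => true, is_dirty() => false
    DirtyShard,   // is_dummy() => true, is_dirty() => true
}

fn is_dummy(_reason: DummyReason) -> bool {
    true // every dummy shard reports is_dummy() == true
}

fn is_dirty(reason: DummyReason) -> bool {
    reason == DummyReason::DirtyShard
}

fn main() {
    // is_dummy() alone cannot distinguish the two cases, which is why the
    // transfer guard above uses is_dirty() instead.
    assert!(is_dummy(DummyReason::RecoveryMode) && !is_dirty(DummyReason::RecoveryMode));
    assert!(is_dummy(DummyReason::DirtyShard) && is_dirty(DummyReason::DirtyShard));
}
```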

Member

I don't see checking a dirty flag as a stable way to tell whether recovery mode is enabled.

Instead we should explicitly check for recovery mode in the place where we drive transfers.

So I still prefer to:

  • just check is_dummy() here
  • explicitly check recovery mode
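
A minimal sketch of that preferred shape, assuming a hypothetical in_recovery_mode flag (the real check would live wherever Qdrant tracks recovery mode; names are illustrative):

```rust
// Hypothetical sketch: check recovery mode explicitly where transfers are
// driven, so the per-replica check can simply be is_dummy().
fn should_auto_recover(peer_is_dead: bool, is_dummy: bool, in_recovery_mode: bool) -> bool {
    // Never initiate automatic shard transfers while in recovery mode.
    if in_recovery_mode {
        return false;
    }
    // Outside recovery mode, a Dead peer or a dummy local shard is a
    // candidate for automatic recovery.
    peer_is_dead || is_dummy
}

fn main() {
    // Recovery mode suppresses auto-recovery even for dummy shards...
    assert!(!should_auto_recover(false, true, true));
    // ...while outside recovery mode a dummy (e.g. dirty) shard is recovered.
    assert!(should_auto_recover(false, true, false));
}
```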

Comment on lines 422 to 441
```rust
if replica_set.is_dummy().await {
    // Check if shard was dirty before init_empty_local_shard
    let was_dirty = replica_set.is_dirty().await;
    // TODO: If dirty, still try to load the shard and init empty shard only if it's not recoverable?
    replica_set.init_empty_local_shard().await?;

    if was_dirty {
        let shard_flag = shard_initializing_flag_path(&collection_path, shard_id);
        tokio::fs::remove_file(&shard_flag).await?;
    }
}
```
Member

Yes, we should initialize an empty shard here: we cannot do our recovery process on top of a corrupt shard, so we must empty it first.

Suggested change:

```diff
 if replica_set.is_dummy().await {
-    // Check if shard was dirty before init_empty_local_shard
+    // If shard was dirty, remove initializing flag after initializing empty
     let was_dirty = replica_set.is_dirty().await;
-    // TODO: If dirty, still try to load the shard and init empty shard only if it's not recoverable?
     replica_set.init_empty_local_shard().await?;
     if was_dirty {
         let shard_flag = shard_initializing_flag_path(&collection_path, shard_id);
         tokio::fs::remove_file(&shard_flag).await?;
     }
 }
```

It's fine to empty it here, because at this point we're sure that there's some other replica that still has our data. We therefore have no data loss.

KShivendu (Member, Author) commented Apr 3, 2025

> because at this point we're sure that there's some other replica that still has our data. We therefore have no data loss

Because there was a shard initializing flag and hence there must be a source replica? Hmm, makes sense.

@KShivendu (Member, Author) commented Apr 15, 2025

> So I still prefer to:
>
>   • just check is_dummy() here
>   • explicitly check recovery mode

Okay. Dropped the is_dirty bool from the DummyShard struct, but I'm still keeping an is_dirty() in ShardReplicaSet since it keeps the code cleaner.
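
One plausible shape for such an accessor, assuming dirtiness is derived from the on-disk shard initializing flag discussed above (illustrative struct and field names, not the real ShardReplicaSet):

```rust
use std::path::PathBuf;

// Illustrative stand-in for ShardReplicaSet: DummyShard no longer stores a
// dirty bool, so "dirty" is derived state rather than stored state.
struct ReplicaSetSketch {
    local_is_dummy: bool,
    shard_initializing_flag: PathBuf, // hypothetical flag path
}

impl ReplicaSetSketch {
    fn is_dummy(&self) -> bool {
        self.local_is_dummy
    }

    // A shard is dirty if it is a dummy AND its initializing flag is still
    // present on disk (i.e. shard initialization never completed).
    fn is_dirty(&self) -> bool {
        self.is_dummy() && self.shard_initializing_flag.exists()
    }
}

fn main() {
    let rs = ReplicaSetSketch {
        local_is_dummy: true,
        shard_initializing_flag: PathBuf::from("shard_0/.initializing"),
    };
    // Dummy, but no flag file exists here, so not considered dirty.
    assert!(rs.is_dummy() && !rs.is_dirty());
}
```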

@KShivendu KShivendu requested a review from timvisee April 15, 2025 15:26
@timvisee (Member) commented Apr 15, 2025

> So I still prefer to:
>
>   • just check is_dummy() here
>   • explicitly check recovery mode
>
> Okay. Dropped the is_dirty bool from the DummyShard struct, but I'm still keeping an is_dirty() in ShardReplicaSet since it keeps the code cleaner.

I don't agree. sync_local_state must never initiate a shard transfer if in recovery mode, and so we shall explicitly check it.

Then the dirty flag becomes obsolete.

A dirty flag that magically acts differently based on external state is not obvious and is a potential footgun. I strongly prefer explicit checks here.

I must admit, I don't recall exactly what we agreed on in terms of which states to auto-recover from. I'll give it a proper thought overnight, as all the possible scenarios aren't super obvious.

@timvisee force-pushed the heal-dirty-shards branch from 7593f6c to 1a4ce81 April 15, 2025 16:04
@timvisee requested a review from generall April 16, 2025 13:54
```rust
// We can reach here because of either of these:
// 1. Qdrant is in recovery mode, and the user intentionally triggered a transfer
// 2. Shard is dirty (shard initializing flag), and Qdrant automatically triggered
//    a transfer to recover dead state
//    (note: initializing flag means there must be another replica)
```
Member

AFAIK the initializing flag does NOT mean that there is another replica, but a shard transfer does.

@KShivendu changed the base branch from test-shard-initializing-behaviour to dev April 16, 2025 17:15
@KShivendu changed the title from "Recover dirty shards using other replicas" to "Recover dirty shards using other replicas when marked Dead" Apr 16, 2025
@KShivendu merged commit 8d49c2c into dev Apr 16, 2025
16 checks passed
@KShivendu deleted the heal-dirty-shards branch April 16, 2025 19:52
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* Test behaviour of Qdrant with shard initializing flag

* Corrupt shard directory and let Qdrant panic like prod

* Wait for shard transfer

* Restore dirty shards using other replicas

* remove unused code

* Request transfer only if replica is dead or dirty

* fmt

* remove comment

* fix clippy

* Delete shard initializing flag after initializing empty local shard

* Expect test to recover shard in existing test

* Review suggestions

* Run tests for longer

* Simplify tests

* Use 2k points

* condition for point_count

* Add comment

* fix flaky tests

* fix flaky tests

* handle edge case

* Include Active in expected states list

* Introduce is_recovery

* simplify tests

* get rid of is_dirty bool in DummyShard

* add missing negation in condition

* fix condition

* final fix for transfer condition

* Don't auto recover if in recovery mode, simplify state checking

* minor comment improvements

* tests scenario where node is killed after deleting shard initializing flag

* Fix failing CI

* Only automatically recover dead replicas

* Mark replica as dead to recover dummy shard

* fix failing test

* Sleep one second after killing peer, give time to release WAL lock

* Prevent waiting for peer to come online indefinitely

* update comment

* minor typo

---------

Co-authored-by: timvisee <tim@visee.me>