Recover dirty shards using other replicas when marked Dead #6293
Conversation
lib/collection/src/collection/mod.rs (outdated)

```diff
@@ -681,7 +681,7 @@ impl Collection {
            continue;
        }

-        if this_peer_state != Some(Dead) || replica_set.is_dummy().await {
+        if !(this_peer_state == Some(Dead) || replica_set.is_dirty().await) {
```
I wonder, we might always want to recover dummy shards here. In that case we can remove the dirty flag and just use `is_dummy()`. If indeed possible, I'd prefer to remove it to keep our state simpler.

This might conflict with recovery mode though.

Let me have a bit of a thought about this.
> This might conflict with recovery mode though.

Exactly why I avoided it.
We should probably disable shard transfers altogether when in recovery mode. If we do that, we can simply check `is_dummy()` here without problems.
Recovery mode means `is_dummy() => true`, `is_dirty() => false`.
A dirty shard means `is_dummy() => true`, `is_dirty() => true` (see the sketch below).

> disable shard transfers altogether when in recovery mode

Okay, but we must initiate a transfer when the shard is dirty in order to fix it. That's why I'm using `replica_set.is_dirty().await`, not `replica_set.is_dummy().await`.
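To make those two states concrete, here is a minimal, self-contained sketch. It is illustrative only and does not reflect how Qdrant's actual `DummyShard` is defined; the names `ShardPlaceholder` and `PlaceholderReason` are hypothetical.

```rust
/// Why a shard is represented by a placeholder rather than loaded normally.
/// Hypothetical type for illustration, not part of Qdrant.
#[derive(PartialEq)]
enum PlaceholderReason {
    /// Qdrant was started in recovery mode; shard data is intentionally not loaded.
    RecoveryMode,
    /// The shard initializing flag was still present on load, so the shard is dirty.
    DirtyShard,
}

struct ShardPlaceholder {
    reason: PlaceholderReason,
}

impl ShardPlaceholder {
    /// A placeholder always reports as a dummy shard.
    fn is_dummy(&self) -> bool {
        true
    }

    /// Only the dirty-shard case reports as dirty.
    fn is_dirty(&self) -> bool {
        self.reason == PlaceholderReason::DirtyShard
    }
}

fn main() {
    let recovery = ShardPlaceholder { reason: PlaceholderReason::RecoveryMode };
    let dirty = ShardPlaceholder { reason: PlaceholderReason::DirtyShard };

    // Recovery mode: is_dummy() => true, is_dirty() => false
    assert!(recovery.is_dummy() && !recovery.is_dirty());
    // Dirty shard:   is_dummy() => true, is_dirty() => true
    assert!(dirty.is_dummy() && dirty.is_dirty());
}
```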
I don't see checking a dirty flag as a stable way to tell whether recovery mode is enabled. Instead, we should explicitly check for recovery mode in the place where we drive transfers.

So I still prefer to (see the sketch below):

- just check `is_dummy()` here
- explicitly check recovery mode
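A minimal sketch of that proposal, assuming a hypothetical `is_recovery_mode` flag and simplified synchronous types; this is not the actual Qdrant code, only an illustration of gating the automatic transfer on an explicit recovery-mode check instead of a dirty flag.

```rust
#[derive(PartialEq)]
enum ReplicaState {
    Active,
    Dead,
}

/// Decide whether to automatically start a recovery transfer for a local replica.
/// All names here are hypothetical and simplified (no async, no replica set).
fn should_auto_recover(state: ReplicaState, is_dummy: bool, is_recovery_mode: bool) -> bool {
    // Never drive automatic transfers while the node runs in recovery mode.
    if is_recovery_mode {
        return false;
    }
    // Otherwise, recover replicas that are marked Dead and backed by a dummy shard.
    state == ReplicaState::Dead && is_dummy
}

fn main() {
    assert!(should_auto_recover(ReplicaState::Dead, true, false));
    assert!(!should_auto_recover(ReplicaState::Dead, true, true)); // recovery mode: no auto transfer
    assert!(!should_auto_recover(ReplicaState::Active, false, false));
}
```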
```rust
if replica_set.is_dummy().await {
    // Check if shard was dirty before init_empty_local_shard
    let was_dirty = replica_set.is_dirty().await;
    // TODO: If dirty, still try to load the shard and init empty shard only if it's not recoverable?
    replica_set.init_empty_local_shard().await?;

    if was_dirty {
        let shard_flag = shard_initializing_flag_path(&collection_path, shard_id);
        tokio::fs::remove_file(&shard_flag).await?;
    }
}
```
Yes, we should initialize an empty shard here, as we cannot do our recovery process on top of a corrupt shard, and so we must empty it here.
Suggested change:

```diff
 if replica_set.is_dummy().await {
-    // Check if shard was dirty before init_empty_local_shard
+    // If shard was dirty, remove initializing flag after initializing empty
     let was_dirty = replica_set.is_dirty().await;
-    // TODO: If dirty, still try to load the shard and init empty shard only if it's not recoverable?
     replica_set.init_empty_local_shard().await?;
     if was_dirty {
         let shard_flag = shard_initializing_flag_path(&collection_path, shard_id);
         tokio::fs::remove_file(&shard_flag).await?;
     }
 }
```
It's fine to empty it here, because at this point we're sure that there's some other replica that still has our data. We therefore have no data loss.
> because at this point we're sure that there's some other replica that still has our data. We therefore have no data loss

Because there was a shard initializing flag, and hence there must be a source replica? Hmm, makes sense.
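For context, a hedged sketch of the shard initializing flag lifecycle being discussed: the flag is created before a shard is populated and removed only after initialization completes, so a flag that survives a restart marks the shard as dirty. The file name and helper below are hypothetical, not the actual Qdrant implementation.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical path of the per-shard initializing flag.
fn shard_flag_path(collection_path: &Path, shard_id: u32) -> PathBuf {
    collection_path.join(format!("shard_{shard_id}.initializing"))
}

/// Sketch of shard initialization: the flag brackets the (possibly long) setup work.
fn init_shard(collection_path: &Path, shard_id: u32) -> std::io::Result<()> {
    let flag = shard_flag_path(collection_path, shard_id);
    // Created before any shard data is written or transferred...
    fs::write(&flag, b"")?;

    // ... shard data would be created or received from another replica here ...

    // ...and removed only once initialization has fully completed. A flag that is
    // still present on the next start therefore means setup was interrupted and the
    // shard contents cannot be trusted (the shard is "dirty").
    fs::remove_file(&flag)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    init_shard(&dir, 0)?;
    assert!(!shard_flag_path(&dir, 0).exists());
    Ok(())
}
```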
(branch force-pushed from 9b1c82b to c9a638f)
(branch force-pushed from b30c887 to 3d9c388)
(branch force-pushed from 3b3fa4b to 77823a6)
Okay. Dropped.
I don't agree. Then the dirty flag becomes obsolete. The dirty flag magically acting differently based on external state is not obvious and a potential footgun. I strongly prefer explicit checks here.

I must admit, I don't recall exactly what we agreed on in terms of what states to auto-recover from. I'll have a proper think about it over the night, as all the possible scenarios aren't super obvious.
(branch force-pushed from 7593f6c to 1a4ce81)
```rust
// We can reach here because of either of these:
// 1. Qdrant is in recovery mode, and the user intentionally triggered a transfer
// 2. Shard is dirty (shard initializing flag), and Qdrant automatically triggered a transfer to recover dead state
//    (note: initializing flag means there must be another replica)
```
AFAIK the initializing flag does NOT mean that there is another replica, but a shard transfer does.
Squashed commits:

* Test behaviour of Qdrant with shard initializing flag
* Corrupt shard directory and let Qdrant panic like prod
* Wait for shard transfer
* Restore dirty shards using other replicas
* remove unused code
* Request transfer only if replica is dead or dirty
* fmt
* remove comment
* fix clippy
* Delete shard initializing flag after initializing empty local shard
* Expect test to recover shard in existing test
* Review suggestions
* Run tests for longer
* Simplify tests
* Use 2k points
* condition for point_count
* Add comment
* fix flaky tests
* fix flaky tests
* handle edge case
* Include Active in expected states list
* Introduce is_recovery
* simplify tests
* get rid of is_dirty bool in DummyShard
* add missing negation in condition
* fix condition
* final fix for transfer condition
* Don't auto recover if in recovery mode, simplify state checking
* minor comment improvements
* tests scenario where node is killed after deleting shard initializing flag
* Fix failing CI
* Only automatically recover dead replicas
* Mark replica as dead to recover dummy shard
* fix failing test
* Sleep one second after killing peer, give time to release WAL lock
* Prevent waiting for peer to come online indefinitely
* update comment
* minor typo

Co-authored-by: timvisee <tim@visee.me>
Fixes the bug demonstrated in #6292
All Submissions:

- Contributions should target the `dev` branch. Did you create your branch from `dev`?

TODO:

- Try to load shards even if they are dirty instead of directly recreating them as an empty shard - especially important when we have 0 replicas? Not desirable anymore.