Skip to content

Conversation

ffuugoo
Copy link
Contributor

@ffuugoo ffuugoo commented Mar 19, 2025

This PR extends ShardReplicaSet::restore_local_replica_from to support partial snapshot recovery.

APIs to create and recover partial snapshots and integration tests will be added in a follow-up PRs.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@ffuugoo ffuugoo force-pushed the partial-snapshot-recover branch 2 times, most recently from 06f8a7f to f4dcec0 Compare March 24, 2025 14:57
@ffuugoo ffuugoo changed the title WIP: Implement partial snapshot recovery Implement partial snapshot recovery Mar 24, 2025
@ffuugoo ffuugoo marked this pull request as ready for review March 24, 2025 15:02
@ffuugoo ffuugoo requested review from timvisee and generall March 24, 2025 15:02

This comment was marked as resolved.

@ffuugoo
Copy link
Contributor Author

ffuugoo commented Mar 24, 2025

I've already implemented the APIs (and also merged take_snapshot and take_partial_snapshot methods, because they are very similar to each other), but I decided to split these changes into separate PRs, that I will open shortly.

coderabbitai[bot]

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

@ffuugoo ffuugoo force-pushed the partial-snapshot-recover branch from 0bf5d2b to 4e38c00 Compare March 26, 2025 17:38
@ffuugoo ffuugoo requested a review from timvisee March 26, 2025 17:39
coderabbitai[bot]

This comment was marked as resolved.

@ffuugoo ffuugoo force-pushed the partial-snapshot-recover branch from 4e38c00 to 65ec3d9 Compare March 28, 2025 12:29
coderabbitai[bot]

This comment was marked as resolved.

@ffuugoo ffuugoo force-pushed the partial-snapshot-recover branch from 65ec3d9 to d55e366 Compare March 28, 2025 13:27
@ffuugoo ffuugoo force-pushed the partial-snapshot-recover branch from d55e366 to ed5f2e1 Compare March 31, 2025 15:56
pub fn validate(&self) -> OperationResult<()> {
for (segment_id, manifest) in &self.manifests {
for (file, version) in &manifest.file_versions {
if version.or_segment_version(manifest.segment_version) > manifest.segment_version {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see manifest.segment_version on both sides of the > here, is that on purpose?

Would it make sense to take an option instead and do .is_some_and(version < segment version)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, is_some_and would make more sense here, I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. It's not an Option, but a enum FileVersion { Version(SeqNumberType), Unversioned } and or_segment_version is basically Option::unwrap_or for Unversioned. I can add is_versioned_and method, but in this particular case version.or_segment_version(manifest.segment_version) > manifest.segment_version works the same.

Comment on lines +398 to +403
fn failed_to_add(what: &str, path: &Path, err: impl fmt::Display) -> OperationError {
OperationError::service_error(format!(
"failed to add {what} {} into snapshot: {err}",
path.display(),
))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is exclusively used in the function above, how do you feel about inlining it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never bother with explicit inlining, but sure.

@@ -82,6 +104,7 @@ impl ShardReplicaSet {
pub async fn restore_local_replica_from(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is humongous, spanning a few hundred lines. I think we should split it, but we can do so in a separate PR.

Copy link
Member

@KShivendu KShivendu Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

Copy link
Member

@KShivendu KShivendu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Looking forward to the tests and integration of this feature :)

Please split the restore_local_replica_from function.

@ffuugoo ffuugoo merged commit dce4a94 into dev Apr 2, 2025
17 checks passed
@ffuugoo ffuugoo deleted the partial-snapshot-recover branch April 2, 2025 13:11
@generall generall added this to the Incremental Snapshots milestone Apr 20, 2025
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
- Fix creating partial snapshot without RocksDB backup
- Fix `segment_manifest.json` file placement
- Implement partial snapshot recovery
- Allow restoring partial snapshot for all shard types
- Validate segment manifests when recovering partial snapshot
- Log error, if streaming shard/partial snapshot failed
- Contextualize errors in `snapshot_files` function
- Tweak log messages in `Segment::take_snapshot` method
- Remove unnecessary lifetimes from `SegmentHolder` iter methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants