Skip to content

feat: add and port to SnapshotManager #3520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

mogery
Copy link

@mogery mogery commented Feb 4, 2024

Supersedes #3430 (part 1)

This PR centralises all snapshot file management code in a SnapshotManager struct, and ports everything to use it. This is needed to provide an abstraction for snapshot files, so that we can store them not only locally, but also on S3 (will be introduced in a future PR).

It also introduces a SnapshotFile struct which replaces PathBufs that were used to refer to snapshots before.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@mogery
Copy link
Author

mogery commented Feb 4, 2024

@generall This is -- in my opinion -- the smallest PR that makes sense and doesn't break build. It might still be too big, but it doesn't include the S3 code, so that's one less thing to test and review.

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mogery, thanks for PR! I think overall it is moving into the right direction, however I think it might benefit from making
SnapshotManager less global and more independent entity. Here is how I can see that:

  • SnapshotManager doesn't really have any internal state. It can be easily created from the configuration (even with s3 support enabled).
    All relevant structures in the code (ToC, Collection, Shard) have access to storage configuration, so it should be easy to build SnapshotManager on demand.

  • Current SnapshotManager knows too much about the whole project structure. E.g. it knows about the concept of shard_ids and collections. At the same time Collection should know about SnapshotManager which either create circular dependency or forces us to drop proper types at compile time (see the commend about u32 as shard is). I propose to make SnapshotManager more local, let it only work with single given "directory" and abstract it away from differenciation of the snapshot types.

Those two changes should allow making PR much more compact, avoid extra parameters in seemingly unrelated function, and also should allow to test Snapshot manager independently from the ToC and other storage structures.

@mogery
Copy link
Author

mogery commented Feb 5, 2024

Thank you for the review @generall!

I think overall it is moving into the right direction, however I think it might benefit from making SnapshotManager less global and more independent entity. Here is how I can see that:

  • SnapshotManager doesn't really have any internal state. It can be easily created from the configuration (even with s3 support enabled).

All relevant structures in the code (ToC, Collection, Shard) have access to storage configuration, so it should be easy to build SnapshotManager on demand.

Great catch! I didn't realize that but I totally agree.

  • Current SnapshotManager knows too much about the whole project structure. E.g. it knows about the concept of shard_ids and collections. At the same time Collection should know about SnapshotManager which either create circular dependency or forces us to drop proper types at compile time (see the commend about u32 as shard is). I propose to make SnapshotManager more local, let it only work with single given "directory" and abstract it away from differenciation of the snapshot types.

I think we can do it even better: in the S3 code to handle SnapshotFile::OutOfPlace correctly I added some code that determines if the path should point to S3 or not. All it did is check if the provided path is inside the configured snapshot path (or a subdirectory). I think we can extend this to all operations, remove SnapshotFile completely, and rely on paths entirely again, which would simplify things and reduce SnapshotManager's dependency on the current structure. Then the S3-specific logic would only kick in if S3 is enabled and the path should point to S3.

I'll make the changes this afternoon.

- not a global struct, instead it is easily instantiatable anytime
- snapshotfile has been deleted, snapshotmanagers work with paths and are scoped to single directories instead and unaware of project structure
@mogery
Copy link
Author

mogery commented Feb 7, 2024

@generall Made the following changes:

  • SnapshotFile has been deleted.
  • SnapshotManager has been de-globalized, instead it is instantiated on demand in many different places.
  • SnapshotManager is now scoped to a single directory with SnapshotManager::scope. It is unaware of the project structure, other code scopes it to certain directories to achieve certain functionality. (for example, to list all collection snapshots: toc.snapshot_manager().scope("my_collection/").do_list_snapshots().await?)
  • SnapshotManager functions that refer to exact snapshots (e.g. do_delete_snapshot) take an impl AsRef<Path> as the "snapshot name". This allows for the normal behaviour if a relative path like test_snapshot.snapshot is given, but also OutOfPlace behaviour is supported if an absolute path like /path/to/my/snapshot.snapshot is given. This allows for versatility while still centralizing snapshot access code. This can also be easily adapted to support S3.

I didn't have time to run any integration tests, so I'd appreciate a CI run. :)

@mogery mogery requested a review from generall February 8, 2024 12:40
@mogery
Copy link
Author

mogery commented Feb 11, 2024

Fixed the failing tests.

@generall
Copy link
Member

Hey @mogery, unfortunately the PR still have too big impact on the internals, which makes it very hard too review and validate.

There are many places where snapshot_manager is introduces as a new function argument, even though it doesn't provide any additional context to the function.

I propose to remove all of those places and try to keep PR at least within 1000 lines of code (ideally within 500 lines)

@generall
Copy link
Member

Closing in favor of #4150

@generall generall closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants