Allow directly writing snapshots on top of existing files #3002


Conversation

CompuIves
Contributor

Reason for This PR

We're exploring diff snapshots, but found that our underlying filesystem (btrfs) performs poorly with the rebase-snap tool because finding holes in files is slow there (related: #2948). We initially tried another syscall (fiemap), but that was also too slow (10s for 4GB). Because of this, we cannot apply diff snapshots quickly on top of a base snapshot.

Instead, we're now taking another approach where we save the diff snapshot directly on top of the base memory file. Firecracker already supports this, because it only writes changed blocks. This has worked out well for us, and it might be interesting to merge upstream.

There's no issue opened for this right now, and it might well be that this change cannot be merged into Firecracker, but if you're interested I'd be happy to make the required changes to make it fit for the repo!

Description of Changes

Instead of truncating the memory file we write to, we write on top of it.
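The idea can be sketched as follows. This is a minimal illustration in Python with hypothetical names (`write_snapshot_on_top`, `dirty_pages`), not Firecracker's actual (Rust) implementation: the target file is opened without truncation, sized to the full guest memory, and only the changed pages are written at their offsets, so untouched pages keep the base snapshot's contents.

```python
import os

PAGE_SIZE = 4096

def write_snapshot_on_top(path, dirty_pages, mem_size):
    """Write only the changed pages at their offsets, keeping the rest of
    the existing file (the base snapshot) intact.

    Hypothetical sketch for illustration; `dirty_pages` maps a page index
    to that page's new contents.
    """
    # "r+b" opens for update WITHOUT truncating; fall back to creating
    # the file if it does not exist yet (the full-snapshot case).
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        # Ensure the file spans the whole guest memory (like set_len).
        f.truncate(mem_size)
        for page_idx, data in dirty_pages.items():
            f.seek(page_idx * PAGE_SIZE)
            f.write(data)
```

With this, writing a diff snapshot to a file that already holds the base snapshot yields the latest full memory state, since pages absent from the diff retain their base contents.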

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The issue which led to this PR has a clear conclusion.
  • This PR follows the solution outlined in the related issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes follow the Runbook for Firecracker API changes.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

@acatangiu
Contributor

acatangiu commented May 25, 2022

My 2c is that this is a valid use-case that Firecracker could support in addition to the existing behavior of creating a new snapshot containing just the diff bits.

There are two different use-cases here:

  1. (existing) the user is interested in the diff pages of memory, the diff snapshot - for example, multiple layers of "diffs" can be used in a distributed system to build snapshots at varying stages in the VM's lifetime, while minimizing the amount of "data" to distribute across the system.
  2. (what this PR proposes) the user is interested in a single snapshot instance, but wants to accelerate mutations to it - take snapshots as the guest workload makes progress (but only interested in latest snapshot), so instead of creating slow(er) full snapshots, directly create fast(er) diff snapshots on top of each other.

I think there's value in both use-cases, and the proposed changes in this PR seem to support both of them 👍
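The first use-case above (stacking diff layers to rebuild a full snapshot, as the rebase-snap tool does offline) can be sketched like this. A hypothetical Python illustration with made-up names (`merge_layers`), not the actual tool:

```python
PAGE_SIZE = 4096

def merge_layers(base, diffs):
    """Combine a base snapshot with successive diff layers into the latest
    full memory state.

    Hypothetical sketch: `base` is the full base memory as bytes, and
    `diffs` is a list of layers (oldest first), each mapping a page index
    to that page's contents at the time the layer was taken.
    """
    mem = bytearray(base)
    for layer in diffs:  # apply oldest layer first; later layers win
        for page_idx, data in layer.items():
            off = page_idx * PAGE_SIZE
            mem[off:off + len(data)] = data
    return bytes(mem)
```

This is exactly the work that the PR's approach avoids at restore time: by writing each diff directly on top of the previous state, the merged result is maintained incrementally instead of being reconstructed from layers.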

@CompuIves
Contributor Author

I think there's value in both use-cases, and the proposed changes in this PR seem to support both of them

Yep, that's right. This would allow both creating diff snapshots and saving snapshots directly on top of a base snapshot. The only breaking change you could consider is that if people have been saving their snapshot on top of existing files, we no longer truncate those files; instead, we write the changed pages on top of the file. That said, I don't expect that people are writing snapshots on top of existing files at the moment.

@CompuIves
Contributor Author

It seems like the tests are failing on test coverage; should I change a value here?

@luminitavoicu
Copy link
Contributor

@CompuIves, thanks for your contribution! Your changes increase ARM coverage. Please rebase your PR and update the coverage value for the failing tests.

@luminitavoicu
Contributor


Looks good, I like the idea! Could you add an integration test for this, as well?

let mem_size_mib = mem_size_mib(vmm.guest_memory());
file.set_len((mem_size_mib * 1024 * 1024) as u64)
.map_err(|e| MemoryBackingFile("set_length", e))?;
let expected_size = (mem_size_mib * 1024 * 1024) as u64;

Not necessarily in scope of this PR, but I would like us to have a constant for 1024 * 1024.

@CompuIves force-pushed the codesandbox/memory-save-snapshot branch from f3b8fa6 to 62db03e on June 15, 2022
@CompuIves
Contributor Author

Rebased the PR, but I'm not sure what the code coverage is now with the changes that came from the rebase. Should I wait for a test run?

@CompuIves
Contributor Author

Added integration test as well!

@luminitavoicu
Contributor


Looks good! Please make sure that all of your commits are signed.
The coverage test is failing; you will have to update the numbers once again. Also, take a look at the error output from the Python style test.

ssh_connection = net_tools.SSHConnection(microvm.ssh_config)

# Run command to do something
exit_code, _, _ = ssh_connection.execute_command("sync")

It would be nice to also run a workload inside the guest to dirty some pages.

Signed-off-by: Ives van Hoorne <ives@codesandbox.io>
@CompuIves force-pushed the codesandbox/memory-save-snapshot branch from 94030d2 to 18c70af on June 21, 2022
@CompuIves
Contributor Author

Addressed the feedback, squashed the commits, signed them, and updated coverage! I think the tests should pass now, though I did see other perf tests fail, so that's something we should check.

@dianpopa added the "Status: Awaiting review" label on Aug 18, 2022
@dianpopa requested a review from a team on September 27, 2022
@dianpopa
Contributor

Hi!

I do like the proposal! However, I am concerned about a possible breaking change, since I expect users of the rebase-snap tool to expect diff layers containing the memory dirtied since the last retrieval.
Shouldn't we let rebase-snap tool users know that, from now on, in order to obtain a diff layer they need to remove any previous (name-conflicting) snapshots?

@dianpopa added the "Status: Blocked" label and removed the "Status: Awaiting review" label on Mar 16, 2023
@roypat
Contributor

roypat commented Jul 25, 2023

Hi!

I do like the proposal! However, I am concerned about a possible breaking change, since I expect users of the rebase-snap tool to expect diff layers containing the memory dirtied since the last retrieval. Shouldn't we let rebase-snap tool users know that, from now on, in order to obtain a diff layer they need to remove any previous (name-conflicting) snapshots?

Mh, yes, for successive diff snapshots this essentially performs the rebase-snap functionality at snapshot-taking time; we might want to add a note about that somewhere. For full snapshots, however, this "fixes" the scenario where a full snapshot from which the microVM was loaded currently gets zeroed out when used as the target of a new full/diff snapshot.

@roypat
Contributor

roypat commented Dec 7, 2023

Hey @CompuIves, we have implemented this functionality in #4301 based on your proposal. I am thus closing this PR. Thank you again for your contribution!
