-
Notifications
You must be signed in to change notification settings - Fork 14
add complete (library) support for sha-512 #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add complete (library) support for sha-512 #105
Conversation
d78faeb
to
15fb4f6
Compare
7b047a3
to
f52b64a
Compare
Some information about this change: for each of composefs-setup-root and cfsctl I tried building them with SHA-256 support (status quo), SHA-512 support (as per this PR currently) and then a hacked up version with support for both. For For
so there it's basically no difference at all. ~4.5k increase, ~0.2%. For
which is an increase of some ~275k or ~4%. That's generics for you.... The size of I'd probably play this out by doing something like adding a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably also consider avoiding some of the runtime generics and doing dynamic dispatch too...
There's already a dynamic type here https://docs.rs/oci-spec/latest/oci_spec/image/struct.Digest.html in the depchain even.
On the ostree side personally I ended up feeling it was just cleaner to pass around values as hex strings instead of binary actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyways looks sane to me but I obviously only skimmed things
I might slowly be coming around to the "just pass strings around" idea except that it involves a lot of extra allocations... |
One thing worth thinking about, though: the different hash types aren't just different lengths of string, but encode some extra information like the name of the algorithm, the digest implementation to use for that algorithm, and the fs-verity digest algorithm number... not that this would be a huge problem to overcome, mind you... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few fixups here but on second reading I'm pretty happy with it, to be honest.
We might indeed to go for a "hex string" sort of approach in the future, or at least some more dynamic sort of thing with a boxed enum, but the code changes in this commit (particularly the untangling of the different uses of SHA-256) will make it a whole lot easier for us to approach that in the future... we could even try a FsVerityHashValue implementation as a string, with some adjustments...
Passing |
Presumably we're talking about strings here with prefixes like In fact, I could imagine a format of storing a Anyway. That's definitely future work, if we do it at all... |
Also worth mentioning, the way this works now, we end up with a lot of hash values in memory stored inline inside of our LeafContent structs. Those structs are quite big because of this, and we pay that price for non-regular/external leaf files: internal files, chardevs, blockdevs, fifos, sockets, symlinks. But all of those things (with the exception of symlinks) are exceedingly rare in container images.... and for symlinks maybe we could optimize small inline links or something shrug. By all of this I mean: switching to boxes would be less efficient at runtime. Brings me to another point, though: we need to adjust (double) our "max inline file size" when we're in SHA-512 mode... |
There are a few cases left over from before I learned how to do this properly. Fix them. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
src/fsverity.rs defines Sha256HashValue as an implementation of FsVerityHashValue but we also use it for situations where we handle SHA-256 digests that aren't fs-verity hash values. This happens a lot in the OCI code (to refer to images and layers) and correspondingly in the splitstream code (which stores those things). Introduce a util::Sha256Digest type, returned by util::parse_sha256() and use it consistently for all of the cases that have nothing to do with fs-verity. parse_sha256() is still used in some places in the code for values that are intended as fs-verity digests. We'll start fixing that up in the following commits when we add some helpers directly on the types themselves. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Split FsVerityHashValue and its two implementations out to a separate file. Right now it's not a lot of code, but it's going to get substantially larger in the next commit. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Right now we define pub type Sha256HashValue = [u8; 32]; pub type Sha512HashValue = [u8; 64]; which is simple enough but also painful: we can implement our own traits for these types (and indeed, we do it for FsVerityHashValue) but we can't implement things like fmt::Debug because there's a default impl (which rather unhelpfully prints the hashes out as an array of bytes). Turn these into proper types using a trivial struct wrapper: pub struct Sha256HashValue([u8; 32]); pub struct Sha512HashValue([u8; 64]); which gives us a lot more control. We can start by implementing meaningful fmt::Debug traits. Doing this means that we implicitly lose our implementation of AsRef<[u8]> and AsMut<[u8]> which we've been mostly using in order to do various ad-hoc encoding to/from various forms of hex values all over the codebase. Move all of those ad-hoc forms of encoding into proper methods on the trait itself by adding functions like ::to_hex() and ::from_hex() which we can use instead. We also drop our last instances of using parse_sha256() for fs-verity digests. There are a couple of other places where we need the actual bytes, though: when reading/writing hash values to splitstreams and when reading/writing the metacopy xattr in erofs images. Deal with those by deriving the usual zerocopy FromBytes/IntoBytes traits for our new hash types and moving the code in the splitstream and erofs modules to directly include the data inside of the relevant structure types, mark those types as FromBytes/IntoBytes, and use the zerocopy APIs directly. In the case of the metacopy attr in the erofs code this provides an opportunity for a substantial cleanup in an otherwise scary part of the code. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This tests out all of the various conversions we can do. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Various bits of code here predate my having learned what Self. We're about to throw a bunch of generic type arguments absolutely everywhere, so increase our use of Self first to help make that less painful. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Let's pass a string instead of a Sha256HashValue to write_entry(). This will save us a substantial amount of grief with generics in the next commit. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Convert all of the many places that we've been using Sha256HashValue for fs-verity hashes to be generic over FsVerityHashValue instead. Work the changes out across the codebase. Due to some missing features in Rust, this change (while fairly mechnical) is also pretty large. See for example: - it's not possible to omit type parameters from the declaration of a function when they are irrelevant to the body of the function - rust-lang/rfcs#424 Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This syntax is a bit less verbose than the fully-spelled-out generic form. I'm surprised by how few sites this was possible for... Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
f52b64a
to
80b7c66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably look at this a bit more tomorrow, but wanted to post what I have thus far before I take off for the day. This on balance looks super good to me. Lots of code-motion things that makes me go cross-eyed so I might miss some stuff in there. Just one real concern so far with public API exposure.
@@ -83,14 +81,17 @@ pub(crate) async fn read_exactish_async( | |||
Ok(true) | |||
} | |||
|
|||
/// Parse a string containing a SHA256 digest in hexidecimal form into a Sha256HashValue. | |||
/// A utility type representing a SHA-256 digest in binary. | |||
pub type Sha256Digest = [u8; 32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a proper type instead of an alias, especially since it's exposed in the public API.
The somewhat-silly reasoning: I am selfish and my lsp with rust-analyzer (probably for others too not using insane emacs setups?) will just erase Sha256Digest
everywhere in type annotations and show [u8; 32]
in its place, which is annoying and loses context.
The more constructive reasoning: exposing it publicly as [u8; 32]
is going to limit flexibility in the future. Maybe you want to attach extra metadata to the digest. Or maybe you want to impl
some traits for Sha256Digest
. As is, you couldn't for example impl std::fmt::Display
for it.
It's mildly more annoying since you'd have to do something like Sha256Digest::from([0u8; 32])
and digest.bytes()
or digest.0
or similar all over the place, but it gives you a lot more flexibility in the future without needing to break semver.
Edit: As I read the whole PR some more, I see you did basically exactly this in fsverity::hashvalue::{Sha256HashValue,Sha512HashValue}
.
match digest.strip_prefix("sha256:") { | ||
Some(rest) => Ok(parse_sha256(rest)?), | ||
None => bail!("Manifest has non-sha256 digest"), | ||
} | ||
} | ||
|
||
type ContentAndVerity = (Sha256HashValue, Sha256HashValue); | ||
type ContentAndVerity<ObjectID> = (Sha256Digest, ObjectID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one isn't as big of a deal being a type alias since it's not public, but worth consideration still.
This rests upon a massive pile of generics, but it seems to be working.
With these changes the library is now fully agnostic to which hash algorithm we use. I've tested this and it successfully produces erofs images with the metacopy attributes correctly set for verifying the fs-verity validity of files using sha512. The examples also correctly boot with a sha512 in the
composefs=
cmdline.The changes to the bin/ commands to support this will come soon: we need to thoughtfully consider what we want to do here at a higher level....