Skip to content

Conversation

allisonkarlitskaya
Copy link
Collaborator

@allisonkarlitskaya allisonkarlitskaya commented Apr 15, 2025

This rests upon a massive pile of generics, but it seems to be working.

With these changes the library is now fully agnostic to which hash algorithm we use. I've tested this and it successfully produces erofs images with the metacopy attributes correctly set for verifying the fs-verity validity of files using sha512. The examples also correctly boot with a sha512 in the composefs= cmdline.

The changes to the bin/ commands to support this will come soon: we need to thoughtfully consider what we want to do here at a higher level....

  • remove the last two commits (which are here just for demo purposes)
  • decide what we want to do about sha256/sha512 or both: for now we stick with SHA-256 by default. edit: We can enable support for SHA-512 in the commandline tools in future commits.
  • consider that we're essentially producing an incompatible repository format (with differently named objects, images, different image content, different splitstream content, everything) and figure out how we want to handle that from a compat standpoint. edit: punt.

@allisonkarlitskaya
Copy link
Collaborator Author

allisonkarlitskaya commented Apr 16, 2025

Some information about this change: for each of composefs-setup-root and cfsctl I tried building them with SHA-256 support (status quo), SHA-512 support (as per this PR currently) and then a hacked up version with support for both. For cfsctl for example I renamed main() to cfsctl() and added a generic type parameter and instantiated it twice (once for sha256, once for sha512) from the "real" main(), just to get an idea.

For composefs-setup-root, it's just the fs-verity ioctls and mounting code involved, which gets us the following results in terms of binary size with --release:

sha256 = 1573712,
sha512 = 1573744
both = 1578272

so there it's basically no difference at all. ~4.5k increase, ~0.2%.

For cfsctl (which pulls in basically all of the code, including the heavily genericified in-memory filesystem image stuff) the numbers are more interesting:

sha256 = 6952272
sha512 = 6974072
both = 7228032

which is an increase of some ~275k or ~4%. That's generics for you....

The size of cfsctl doesn't really worry me too much, to be honest, and the size of composefs-setup-root is barely affected.

I'd probably play this out by doing something like adding a --hash-alg=[sha256|sha512] flag to cfsctl, paying the 4% tax, and then having composefs-setup-root expect to see the composefs= kernel cmdline arg prefixed with either sha256: or sha512:.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably also consider avoiding some of the runtime generics and doing dynamic dispatch too...

There's already a dynamic type here https://docs.rs/oci-spec/latest/oci_spec/image/struct.Digest.html in the depchain even.

On the ostree side personally I ended up feeling it was just cleaner to pass around values as hex strings instead of binary actually.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways looks sane to me but I obviously only skimmed things

@allisonkarlitskaya
Copy link
Collaborator Author

I might slowly be coming around to the "just pass strings around" idea except that it involves a lot of extra allocations...

@allisonkarlitskaya
Copy link
Collaborator Author

allisonkarlitskaya commented Apr 16, 2025

One thing worth thinking about, though: the different hash types aren't just different lengths of string, but encode some extra information like the name of the algorithm, the digest implementation to use for that algorithm, and the fs-verity digest algorithm number... not that this would be a huge problem to overcome, mind you...

Copy link
Collaborator Author

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few fixups here but on second reading I'm pretty happy with it, to be honest.

We might indeed to go for a "hex string" sort of approach in the future, or at least some more dynamic sort of thing with a boxed enum, but the code changes in this commit (particularly the untangling of the different uses of SHA-256) will make it a whole lot easier for us to approach that in the future... we could even try a FsVerityHashValue implementation as a string, with some adjustments...

@cgwalters
Copy link
Collaborator

I might slowly be coming around to the "just pass strings around" idea except that it involves a lot of extra allocations...

Passing String always involves a heap allocation yes but it's possible to stack allocate hex strings too though doing so via [char; 64] is obviously extra expensive and I think maybe we want an array of https://docs.rs/ascii/latest/ascii/ or so? There's a bit of a crate-interoperability thing here too, at one point ascii types were going to be in std, but they're not. In theory hex::encode should promise to output ascii (at a minimum).

@allisonkarlitskaya
Copy link
Collaborator Author

Presumably we're talking about strings here with prefixes like sha256: and sha512:? Because otherwise I'd just say we should use a Box<[u8]> instead... there's no benefit to having hex in there other than taking up twice as much space. Internally, nothing in this crate thinks in hex except for the encoding of the object filenames (and I still have some idea that it might be nicer to use base64 here): the metacopy attributes, encoding in splitstream, and the ioctl interface are all in binary, and it would definitely make sense to keep our storage also in binary.

In fact, I could imagine a format of storing a Box<[u8]> such that the first character is the kernel fs-verity algorithm ID (1=sha256, 2=sha512) and the rest of the bytes are the digest. We wouldn't need a size field of course (which makes us different from the struct the kernel uses for this) since Box<[u8]> is already a fat pointer...

Anyway. That's definitely future work, if we do it at all...

@allisonkarlitskaya
Copy link
Collaborator Author

Also worth mentioning, the way this works now, we end up with a lot of hash values in memory stored inline inside of our LeafContent structs. Those structs are quite big because of this, and we pay that price for non-regular/external leaf files: internal files, chardevs, blockdevs, fifos, sockets, symlinks. But all of those things (with the exception of symlinks) are exceedingly rare in container images.... and for symlinks maybe we could optimize small inline links or something shrug. By all of this I mean: switching to boxes would be less efficient at runtime.

Brings me to another point, though: we need to adjust (double) our "max inline file size" when we're in SHA-512 mode...

There are a few cases left over from before I learned how to do this
properly.  Fix them.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
src/fsverity.rs defines Sha256HashValue as an implementation of
FsVerityHashValue but we also use it for situations where we handle
SHA-256 digests that aren't fs-verity hash values.  This happens a lot
in the OCI code (to refer to images and layers) and correspondingly in
the splitstream code (which stores those things).

Introduce a util::Sha256Digest type, returned by util::parse_sha256()
and use it consistently for all of the cases that have nothing to do
with fs-verity.

parse_sha256() is still used in some places in the code for values that
are intended as fs-verity digests.  We'll start fixing that up in the
following commits when we add some helpers directly on the types
themselves.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Split FsVerityHashValue and its two implementations out to a separate
file.  Right now it's not a lot of code, but it's going to get
substantially larger in the next commit.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Right now we define

  pub type Sha256HashValue = [u8; 32];
  pub type Sha512HashValue = [u8; 64];

which is simple enough but also painful: we can implement our own traits
for these types (and indeed, we do it for FsVerityHashValue) but we
can't implement things like fmt::Debug because there's a default impl
(which rather unhelpfully prints the hashes out as an array of bytes).

Turn these into proper types using a trivial struct wrapper:

  pub struct Sha256HashValue([u8; 32]);
  pub struct Sha512HashValue([u8; 64]);

which gives us a lot more control.  We can start by implementing
meaningful fmt::Debug traits.

Doing this means that we implicitly lose our implementation of
AsRef<[u8]> and AsMut<[u8]> which we've been mostly using in order to do
various ad-hoc encoding to/from various forms of hex values all over the
codebase.  Move all of those ad-hoc forms of encoding into proper
methods on the trait itself by adding functions like ::to_hex() and
::from_hex() which we can use instead.  We also drop our last instances
of using parse_sha256() for fs-verity digests.

There are a couple of other places where we need the actual bytes,
though: when reading/writing hash values to splitstreams and when
reading/writing the metacopy xattr in erofs images.  Deal with those by
deriving the usual zerocopy FromBytes/IntoBytes traits for our new hash
types and moving the code in the splitstream and erofs modules to
directly include the data inside of the relevant structure types, mark
those types as FromBytes/IntoBytes, and use the zerocopy APIs directly.
In the case of the metacopy attr in the erofs code this provides an
opportunity for a substantial cleanup in an otherwise scary part of the
code.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This tests out all of the various conversions we can do.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Various bits of code here predate my having learned what Self.  We're
about to throw a bunch of generic type arguments absolutely everywhere,
so increase our use of Self first to help make that less painful.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Let's pass a string instead of a Sha256HashValue to write_entry().  This
will save us a substantial amount of grief with generics in the next
commit.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Convert all of the many places that we've been using Sha256HashValue for
fs-verity hashes to be generic over FsVerityHashValue instead.  Work the
changes out across the codebase.

Due to some missing features in Rust, this change (while fairly
mechnical) is also pretty large.  See for example:
 - it's not possible to omit type parameters from the declaration of a
   function when they are irrelevant to the body of the function
 - rust-lang/rfcs#424

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This syntax is a bit less verbose than the fully-spelled-out generic
form.  I'm surprised by how few sites this was possible for...

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
@allisonkarlitskaya allisonkarlitskaya marked this pull request as ready for review April 17, 2025 17:55
@allisonkarlitskaya allisonkarlitskaya changed the title add complete support for sha-512 add complete (library) support for sha-512 Apr 17, 2025
Copy link
Collaborator

@jeckersb jeckersb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably look at this a bit more tomorrow, but wanted to post what I have thus far before I take off for the day. This on balance looks super good to me. Lots of code-motion things that makes me go cross-eyed so I might miss some stuff in there. Just one real concern so far with public API exposure.

@@ -83,14 +81,17 @@ pub(crate) async fn read_exactish_async(
Ok(true)
}

/// Parse a string containing a SHA256 digest in hexidecimal form into a Sha256HashValue.
/// A utility type representing a SHA-256 digest in binary.
pub type Sha256Digest = [u8; 32];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a proper type instead of an alias, especially since it's exposed in the public API.

The somewhat-silly reasoning: I am selfish and my lsp with rust-analyzer (probably for others too not using insane emacs setups?) will just erase Sha256Digest everywhere in type annotations and show [u8; 32] in its place, which is annoying and loses context.

The more constructive reasoning: exposing it publicly as [u8; 32] is going to limit flexibility in the future. Maybe you want to attach extra metadata to the digest. Or maybe you want to impl some traits for Sha256Digest. As is, you couldn't for example impl std::fmt::Display for it.

It's mildly more annoying since you'd have to do something like Sha256Digest::from([0u8; 32]) and digest.bytes() or digest.0 or similar all over the place, but it gives you a lot more flexibility in the future without needing to break semver.

Edit: As I read the whole PR some more, I see you did basically exactly this in fsverity::hashvalue::{Sha256HashValue,Sha512HashValue}.

match digest.strip_prefix("sha256:") {
Some(rest) => Ok(parse_sha256(rest)?),
None => bail!("Manifest has non-sha256 digest"),
}
}

type ContentAndVerity = (Sha256HashValue, Sha256HashValue);
type ContentAndVerity<ObjectID> = (Sha256Digest, ObjectID);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't as big of a deal being a type alias since it's not public, but worth consideration still.

@allisonkarlitskaya allisonkarlitskaya merged commit a50ea8e into containers:main Apr 21, 2025
11 checks passed
@allisonkarlitskaya allisonkarlitskaya deleted the sha512 branch April 22, 2025 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants