-
Notifications
You must be signed in to change notification settings - Fork 262
chunked: generate tar-split as part of zstd:chunked #1627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
da1b8ad
to
561ef29
Compare
5408a8f
to
c4b7edb
Compare
1704458
to
62f24ea
Compare
02d9ae3
to
c65dd10
Compare
LGTM |
c65dd10
to
91a1fe9
Compare
these fixes are independent of the work in c/image, except for chunked: add function to retrieve TOC digest |
LGTM |
0fd208b
to
3919680
Compare
rebased |
I'd like to use these fixes for the new composefs integration work. Could we move forward and if there is anything to change/fix we could address it later? |
45e6d1f
to
1c356e2
Compare
all comments adressed, and the CI is green |
@@ -383,7 +426,20 @@ func writeZstdChunkedStream(destFile io.Writer, outMetadata map[string]string, r | |||
} | |||
zstdWriter = nil | |||
|
|||
return internal.WriteZstdChunkedManifest(dest, outMetadata, uint64(dest.Count), metadata, level) | |||
if err := tarSplitData.zstd.Flush(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch doesn't close tarSplitData.zstd
, and a number of branches above it don't close zstdWriter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a func for zstdWriter to clean it up if it is not closed (and thus set to nil afterwards):
defer func() {
if zstdWriter != nil {
zstdWriter.Close()
}
}()
is that sufficient?
I've added the same pattern for tarSplitData
Mostly LGTM, just nits. |
LGTM other than @nalind's comments |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
change the file format to store the tar-split as part of the zstd:chunked image. This will allow clients to rebuild the entire tarball without having to download it fully. also store the uncompressed digest for the tarball, so that it can be stored into the storage database. Needs: containers/image#1976 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
thanks, I've added a separate patch to drop it since it was already this way (so in case of regressions, it is easier to bisect it) |
1c356e2
to
032aae3
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching up for containers/image#1980 … quite possibly the questions have been resolved since. Follow containers/image#1980 for progress.
} | ||
if err := ioutils.AtomicWriteFile(r.tspath(layer.ID), tsdata.Bytes(), 0o600); err != nil { | ||
return err | ||
} | ||
} | ||
for k, v := range diffOutput.BigData { | ||
if err := r.SetBigData(id, k, bytes.NewReader(v)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetBigData
already calls saveFor(layer)
, so moving the saveFor
in this function past this loop can not reliably achieve anything unusual, and to me only seems confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1844 proposes reverting the move.
} | ||
|
||
offset = binary.LittleEndian.Uint64(footer[0:8]) | ||
length = binary.LittleEndian.Uint64(footer[8:16]) | ||
lengthUncompressed = binary.LittleEndian.Uint64(footer[16:24]) | ||
manifestType = binary.LittleEndian.Uint64(footer[24:32]) | ||
if !isZstdChunkedFrameMagic(footer[32:40]) { | ||
return nil, 0, errors.New("invalid magic number") | ||
if !isZstdChunkedFrameMagic(footer[48:56]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS WriteZstdChunkedManifest
silte writes the magic at 32:40
, and the other semantics (offset/length/…) match the offsets in that method.
So how does this work?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for spotting this.
I've opened a PR to fix it and add some tests: #1697
I am not 100% sure what to do with the footer data, this code path is not used by us at the moment, since we store this information in the OCI annotations for the layer (and if the OCI annotations are missing we don't even attempt a partial pull).
On the other hand it seems nice to have this information somewhere in the layer file itself, considering it is not a lot of data anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think plausible alternatives are:
- Keep it as a frozen artifact, supporting only the old features. If the old features are sufficient to pull/push useful images.
- Design some kind of forward-compatibility mechanism. Maybe a new magic value, and now including a version identifier. And the design would need to allow adding more fields, ie.e. the magic/version ID would need to be at a predictable place regardless of version. And it wouldn’t be a replacement for the annotations because we need the layer’s metadata to be authenticated without having to read all of the layer. If we are not consuming the data at all, all of this seems to me to be very likely to keep breaking, and just not worth the effort.
- Redesign so that this data does not need extending: only include data about the manifest in here, and add all new fields in the manifest instead. That seems to me to be the cleanest way to have the layer blob be ~self-describing… OTOH in principle it would be possible for an attacker to point at one manifest in this footer and another in the annotation, so I’m not sure there’s that much of a benefit to being self-describing, either.
- Completely drop support, and rely on annotations only? I didn’t look into the history, whether we can do that.
|
||
d, err := digest.Parse(manifestChecksumAnnotation) | ||
func decodeAndValidateBlob(blob []byte, lengthUncompressed uint64, expectedUncompressedChecksum string) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The digest is actually a compressed digest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #1844.
if tocDigest, ok := annotations[estargz.TOCJSONDigestAnnotation]; ok { | ||
d, err := digest.Parse(tocDigest) | ||
if err != nil { | ||
return nil, err | ||
} | ||
return &d, nil | ||
} | ||
if tocDigest, ok := annotations[internal.ManifestChecksumKey]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetDiffer
processes the digests in the opposite order. Doesn’t that end up using an arbitrary layer ID which differs from the manifest we actually process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was fixed here; #1844 also makes GetTOCDigest
unambiguous.
change the file format to store the tar-split as part of the zstd:chunked image. This will allow clients to rebuild the entire tarball without having to download it fully.
also store the uncompressed digest for the tarball, so that it can be stored into the storage database.
Needs: containers/image#1976
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com