chunked: generate tar-split as part of zstd:chunked #1627

giuseppe · 2023-06-01T12:58:58Z

change the file format to store the tar-split as part of the zstd:chunked image. This will allow clients to rebuild the entire tarball without having to download it fully.

also store the uncompressed digest for the tarball, so that it can be stored into the storage database.

Needs: containers/image#1976

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

pkg/chunked/compression_linux.go

pkg/chunked/compressor/compressor.go

pkg/chunked/internal/compression.go

pkg/chunked/storage_linux.go

drivers/overlay/overlay.go

layers.go

rhatdan · 2023-06-06T21:09:54Z

LGTM
@mtrmac PTAL

giuseppe · 2023-06-07T11:22:26Z

these fixes are independent of the work in c/image, except for "chunked: add function to retrieve TOC digest"

rhatdan · 2023-06-08T10:44:52Z

LGTM
@mtrmac It is up to you

giuseppe · 2023-06-12T20:17:14Z

rebased

giuseppe · 2023-06-13T08:24:57Z

I'd like to use these fixes for the new composefs integration work. Could we move forward and if there is anything to change/fix we could address it later?

layers.go

pkg/chunked/cache_linux.go

pkg/chunked/compression_linux.go

giuseppe · 2023-06-16T07:31:21Z

all comments addressed, and the CI is green

pkg/chunked/compression_linux.go

nalind · 2023-06-16T14:22:42Z

pkg/chunked/compressor/compressor.go

@@ -383,7 +426,20 @@ func writeZstdChunkedStream(destFile io.Writer, outMetadata map[string]string, r
 	}
 	zstdWriter = nil

-	return internal.WriteZstdChunkedManifest(dest, outMetadata, uint64(dest.Count), metadata, level)
+	if err := tarSplitData.zstd.Flush(); err != nil {


This branch doesn't close tarSplitData.zstd, and a number of branches above it don't close zstdWriter.

there is a func for zstdWriter to clean it up if it is not closed (and thus set to nil afterwards):

defer func() { if zstdWriter != nil { zstdWriter.Close() } }()

is that sufficient?

I've added the same pattern for tarSplitData

nalind · 2023-06-16T14:27:17Z

Mostly LGTM, just nits. restartCompression() in compressor.go is calling zstdWriter.Flush() after zstdWriter.Close(), which looks redundant.

layers.go

TomSweeneyRedHat · 2023-06-16T20:14:09Z

LGTM other than @nalind's comments

giuseppe · 2023-06-16T22:33:45Z

> Mostly LGTM, just nits. restartCompression() in compressor.go is calling zstdWriter.Flush() after zstdWriter.Close(), which looks redundant.

thanks, I've added a separate patch to drop it since it was already this way (so in case of regressions, it is easier to bisect it)

rhatdan · 2023-06-17T07:49:45Z

LGTM

mtrmac

Catching up for containers/image#1980 … quite possibly the questions have been resolved since. Follow containers/image#1980 for progress.

mtrmac · 2023-08-29T14:55:19Z

layers.go

+		}
+		if err := ioutils.AtomicWriteFile(r.tspath(layer.ID), tsdata.Bytes(), 0o600); err != nil {
+			return err
+		}
 	}
 	for k, v := range diffOutput.BigData {
 		if err := r.SetBigData(id, k, bytes.NewReader(v)); err != nil {


SetBigData already calls saveFor(layer), so moving the saveFor in this function past this loop cannot reliably achieve anything unusual, and to me only seems confusing.

#1844 proposes reverting the move.

drivers/overlay/overlay.go

pkg/chunked/internal/compression.go

mtrmac · 2023-08-29T16:47:12Z

pkg/chunked/compression_linux.go

 		}

 		offset = binary.LittleEndian.Uint64(footer[0:8])
 		length = binary.LittleEndian.Uint64(footer[8:16])
 		lengthUncompressed = binary.LittleEndian.Uint64(footer[16:24])
 		manifestType = binary.LittleEndian.Uint64(footer[24:32])
-		if !isZstdChunkedFrameMagic(footer[32:40]) {
-			return nil, 0, errors.New("invalid magic number")
+		if !isZstdChunkedFrameMagic(footer[48:56]) {


AFAICS WriteZstdChunkedManifest still writes the magic at 32:40, and the other semantics (offset/length/…) match the offsets in that method.

So how does this work?!

thanks for spotting this.

I've opened a PR to fix it and add some tests: #1697

I am not 100% sure what to do with the footer data, this code path is not used by us at the moment, since we store this information in the OCI annotations for the layer (and if the OCI annotations are missing we don't even attempt a partial pull).
On the other hand it seems nice to have this information somewhere in the layer file itself, considering it is not a lot of data anyway.

I think plausible alternatives are:

Keep it as a frozen artifact, supporting only the old features, if the old features are sufficient to pull/push useful images.

Design some kind of forward-compatibility mechanism. Maybe a new magic value, and now including a version identifier. And the design would need to allow adding more fields, i.e. the magic/version ID would need to be at a predictable place regardless of version. And it wouldn’t be a replacement for the annotations because we need the layer’s metadata to be authenticated without having to read all of the layer. If we are not consuming the data at all, all of this seems to me to be very likely to keep breaking, and just not worth the effort.

Redesign so that this data does not need extending: only include data about the manifest in here, and add all new fields in the manifest instead. That seems to me to be the cleanest way to have the layer blob be ~self-describing… OTOH in principle it would be possible for an attacker to point at one manifest in this footer and another in the annotation, so I’m not sure there’s that much of a benefit to being self-describing, either.

Completely drop support, and rely on annotations only? I didn’t look into the history, whether we can do that.
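One way to keep the footer writer and reader from drifting apart, as happened with the 32:40 versus 48:56 magic offset above, is to derive both from a single set of named offsets and cover them with a round-trip test. The sketch below assumes the layout quoted in the review (four little-endian uint64 fields followed by an 8-byte magic at 32:40); the 40-byte size, the `footer` type, the function names, and the magic value are all placeholders, not the real format constants.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Offsets shared by the encoder and the decoder. The layout mirrors the one
// quoted in the review; footerSize is an assumption made for this sketch.
const (
	offOffset             = 0
	offLength             = 8
	offLengthUncompressed = 16
	offManifestType       = 24
	offMagic              = 32
	footerSize            = 40
)

// placeholderMagic is a made-up value, not the real zstd:chunked magic.
var placeholderMagic = []byte("EXAMPLE!")

// footer is a hypothetical struct holding the four numeric fields.
type footer struct {
	Offset, Length, LengthUncompressed, ManifestType uint64
}

func encodeFooter(f footer) []byte {
	buf := make([]byte, footerSize)
	binary.LittleEndian.PutUint64(buf[offOffset:], f.Offset)
	binary.LittleEndian.PutUint64(buf[offLength:], f.Length)
	binary.LittleEndian.PutUint64(buf[offLengthUncompressed:], f.LengthUncompressed)
	binary.LittleEndian.PutUint64(buf[offManifestType:], f.ManifestType)
	copy(buf[offMagic:], placeholderMagic)
	return buf
}

func decodeFooter(buf []byte) (footer, error) {
	if len(buf) < footerSize {
		return footer{}, errors.New("footer too short")
	}
	// The magic is checked at the same offset the encoder wrote it to.
	if !bytes.Equal(buf[offMagic:offMagic+len(placeholderMagic)], placeholderMagic) {
		return footer{}, errors.New("invalid magic number")
	}
	return footer{
		Offset:             binary.LittleEndian.Uint64(buf[offOffset:]),
		Length:             binary.LittleEndian.Uint64(buf[offLength:]),
		LengthUncompressed: binary.LittleEndian.Uint64(buf[offLengthUncompressed:]),
		ManifestType:       binary.LittleEndian.Uint64(buf[offManifestType:]),
	}, nil
}

func main() {
	in := footer{Offset: 4096, Length: 512, LengthUncompressed: 2048, ManifestType: 1}
	out, err := decodeFooter(encodeFooter(in))
	fmt.Println(out == in, err)
}
```

A round-trip check like the one in `main` would have failed immediately if the decoder looked for the magic at a different offset than the encoder used.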

mtrmac · 2023-08-29T18:27:56Z

pkg/chunked/compression_linux.go


-	d, err := digest.Parse(manifestChecksumAnnotation)
+func decodeAndValidateBlob(blob []byte, lengthUncompressed uint64, expectedUncompressedChecksum string) ([]byte, error) {


The digest is actually a compressed digest.

Fixed in #1844.
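To make the distinction in this comment concrete, here is a minimal sketch of validating a blob whose expected digest covers the compressed bytes. It is not the code from this PR or from #1844: `decodeAndValidate`, `digestOf`, `gzipBytes`, and `maxUncompressed` are hypothetical, gzip stands in for zstd, and the digest is a hand-built `sha256:<hex>` string rather than a go-digest value.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// maxUncompressed is an arbitrary cap chosen for this sketch.
const maxUncompressed = 1 << 30

// digestOf returns a "sha256:<hex>" string for b.
func digestOf(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// gzipBytes compresses data in memory; writes to a bytes.Buffer cannot fail.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// decodeAndValidate checks the digest of the compressed blob first, and only
// then decompresses it, bounded by the advertised uncompressed length.
func decodeAndValidate(blob []byte, lengthUncompressed uint64, expectedCompressedDigest string) ([]byte, error) {
	if lengthUncompressed > maxUncompressed {
		return nil, errors.New("advertised uncompressed length too large")
	}
	if digestOf(blob) != expectedCompressedDigest {
		return nil, errors.New("compressed blob digest mismatch")
	}
	r, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	// Read one byte more than advertised so oversized payloads are detected.
	out, err := io.ReadAll(io.LimitReader(r, int64(lengthUncompressed)+1))
	if err != nil {
		return nil, err
	}
	if uint64(len(out)) != lengthUncompressed {
		return nil, errors.New("unexpected uncompressed length")
	}
	return out, nil
}

func main() {
	payload := []byte(`{"version":1}`)
	blob := gzipBytes(payload)
	out, err := decodeAndValidate(blob, uint64(len(payload)), digestOf(blob))
	fmt.Println(string(out), err)
}
```

Naming the parameter after what it actually covers (`expectedCompressedDigest` here) avoids the confusion the review points out.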

mtrmac · 2023-08-29T18:40:11Z

pkg/chunked/storage_linux.go

+	if tocDigest, ok := annotations[estargz.TOCJSONDigestAnnotation]; ok {
+		d, err := digest.Parse(tocDigest)
+		if err != nil {
+			return nil, err
+		}
+		return &d, nil
+	}
+	if tocDigest, ok := annotations[internal.ManifestChecksumKey]; ok {


GetDiffer processes the digests in the opposite order. Doesn’t that end up using an arbitrary layer ID which differs from the manifest we actually process?

This was fixed here; #1844 also makes GetTOCDigest unambiguous.
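The ordering problem raised here goes away if every caller goes through one detection function. The sketch below is one possible approach and is not claimed to be what #1844 implements: `detectTOC`, the `tocFormat` type, and both annotation keys are hypothetical placeholders (the real keys live in the estargz and internal packages). Rejecting layers that carry both annotations is a choice made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder annotation keys, used only in this sketch.
const (
	estargzTOCKey     = "example.invalid/estargz-toc-digest"
	zstdChunkedTOCKey = "example.invalid/zstd-chunked-manifest-checksum"
)

type tocFormat int

const (
	formatNone tocFormat = iota
	formatEstargz
	formatZstdChunked
)

// detectTOC is the single place that decides which format a layer uses.
// Both the code that computes the TOC digest and the code that creates the
// differ would call it, so they cannot disagree about precedence.
func detectTOC(annotations map[string]string) (tocFormat, string, error) {
	est, hasEst := annotations[estargzTOCKey]
	zst, hasZst := annotations[zstdChunkedTOCKey]
	switch {
	case hasEst && hasZst:
		// Ambiguous input is rejected rather than resolved by ordering.
		return formatNone, "", errors.New("both estargz and zstd:chunked TOC annotations present")
	case hasZst:
		return formatZstdChunked, zst, nil
	case hasEst:
		return formatEstargz, est, nil
	default:
		return formatNone, "", nil
	}
}

func main() {
	format, digest, err := detectTOC(map[string]string{zstdChunkedTOCKey: "sha256:abc"})
	fmt.Println(format == formatZstdChunked, digest, err)
}
```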

pkg/chunked/storage_linux.go

drivers/overlay/overlay.go

giuseppe force-pushed the chunked-store-tarsplit branch 6 times, most recently from da1b8ad to 561ef29 (June 1, 2023 14:40)

mtrmac reviewed Jun 1, 2023

giuseppe force-pushed the chunked-store-tarsplit branch 4 times, most recently from 5408a8f to c4b7edb (June 5, 2023 10:34)

giuseppe mentioned this pull request Jun 5, 2023

copy: do not fail if digest mismatches containers/image#1980

Merged

giuseppe force-pushed the chunked-store-tarsplit branch 2 times, most recently from 1704458 to 62f24ea (June 5, 2023 13:05)

mtrmac reviewed Jun 5, 2023: drivers/overlay/overlay.go (outdated)

mtrmac reviewed Jun 5, 2023: layers.go (outdated)

giuseppe force-pushed the chunked-store-tarsplit branch 2 times, most recently from 02d9ae3 to c65dd10 (June 6, 2023 19:00)

giuseppe force-pushed the chunked-store-tarsplit branch from c65dd10 to 91a1fe9 (June 7, 2023 09:21)

giuseppe force-pushed the chunked-store-tarsplit branch from 0fd208b to 3919680 (June 12, 2023 20:16)

nalind reviewed Jun 14, 2023: layers.go (outdated)

nalind reviewed Jun 14, 2023: layers.go

nalind reviewed Jun 14, 2023: pkg/chunked/cache_linux.go

nalind reviewed Jun 14, 2023: pkg/chunked/compression_linux.go

nalind reviewed Jun 14, 2023: pkg/chunked/compression_linux.go (outdated)

giuseppe force-pushed the chunked-store-tarsplit branch from 45e6d1f to 1c356e2 (June 15, 2023 16:00)

nalind reviewed Jun 16, 2023: pkg/chunked/compression_linux.go

nalind reviewed Jun 16, 2023

TomSweeneyRedHat reviewed Jun 16, 2023: layers.go

giuseppe added 13 commits June 17, 2023 00:31

chunked: drop superfluous flush (c3d13cc)

vfs: ignore duplicate "ro" options (e20afae)

btrfs: ignore duplicate "ro" options (dc7c802)

fsdiff: mount the image read-only (afafcdc)

chunked: fix reading modtime from the TOC (d53cea9)

chunked: drop superfluous variable (7846152)

driver: keep TOC digest (b007d17)

overlay: calculate a digest on the TOC (7f3c84d)

chunked: convert tag name to lowercase (f3a7e9c)

chunked: report used UIDs/GIDs (5d10b94)

chunked: add function to retrieve TOC digest (7304a21)

chunked: drop support for uncompressed metadata blobs (032aae3)

giuseppe force-pushed the chunked-store-tarsplit branch from 1c356e2 to 032aae3 (June 16, 2023 22:34)

rhatdan merged commit c989f8f into containers:main Jun 17, 2023

mtrmac reviewed Aug 29, 2023

mtrmac mentioned this pull request Nov 13, 2023

Zstd(:chunked) work tracking checklist containers/image#2189

Open

37 tasks

mtrmac mentioned this pull request Feb 26, 2024

Chunked cleanups #1844

Merged

