Skip to content

layers: support BigData #787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 2, 2021
Merged

Conversation

giuseppe
Copy link
Member

similarly to containers and images, add support for storing big data
also for layers, so that it is possible to store arbitrary data when
it is not feasible to embed it in the layers JSON metadata file.

Subset of: #775

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@giuseppe
Copy link
Member Author

@nalind @rhatdan PTAL

@rhatdan
Copy link
Member

rhatdan commented Dec 15, 2020

LGTM
@vrothberg @mtrmac @TomSweeneyRedHat PTAL

@giuseppe giuseppe marked this pull request as draft December 16, 2020 09:02
@giuseppe giuseppe marked this pull request as ready for review December 16, 2020 09:39
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A literally one-minute impression, and design concerns like this are definitely decided by @nalind , not me:

This can make sense as an internal implementation mechanism; it’s not obvious to me that this must be exposed as a raw data to external callers. If c/image is only going to consume that data through a zstd:chunked - specific API, only that zstd:chunked - specific API needs to be public.

Making raw data public like this

  • effectively freezes the format of the raw data, slowing down future improvements.
  • invites other uses of raw data for unexpected purposes. That can be very useful and allow for unexpected innovations, OTOH it also further cements down the concept of individual layers and makes it even harder to significantly restructure (e.g. to a file-based approach like OSTree or the chunked format).

Again, definitely up to @nalind, just airing my first impression.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more notes. I didn’t read all of the code carefully, and I completely defer to actual c/storage maintainers for all the design decisions.

layers.go Outdated

// BigDataSizes maps the names in BigDataNames to the sizes of the data
// that has been stored, if they're known.
BigDataSizes map[string]int64 `json:"big-data-sizes,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading the API documentation, I would expect BigDataSizes to be included in ImageSize. (Otherwise it’s not even clear to me there needs to be a separate faster-to-access record of the sizes.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only added it because it was present in the image API. I am fine with dropping it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn’t the size be included in ImageSize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop it.

dropped and pushed a new version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn’t the size be included in ImageSize?

it is not used at the moment, we can add it if there is any user of it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Podman repo suggests that store.ImageSize is called for “disk usage statistics” (df?), “inspect”, “prune” and “tree”; I didn’t trace the full call stack, it may be exposed via other features or it might be dead code.

OTOH it’s true that c/image’s types.Image.Size() implementation does its own calculation which does not currently benefit from extending store.ImageSize.

@giuseppe giuseppe force-pushed the big-data-layers branch 3 times, most recently from 44b1af2 to 3e4cb85 Compare January 9, 2021 14:54
@giuseppe
Copy link
Member Author

is there anything more left to address?

@rhatdan
Copy link
Member

rhatdan commented Jan 19, 2021

@nalind @mtrmac PTAL

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 19, 2021

See the 2 points from my previous review.

@giuseppe
Copy link
Member Author

See the 2 points from my previous review.

just replied to them

@giuseppe
Copy link
Member Author

@mtrmac @rhatdan anything more holding this PR?

@rhatdan
Copy link
Member

rhatdan commented Jan 22, 2021

Not from me. Just waiting for @mtrmac comment.

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 22, 2021

*shrug* I don’t have anything to add WRT the implementation; I have some concerns but they are not quite blocking.

As for the more general design review, I know very little about c/storage , that would be up to @nalind I think.

@giuseppe
Copy link
Member Author

how to move forward? This is just a subset of the changes I'd like to get merged for zstd:chunked support

@rhatdan
Copy link
Member

rhatdan commented Jan 29, 2021

@vrothberg @nalind PTAL so we can get this merged.

@vrothberg
Copy link
Member

Added to my todo list. I won't make it today but promise to have a look on Monday.

Apologies for not having looked into it until now!

Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting approved

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

similarly to containers and images, add support for storing big data
also for layers, so that it is possible to store arbitrary data when
it is not feasible to embed it in the layers JSON metadata file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented Feb 2, 2021

LGTM

@rhatdan rhatdan merged commit 94f18b8 into containers:master Feb 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants