Description
This is mostly an issue to start a discussion, based on a common use case we see in high-energy physics and other large-scale scientific software that needs to be distributed globally. I haven't yet understood the snapshotter code in detail, so I'll describe the use case first:
We have many images that build on top of base images curated by experts. These base images provide a software stack (data analysis frameworks), on top of which regular users build their own images by adding their analysis code etc. The images can become quite large (~10 GB), and most of the size is in the lower, expert-curated layers. The problem is how workloads that require these images can pull and cache them efficiently without triggering large incoming
At CERN we have developed a global, tiered, read-only filesystem (CVMFS) that would be a good avenue to cache and distribute these lower layers at per-file granularity.
Therefore it would be nice if one could write a snapshotter that inspects an image manifest, mounts the layers that exist on the global cache directly from it, and falls back to the default snapshotter for the remaining layers that are not available there.
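To make the idea a bit more concrete, here is a rough sketch of the per-layer routing I have in mind. It is plain Python rather than containerd code, and everything in it is made up for illustration: `UnpackedCache`, `DefaultSource`, `plan_layers`, the digests, and both paths are hypothetical, and none of it reflects the real snapshotter interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


class LayerSource(Protocol):
    """Hypothetical interface: anything that can provide an unpacked layer."""
    name: str

    def has_layer(self, digest: str) -> bool: ...
    def prepare(self, digest: str) -> str: ...


@dataclass
class UnpackedCache:
    """Stand-in for a read-only global cache holding already-unpacked layers."""
    root: str
    available: frozenset
    name: str = "cache"

    def has_layer(self, digest: str) -> bool:
        return digest in self.available

    def prepare(self, digest: str) -> str:
        # Nothing to download or unpack: just point at the existing directory.
        return f"{self.root}/{digest.replace(':', '/')}"


@dataclass
class DefaultSource:
    """Stand-in for the default path: pull the tarball and unpack it locally."""
    root: str
    name: str = "default"

    def has_layer(self, digest: str) -> bool:
        return True  # assumed to be able to fetch any layer from the registry

    def prepare(self, digest: str) -> str:
        return f"{self.root}/{digest.replace(':', '/')}"


def plan_layers(digests: Iterable[str],
                sources: List[LayerSource]) -> List[Tuple[str, str, str]]:
    """For each layer, pick the first source (in priority order) that has it."""
    plan = []
    for digest in digests:
        for source in sources:
            if source.has_layer(digest):
                plan.append((digest, source.name, source.prepare(digest)))
                break
        else:
            raise LookupError(f"no source can provide layer {digest}")
    return plan


# Example: the base layer is on the cache, the user layer is not.
cache = UnpackedCache(root="/cvmfs/example.repo/layers",
                      available=frozenset({"sha256:aaa"}))
fallback = DefaultSource(root="/var/lib/example/snapshots")
for entry in plan_layers(["sha256:aaa", "sha256:bbb"], [cache, fallback]):
    print(entry)
```

The ordering of `sources` acts as the priority list, so putting the cache first means the large expert-curated layers are never pulled when they are already available there.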
Perhaps there is a general solution in which a kind of federated snapshotter could prepare layers from multiple sources (e.g. flat filesystems, tarballs, etc.). Would this be possible with the current snapshotter design?
I hope this made sense. I'm adding @jblomer and @rochaporto, who I'm sure can chime in in a more informed way :)