runtime: add option to force guest pull #11244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Let me be more verbose here. I've changed the coco-non-tee test to run with
Lgtm but I'm not up-to-speed on guest pull mechanics.
Thanks @katexochen @burgerdev for this new feature. One more question: given that it works well, do we still need the ephemeral-storage CSI to serve as external storage for CoCo to store the image pulled inside the guest?
Force-pushed from 993c5be to 3f0207f.
Yes, the container image is still pulled on the host, as it was with nydus. See the discussion in #11041.
I don't understand this question.
This is an orthogonal problem. On the guest side, this PR does exactly the same as guest pull with nydus-snapshotter.
If this is the question I had in mind, it's about what snapshotter configuration to use in containerd after this change.
No snapshotter-specific configuration is needed in the containerd config anymore.
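For illustration, here is a minimal sketch of what the kata runtime-class entry in `/etc/containerd/config.toml` could look like once force guest pull is used: no `snapshotter` assignment and no `[proxy_plugins.nydus]` section. The runtime name and paths are copied from the config posted later in this thread; treat this as an assumption, not a reference configuration.

```toml
# Sketch (assumption): kata runtime class relying on the default host
# snapshotter; image layers are pulled inside the guest instead.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-coco-dev]
  runtime_type = "io.containerd.kata-qemu-coco-dev.v2"
  runtime_path = "/opt/kata/bin/containerd-shim-kata-v2"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  # Note: no `snapshotter = "nydus"` line and no [proxy_plugins.nydus] section.

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-coco-dev.options]
  ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml"
```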
Ok, currently
Sorry, let me make it clear: I just want to understand how to prevent pulling layers onto the host without a remote snapshotter.
Ok, that would be another question, about whether it's time to drop the ephemeral-storage CSI serving the guest-pull case. If this is a good time and place, I will talk about it.
You'd want to keep it as long as you do guest pull, to store the image layers on some ephemeral protected disk storage so that your CVM RAM (tmpfs) is not used for it.
Yes, it acts as the ephemeral protected disk storage. But one point: I'd say it has the same lifetime as the kata/CoCo Pod.
This is a conjecture for which I'd like to see evidence. The hypothesis underlying this PR is that containerd does pull layers on the host, and there is evidence for that. Refer to the reproducer in #11162 (comment). The assumption that containerd does not pull layers on the host does not seem to hold because
No - the force-guest-pull setup does not need anything related to Nydus. The host uses the default snapshotter, the guest is using
Force-pushed from 3f0207f to afc1e50.
Generally looks good to me, I would like the annotation to be documented.
Force-pushed from afc1e50 to d1963a7.
Yeah, in my mind it seems like a hard problem; I have no idea how to address it.
Could you please confirm whether the configuration is indeed correct, along with its related nydus snapshotter logs?
In guest-pull mode, container images are pulled inside the guest, not on the host. Without a remote snapshotter, which helps prevent images from being pulled on the host, I'm still confused how this will work. Is there anything I miss?
One thing I have heard is that the nydus approach will pull the entire image on the host unless the annotation
Thanks for the feedback @Apokleos, I think we're starting to get to the root of the issue here. I just ran the deployment script again (linked above), and I can reproduce my observations. Layer content
To be honest, I don't know - this is what the `ctr version`
`/etc/containerd/config.toml`:

version = 2
oom_score = -999
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "mcr.microsoft.com/oss/kubernetes/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
discard_unpacked_layers = false
disable_snapshot_annotations = false
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = "/usr/bin/runc"
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.untrusted]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.untrusted.options]
BinaryName = "/usr/bin/runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-coco-dev]
runtime_type = "io.containerd.kata-qemu-coco-dev.v2"
runtime_path = "/opt/kata/bin/containerd-shim-kata-v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
snapshotter = "nydus"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-coco-dev.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".registry.headers]
X-Meta-Source-Client = ["azure/aks"]
[metrics]
address = "0.0.0.0:10257"
[proxy_plugins.nydus]
type = "snapshot"
address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
[debug]
level = "debug"

`/etc/nydus/config.toml`:

version = 1
# Snapshotter's own home directory where it stores and creates necessary resources
root = "/var/lib/containerd/io.containerd.snapshotter.v1.nydus"
# The snapshotter's GRPC server socket, containerd will connect to plugin on this socket
address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
# The nydus daemon mode can be one of the following options: multiple, dedicated, shared, or none.
# If `daemon_mode` option is not specified, the default value is multiple.
daemon_mode = "none"
# Whether snapshotter should try to clean up resources when it is closed
cleanup_on_close = false
[system]
# Snapshotter's debug and trace HTTP server interface
enable = true
# Unix domain socket path where system controller is listening on
address = "/run/containerd-nydus/system.sock"
[system.debug]
# Snapshotter can profile the CPU utilization of each nydusd daemon when it is being started.
# This option specifies the profile duration when nydusd is downloading and uncompressing data.
daemon_cpu_profile_duration_secs = 5
# Enable by assigning an address, empty indicates pprof server is disabled
pprof_address = ""
[daemon]
# Specify a configuration file for nydusd
nydusd_config = "/etc/nydus/nydusd-config.fusedev.json"
nydusd_path = "/usr/local/bin/nydusd"
nydusimage_path = "/usr/local/bin/nydus-image"
# The fs driver can be one of the following options: fusedev, fscache, blockdev, proxy, or nodev.
# If `fs_driver` option is not specified, the default value is fusedev.
fs_driver = "proxy"
# How to process when daemon dies: "none", "restart" or "failover"
recover_policy = "restart"
# Nydusd worker thread number to handle FUSE or fscache requests, [0-1024].
# Setting to 0 will use the default configuration of nydusd.
threads_number = 4
# Log rotation size for nydusd, in unit MB(megabytes). (default 100MB)
log_rotation_size = 100
[cgroup]
# Whether to use separate cgroup for nydusd.
enable = true
# The memory limit for nydusd cgroup, which contains all nydusd processes.
# Percentage is supported as well, please ensure it is end with "%".
# The default unit is bytes. Acceptable values include "209715200", "200MiB", "200Mi" and "10%".
memory_limit = ""
[log]
# Print logs to stdout rather than logging files
log_to_stdout = false
# Snapshotter's log level
level = "info"
log_rotation_compress = true
log_rotation_local_time = true
# Max number of days to retain logs
log_rotation_max_age = 7
log_rotation_max_backups = 5
# In unit MB(megabytes)
log_rotation_max_size = 100
[metrics]
# Enable by assigning an address, empty indicates metrics server is disabled
address = ":9110"
[remote]
convert_vpc_registry = false
[remote.mirrors_config]
# Snapshotter will overwrite daemon's mirrors configuration
# if the values loaded from this directory are not null before starting a daemon.
# Set to "" or an empty directory to disable it.
#dir = "/etc/nydus/certs.d"
[remote.auth]
# Fetch the private registry auth by listening to K8s API server
enable_kubeconfig_keychain = false
# synchronize `kubernetes.io/dockerconfigjson` secret from kubernetes API server with specified kubeconfig (default `$KUBECONFIG` or `~/.kube/config`)
kubeconfig_path = ""
# Fetch the private registry auth as CRI image service proxy
enable_cri_keychain = false
# the target image service when using image proxy
#image_service_address = "/run/containerd/containerd.sock"
[snapshot]
# Let containerd use nydus-overlayfs mount helper
enable_nydus_overlayfs = false
# Insert Kata Virtual Volume option to `Mount.Options`
enable_kata_volume = true
# Whether to remove resources when a snapshot is removed
sync_remove = false
[cache_manager]
# Disable or enable recyclebin
disable = false
# How long to keep deleted files in recyclebin
gc_period = "24h"
# Directory to host cached files
cache_dir = ""
[image]
public_key_file = ""
validate_signature = false
# The configurations for features that are not production ready
[experimental]
# Whether to enable stargz support
enable_stargz = false
# Whether to enable referrers support
# The option enables trying to fetch the Nydus image associated with the OCI image and run it.
# Also see https://github.com/opencontainers/distribution-spec/blob/main/spec.md#listing-referrers
enable_referrer_detect = false
# Whether to enable authentication support
# The option enables nydus snapshot to provide backend information to nydusd.
enable_backend_source = false
[experimental.tarfs]
# Whether to enable nydus tarfs mode. Tarfs is supported by:
# - The EROFS filesystem driver since Linux 6.4
# - Nydus Image Service release v2.3
enable_tarfs = false
# Mount rafs on host by loopdev and EROFS
mount_tarfs_on_host = false
# Only enable nydus tarfs mode for images with `tarfs hint` label when true
tarfs_hint = false
# Maximum of concurrence to converting OCIv1 images to tarfs, 0 means default
max_concurrent_proc = 0
# Mode to export tarfs images:
# - "none" or "": do not export tarfs
# - "layer_verity_only": only generate disk verity information for a layer blob
# - "image_verity_only": only generate disk verity information for all blobs of an image
# - "layer_block": generate a raw block disk image with tarfs for a layer
# - "image_block": generate a raw block disk image with tarfs for an image
# - "layer_block_with_verity": generate a raw block disk image with tarfs for a layer with dm-verity info
# - "image_block_with_verity": generate a raw block disk image with tarfs for an image with dm-verity info
export_mode = ""
Sorry, the reasoning could have been clearer. Let me try to explain:
This argument hinges on (2): if there is a configuration where Nydus does not pull layer content on the host, we do indeed have a trade-off. I, personally, am ok with pulling on the host and in the guest, because (a) host pull only happens once per node and is cached afterwards and (b) I prefer wasting network bandwidth to dealing with the stability issues caused by Nydus. Others' views may differ.
That is interesting, I wonder why. At least it does not seem to be a direct effect, because then I would have expected a reference to that annotation in the Nydus snapshotter repo, which I did not find.
Personally, I think it is reasonable to support something like this in the short term, given all the issues that actual users have with Nydus. Hopefully we can also come to really understand the optimal solution. @csegarragonz might remember something more about the behavior of the
Hi @burgerdev, I have reproduced the issue you described with both your configuration and mine for the nydus snapshotter.
And it matches what Tobin @fitzthum said.
Yes, what @fitzthum and @Apokleos mention confirms what I observed. In summary, even if nydus is set as the snapshotter for a given runtime class, containerd may first pull the layers on the host. If you enable the experimental annotation, containerd will honour it here. As a rule of thumb, you should always see
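For reference, these are the containerd settings this comment revolves around, reduced to a minimal sketch from the config posted earlier in the thread; whether `disable_snapshot_annotations` is the experimental toggle meant here is my assumption.

```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
  # When false, containerd forwards image/layer annotations to the remote
  # snapshotter, which nydus uses to decide to skip pulling layer content
  # on the host.
  disable_snapshot_annotations = false

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-coco-dev]
  # Binds the nydus remote snapshotter to this runtime class only.
  snapshotter = "nydus"
```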
The reason I commented that "resolving the named user..." is hard is that the relevant processing flow is deeply intertwined with containerd. This is why I previously speculated about the potential need to modify containerd's logic. This was discussed in the previous issue #11162 by @Camelron, @imeoer and @burgerdev. Maybe some methods are needed to work around this issue, for example explicitly specifying the username and UID etc. in the Dockerfile, but I am not sure whether there are any side effects.
Okay, so to summarize the discussion: guest-pull with nydus-snapshotter:
force_guest_pull:
As I understand, there isn't really any argument against adding force_guest_pull as an option a user can enable. Switching guest pull to this mechanism by default is another discussion that doesn't need to happen on this PR.
...
nice summary of the discussion!
Force-pushed from 359f121 to 059e889.
LGTM, but I'd wait to push this until after the release is done.
Thanks @katexochen!
LGTM, thanks!
After our discussion in the CoCo meeting, I think it makes sense to support this at least until containerd supports multiple snapshotters (and we default to containerd 2.0). Until that point, it seems like whatever snapshotter we use will potentially run into synchronization issues.
I think we established that the force-guest-pull approach can be useful for users struggling with snapshotters, and it's an experimental API that we can remove once we've found the right™ way to do guest pulling. Now that 3.17 is released, are there any objections to merging this?
LGTM!
There isn't a perfect existing solution, but I favor trying different approaches while simultaneously exploring new ones. Let's move it forward.
Please, do! Once it's green, merge it in.
Force-pushed from 059e889 to be0c593.
@katexochen, from the logs I can see:
This enables guest pull via config, without the need for any external snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of relying on annotations to select the way to share the root FS, we always use guest pull.

Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
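For anyone who wants to try this once merged, here is a sketch of how the option from the commit message might be enabled in the Kata configuration file. The option name comes from the commit message above; the file path and exact section placement are assumptions on my side.

```toml
# Assumed path: /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
[runtime]
  # Always use guest pull for the container rootfs instead of relying on
  # snapshotter annotations.
  experimental_force_guest_pull = true
```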
Force-pushed from be0c593 to c4815eb.
Was missing
This enables guest pull via config, without the need for any external snapshotter. When the config enables runtime.force_guest_pull, instead of relying on annotations to select the way to share the root FS, we always use guest pull.