[1.6/1.7] kubernetes ephemeral-storage limits not enforced with remote snapshotters #10095

@Kern--

Description

When using a remote snapshotter (or any other snapshotter that doesn't place snapshots under the containerd root directory), ephemeral storage limits are not enforced by the kubelet. The container can blow past its limits and keep running indefinitely.

The kubelet logs show errors like:

kubelet[3094]: E0419 15:57:23.046299    3094 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/var/lib/containerd/io.containerd.snapshotter.v1.soci\": stat failed on /var/lib/containerd/io.containerd.snapshotter.v1.soci with error: no such file or directory" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.soci"

and

kubelet[3094]: E0419 15:56:55.022396    3094 kubelet.go:1436]  "Image garbage collection failed multiple times in a row" err="invalid capacity 0 on image filesystem"

It looks like the kubelet cannot run ephemeral storage checks or image garbage collection because it looks for image filesystem information under the containerd root directory, where a remote snapshotter does not keep its snapshots.
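To check whether a node is affected, it can help to confirm that the directory named in the kubelet error is actually missing. The following is a minimal sketch in Go, not the kubelet's own code: the helper names `snapshotterDir` and `checkSnapshotterDir` are hypothetical, and the path layout (`<root>/io.containerd.snapshotter.v1.<name>`) is taken only from the log line above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// snapshotterDir builds the path seen in the kubelet error:
// <containerd root>/io.containerd.snapshotter.v1.<snapshotter name>.
// (Hypothetical helper; the layout is inferred from the log line.)
func snapshotterDir(root, name string) string {
	return filepath.Join(root, "io.containerd.snapshotter.v1."+name)
}

// checkSnapshotterDir returns an error if the directory is missing
// or is not a directory, which is the condition the kubelet log reports.
func checkSnapshotterDir(root, name string) error {
	dir := snapshotterDir(root, name)
	info, err := os.Stat(dir)
	if errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("%s does not exist", dir)
	}
	if err != nil {
		return fmt.Errorf("stat %s: %w", dir, err)
	}
	if !info.IsDir() {
		return fmt.Errorf("%s is not a directory", dir)
	}
	return nil
}

func main() {
	// Values taken from the config.toml and log output in this report.
	if err := checkSnapshotterDir("/var/lib/containerd", "soci"); err != nil {
		fmt.Println("snapshotter directory check failed:", err)
		return
	}
	fmt.Println("snapshotter directory is present")
}
```

If the check fails on a node that uses the `soci` snapshotter, that matches the `no such file or directory` error in the kubelet log.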

Steps to reproduce the issue

  1. Configure containerd to use a remote snapshotter in a k8s environment
  2. Create a pod with an ephemeral storage limit:
resources:
  limits:
    ephemeral-storage: 20M
  requests:
    ephemeral-storage: 10M
  3. Exec into the container and allocate more disk space than allowed
# fallocate -l 1G test1
  4. Observe that the pod is not evicted and that the kubelet logs show the errors above
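For a sense of scale, the arithmetic behind the reproduction can be sketched as below. This assumes the usual unit conventions: a Kubernetes quantity of `20M` is decimal (20 × 10^6 bytes), while `fallocate -l 1G` treats `G` as GiB (2^30 bytes). The constant and function names are illustrative only.

```go
package main

import "fmt"

const (
	// Kubernetes "20M" uses the decimal suffix M = 10^6 bytes.
	limitBytes = 20 * 1000 * 1000
	// fallocate treats the suffix "G" as GiB = 2^30 bytes.
	allocatedBytes = 1 << 30
)

// overLimit reports whether usage exceeds the configured limit.
func overLimit(used, limit int64) bool {
	return used > limit
}

func main() {
	ratio := float64(allocatedBytes) / float64(limitBytes)
	fmt.Printf("limit=%d bytes, allocated=%d bytes, over limit=%v (about %.1fx)\n",
		limitBytes, allocatedBytes, overLimit(allocatedBytes, limitBytes), ratio)
}
```

The single test file exceeds the 20M limit by more than a factor of 50, so eviction should be unambiguous if enforcement is working.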

Describe the results you received and expected

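Expected: the pod is evicted once it exceeds its ephemeral-storage limit, and the kubelet logs show no errors.

Received: the pod is not evicted, and the kubelet logs show the errors above.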

What version of containerd are you using?

containerd github.com/containerd/containerd 1.7.11 64b8a81

Any other relevant information

Related downstream issue awslabs/soci-snapshotter#1093

Show configuration if it is related to CRI plugin.

$ cat /etc/containerd/config.toml

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/containerd/containerd.sock"

[proxy_plugins.soci]
type = "snapshot"
address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
snapshotter = "soci"
# This line is required for containerd to send information about how to lazily load the image to the snapshotter
disable_snapshot_annotations = false

[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
