runtime-rs: Add full cgroups support on host #11598
Testing Results
Starting a pod with 2 CPUs and 4 GiB of memory across all tests.
Systemd + cgroup v2 + overhead cgroup
Pod and cgroups information
Show the systemd unit (part of the output)
$ UNIT="cri-containerd-8564224395a4fbf7f2792baa22ee16dfc8f8e862e20342d3b2bcc211a3939af6.scope"
$ systemctl show $UNIT
TimeoutStopUSec=5min
Slice=kubepods-poddc430cd3_34b7_4ea6_be26_ddfd88d7ddff.slice
ControlGroup=/kubepods.slice/kubepods-poddc430cd3_34b7_4ea6_be26_ddfd88d7ddff.slice/cri-containerd-8564224395a4fbf7f2792baa22ee16dfc8f8e862e20342d3b2bcc211a3939af6.scope
Delegate=yes
CPUAccounting=yes
IOAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
Requires=kubepods-poddc430cd3_34b7_4ea6_be26_ddfd88d7ddff.slice
ActiveState=active
Show the cgroup contents (QEMU process is under the sandbox cgroup)
$ systemd-cgls name
Control group /:
├─kata_overhead
│ └─8564224395a4fbf7f2792baa22ee16dfc8f8e862e20342d3b2bcc211a3939af6
│ ├─29743 /home/vagrant/kata-containers/src/runtime-rs/target/x86_64-unknown-linux-musl/debug/containerd-shim-kata-v2 -id 8564224395a4fbf7f2792ba>
│ └─29780 /home/vagrant/kata-static/kata/libexec/virtiofsd --socket-path virtiofsd.sock --shared-dir /run/kata-containers/shared/sandboxes/856422>
└─kubepods.slice
├─kubepods-poddc430cd3_34b7_4ea6_be26_ddfd88d7ddff.slice
│ └─cri-containerd-8564224395a4fbf7f2792baa22ee16dfc8f8e862e20342d3b2bcc211a3939af6.scope
│ └─29807 /usr/local/bin/qemu-system-x86_64 -name sandbox-8564224395a4fbf7f2792baa22ee16dfc8f8e862e20342d3b2bcc211a3939af6 -kernel /home/vagran>
Show the memory limit of the parent (memory limit = 4 GiB)
$ cat /sys/fs/cgroup/kubepods.slice/kubepods-poddc430cd3_34b7_4ea6_be26_ddfd88d7ddff.slice
4294967296
Systemd + cgroup v2 + sandbox cgroup only
Pod and cgroups information
The overhead cgroup (
$ systemd-cgls name | grep kata_overhead | grep -v grep
The sandbox cgroup exists
$ systemd-cgls name
└─kubepods.slice
├─kubepods-pod3c311657_556e_48de_92ac_998d761f36b0.slice
│ └─cri-containerd-431eaa4675c09cb0fc49746f6d6157e80c5418311bc4bf6ab75af7064a05dabb.scope
│ ├─83236 /home/vagrant/kata-containers/src/runtime-rs/target/x86_64-unknown-linux-musl/debug/containerd-shim-kata-v2 -id 431eaa4675c09cb0fc497>
│ ├─83257 /home/vagrant/kata-static/kata/libexec/virtiofsd --socket-path virtiofsd.sock --shared-dir /run/kata-containers/shared/sandboxes/431e>
│ └─83260 /usr/local/bin/qemu-system-x86_64 -name sandbox-431eaa4675c09cb0fc49746f6d6157e80c5418311bc4bf6ab75af7064a05dabb -kernel /home/vagran>
Cgroupfs + cgroup v1 + overhead cgroup
Pod and cgroups information
The overhead cgroup (
$ pod=bcac3c59c3ec4bdf3fb679be3f139e2f5e6ce94472239e2988db2bb0b5655035
$ cat /sys/fs/cgroup/memory/kata_overhead/$pod/cgroup.procs
147038
147053
147091
147094
$ sudo ps aux | grep 147038
root 147038 11.0 2.5 706164 422888 ? Sl 02:29 0:24 /home/vagrant/kata-containers/src/runtime-rs/target/x86_64-unknown-linux-musl/debug/containerd-shim-kata-v2 -id bcac3c59c3ec4bdf3fb679be3f139e2f5e6ce94472239e2988db2bb0b5655035 -namespace k8s.io -address /run/containerd/containerd.sock -publish-binary /usr/local/bin/containerd -debug
$ sudo ps aux | grep 147053
root 147053 0.0 0.0 6300744 6240 ? Sl 02:29 0:00 /home/vagrant/kata-static/kata/libexec/virtiofsd --socket-path virtiofsd.sock --shared-dir /run/kata-containers/shared/sandboxes/bcac3c59c3ec4bdf3fb679be3f139e2f5e6ce94472239e2988db2bb0b5655035/ro --cache auto --sandbox none --seccomp none --thread-pool-size=1 -o announce_submounts
$ sudo ps aux | grep 147091
root 147091 36.3 2.2 6931864 364580 ? Sl 02:29 5:15 /usr/local/bin/qemu-system-x86_64 -name sandbox-bcac3c59c3ec4bdf3fb679be3f139e2f5e6t
$ sudo ps aux | grep 147094
root 147094 0.0 0.0 0 0 ? S 02:29 0:00 [kvm-nx-lpage-recovery-147091]
The sandbox cgroup exists, and only vCPUs (2 threads) are added into that cgroup.
$ CGROUPS_PATH="kubepods/pode5352f9d-d513-4f73-80cf-03c11b34b870/bcac3c59c3ec4bdf3fb679be3f139e2f5e6ce94472239e2988db2bb0b5655035"
$ cat /sys/fs/cgroup/memory/$CGROUPS_PATH/cgroup.procs
147091
$ cat /sys/fs/cgroup/memory/$CGROUPS_PATH/tasks
147097
147098
Cgroupfs + cgroup v1 + sandbox cgroup only
Pod and cgroups information
The overhead cgroup does not exist
$ pod=94e311d6639f1346368216035e5c13a8d4a12fa5671ed9f400c147cc28668340
$ ls /sys/fs/cgroup/memory/kata_overhead/$pod
ls: cannot access '/sys/fs/cgroup/memory/kata_overhead/94e311d6639f1346368216035e5c13a8d4a12fa5671ed9f400c147cc28668340': No such file or directory
The sandbox cgroup exists
$ CGROUPS_PATH="kubepods/poda729a9aa-5c8e-483a-8986-e67176150bf7/94e311d6639f1346368216035e5c13a8d4a12fa5671ed9f400c147cc28668340"
$ cat /sys/fs/cgroup/memory/$CGROUPS_PATH/cgroup.procs
171635
171659
171666
171669
$ cat /sys/fs/cgroup/memory/$CGROUPS_PATH/../memory.limit_in_bytes
4294967296
Hey guys @jepio @fidencio @Champ-Goblem, I need some input here.
e4af8d2 to 22c52b7
This is a follow-up patch to kata-containers#11598 that bumps cgroups-rs to 0.4.1, so that the two Rust components share the same codebase to manage cgroups.
Introduce two new types, `SandboxCgroupManager` and `ContainerCgroupManager`.
The `SandboxCgroupManager` manages sandbox resources. Device cgroups are supported so far.
The `ContainerCgroupManager` manages container resources. It holds a copy of the sandbox cgroup manager, so it can update the sandbox if needed.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The first commit implements `get_thread_ids()` for QEMU so that it returns
the real vCPU thread IDs. It is a required feature, since our tests use
QEMU.
The second commit makes the shim ignore SIGTERM. When the systemd cgroup
driver and sandbox cgroup only are both enabled, the shim runs under a
systemd unit. When the unit is stopping, systemd sends SIGTERM to the shim.
The shim can't exit immediately, as it still has cleanup to do, so ignoring
SIGTERM is required here. The shim should complete that work within the
stop timeout (Kata sets it to 300s by default). Once the timeout expires,
systemd sends SIGKILL.
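
To illustrate the idea, here is a minimal, std-only sketch of a process that ignores SIGTERM. It is not the code from this PR: the helper name `ignore_sigterm()` is hypothetical, the constants are the Linux values from `<signal.h>`, and the libc functions `signal` and `raise` are declared by hand only to avoid external crates.

```rust
use std::io;
use std::os::raw::c_int;

// Linux values from <signal.h>, hard-coded here (assumption: Linux target).
const SIGTERM: c_int = 15;
const SIG_IGN: usize = 1; // "ignore this signal" disposition
const SIG_ERR: usize = usize::MAX; // error return of signal(2)

extern "C" {
    // libc functions; the handler is passed as an address-sized integer.
    fn signal(signum: c_int, handler: usize) -> usize;
    fn raise(signum: c_int) -> c_int;
}

/// Hypothetical helper: ask the kernel to discard SIGTERM for this process.
fn ignore_sigterm() -> io::Result<()> {
    let previous = unsafe { signal(SIGTERM, SIG_IGN) };
    if previous == SIG_ERR {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

fn main() -> io::Result<()> {
    ignore_sigterm()?;
    // Simulate systemd stopping the unit: the signal is now discarded,
    // so the process keeps running and can finish its cleanup.
    unsafe { raise(SIGTERM) };
    println!("still running after SIGTERM; cleanup can proceed");
    Ok(())
}
```

SIGKILL cannot be ignored this way, which is why the stop timeout still bounds how long the cleanup may take.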
The third one adds full cgroups support on the host.
Cgroups are managed by `FsManager` and `SystemdManager`. As the names
imply, the `FsManager` manages cgroups through cgroupfs, while the
`SystemdManager` manages cgroups through systemd. Both managers support
cgroup v1 and cgroup v2.
Two types of cgroups path are supported:
cgroups by `SystemdManager`; `FsManager`.

vCPU threads are added to the sandbox cgroup only with cgroup v1 +
cgroupfs. In the other combinations (cgroup v1 + systemd, cgroup v2 +
cgroupfs, cgroup v2 + systemd), the VMM process is added to the cgroup.
Systemd doesn't provide a way to add a thread to a unit, so `add_thread()`
in `SystemdManager` is equivalent to `add_process()`.

Cgroup v2 supports threaded mode. However, threaded mode has to be enabled
iteratively from the leaf node up to the root node (`/`) [1]. This means
the runtime would need to modify the cgroups created by the container
runtime (e.g. containerd). Considering cgroupfs + cgroup v2 is not a common
combination, its behavior is aligned with systemd + cgroup v2, which does
not allow managing processes at the thread level.
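
To show why this would touch cgroups owned by the container runtime, the sketch below lists the `cgroup.type` files on the way from a leaf cgroup up to the cgroup v2 mount point, following the description above. It only computes paths and writes nothing. The function name and the example paths are hypothetical, and which of these cgroups the kernel actually requires to be switched is governed by the rules in [1].

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: collect the `cgroup.type` file of `leaf` and of each
/// ancestor below `root` (the cgroup v2 mount point), ordered leaf first.
/// The root cgroup itself is skipped, since it has no `cgroup.type` file.
fn cgroup_type_files(root: &Path, leaf: &Path) -> Vec<PathBuf> {
    leaf.ancestors()
        .take_while(|dir| dir.starts_with(root) && *dir != root)
        .map(|dir| dir.join("cgroup.type"))
        .collect()
}

fn main() {
    // Illustrative paths only; real pod and container names are much longer.
    let root = Path::new("/sys/fs/cgroup");
    let leaf = Path::new("/sys/fs/cgroup/kubepods.slice/pod.slice/container.scope");
    for file in cgroup_type_files(root, leaf) {
        println!("{}", file.display());
    }
}
```

Every entry above the sandbox's own cgroup belongs to a hierarchy created by the container runtime, which is the modification the paragraph above wants to avoid.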
1: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#threads
Fixes: #11356
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>