This issue was reported in the Kubernetes Security Audit Report
Description
PIDs are not process handles. A given PID may be reused between two dependent operations, leading to a “Time Of Check vs Time Of Use” (TOCTOU) attack. This occurs in the Linux container manager's ensureProcessInContainerWithOOMScore function, which (Figure 1):
Checks whether the PID is running on the host by reading /proc//ns/pid in the isProcessRunningInHost function,
Gets the cgroups for the PID by reading /proc//cgroup in the getContainer function,
Moves the PID to the manager’s cgroup,
Sets the out-of-memory (OOM) killer badness heuristic, which determines how likely the process is to be killed in out-of-memory scenarios, by writing to /proc//oom_score_adj in ApplyOOMScoreAdj.
These operations allow an attacker to move a process to the manager’s cgroup, giving it full access to devices on the host, and to change the OOM-killer badness heuristic, either from the node host or from a container on the machine, assuming the attacker also has access to an unprivileged user on the node host.
Exploit Scenario
Eve gains access to an unprivileged user on a node host and a root user on a Pod container on the same host within Alice’s cluster. Eve prepares a malicious process and PID-reuse attack against the docker-containerd process. Eve spawns a process within the Pod container as the root user, exploits the TOCTOU race, and elevates her cgroup to gain read and write access to all devices. AppArmor blocks Eve from mounting devices; however, her process is still able to read from and write to host devices.
This issue is more easily exploitable by abusing the behavior discovered in TOB-K8S-021.
See Appendix D for a proof of concept for this attack without the PID reuse.
Recommendations
Short term, when performing operations on files in the /proc// directory for a given PID, open a directory stream file descriptor for /proc// and use this handle when reading from or writing to files.
It does not currently appear possible to prevent TOCTOU race conditions between the checks and moving the process to a cgroup because this is done by writing to the /sys/fs/cgroup//cgroup.procs file. We recommend validating that the process associated with a given PID is the same process before and after moving the PID to the cgroup. If the post-validation fails, log an error and consider reverting the cgroup movement.
Long term, we recommend tracking further development of Linux kernel cgroups features, or even engaging with the community to produce a race-free method to manage cgroups. A similar effort is currently emerging to provide a race-free way of sending signals to processes by adding a process identifier file descriptor (PIDFD), which would be a proper handle for sending signals to processes.
Anything else we need to know?:
See #81146 for current status of all issues created from these findings.
The vendor gave this issue an ID of TOB-K8S-022 and it was finding 4 of the report.
The vendor considers this issue High Severity.
To view the original finding, begin on page 26 of the Kubernetes Security Review Report.
Environment:
- Kubernetes version: 1.13.4