-
Notifications
You must be signed in to change notification settings - Fork 3.4k
fix: node-init restartPods should use docker if /etc/crictl.yaml not found #12894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Commit 9fa556c8f79c4fc8e75cf2771536bdbc17699286 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
Coverage decreased (-0.02%) to 37.11% when pulling cbfb6ba6e43f995ce782854ca7895168a461cbbe on InfiniteEnergy:node-init-docker into bfd972f on cilium:master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your github issue and PR 💯.
I got one question as below.
@@ -55,7 +55,7 @@ spec: | |||
fi | |||
|
|||
echo "Waiting on pods to stop..." | |||
if grep -q 'docker' /etc/crictl.yaml; then | |||
if [ ! -x /etc/crictl.yaml ] || grep -q 'docker' /etc/crictl.yaml; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why -x
is used here?
With your changes, docker will be used for below scenario. However, crictl
should be executed intead if I am not wrong.
root@kind-control-plane:/# ls -lrt /etc/crictl.yaml
-rw-r--r-- 1 root root 56 Jul 8 21:00 /etc/crictl.yaml
root@kind-control-plane:/# cat /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe docker is the default if this hasn't been configured at all so if that file doesn't exist then use docker.
I just checked for EKS and GKE clusters that I have, both are having /etc/crictl.yaml
file and both are using docker.
Might need others to confirm this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why
-x
is used here?
I thought this was the "exists" test; I just double checked and it is "executable" which would be problematic. I've fixed that to use -f
which I think is more accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this series of conditions is trying to determine is "What is the container runtime?" It is currently (and I'm extending this some) using /etc/crictl.yaml
as a signal, but this is probably not exactly correct. In a given random installation this config file might even be configured to be elsewhere.
My thinking is to follow the current pattern and try to improve the situation. In most cases if crictl.yaml
explicitly mentions docker or there is no crictl.yaml
config at all then the kubelet is probably using docker since that is the current default (accoding to the docs on --container-runtime
). This change (now fixed with -f
) will capture how AKS behaves (no crictl file) and I don't think will break EKS/GKE where that file does exist.
Other ideas:
- Move this test down to the bottom of the if condition stack
- Have the last two conditions test for the presence of the
docker
binary andcrictl
binary and use whichever one is found - Try to parse kubelet's commandline to look for relevant flags like
--container-runtime
currently defaults to docker which was my thinking on "if this hasn't been configured, use docker"
--config
, and then parse that- I don't really like this option. It will be much harder to implement (error prone) and still doesn't catch that the default value if no flags are specified can change, or be compiled independently to be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just confirmed on the 3 AKS scenarios i know about (from these docs)
- image
AKS Ubuntu 16.04
:- This is the default
/etc/crictl.yaml
doesn't exist, testing its existence works- containers are restarted to get onto CNI net
- cilium connectivity-check succeeds
- image
AKS Ubuntu 18.04
- Built with
--aks-custom-headers "CustomizedUbuntu=aks-ubuntu-1804"
/etc/crictl.yaml
doesn't exist, testing its existence works- containers are restarted to get onto CNI net
- cilium connectivity-check succeeds
- Built with
- image
AKS Ubuntu 18.04
w/ containerd-
built with
--aks-custom-headers "CustomizedUbuntu=aks-ubuntu-1804,ContainerRuntime=containerd"
-
/etc/crictl.yaml
is present; it setsruntime-endpoint: unix:///run/containerd/containerd.sock
-
crictl
binary is not present, docker binary is. This is mentioned in the limitations section and makes probing for which runtime is in use by looking for what binaries are present problematic. -
cilium-node-init logs show restart fails
Restarting kubenet managed pods Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found Node initialization complete bash: line 86: crictl: command not found !!! startup-script succeeded!
-
Other issues with pods failing to start, haven't fully debugged it yet b/c i think that's outside of the scope of this ticket. With
crictl
not present the restart pods technique cilium-node-init is attempting will be harder.
-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your detailed explanation and checking. I think your changes now are definitely good enough for the scope of issue. Docker will be used in either of below cases:
- /etc/crictl.yaml doesn't not exists
- Or
docker
is specifically mentioned in /etc/crictl.yaml.
Ortherwise, critctl
will be used by default.
This script has several tests for what the container runtime situation looks like to determine how best to restart the underlying containers (going around the kubelet) so that the new networking configuration can take effect. The first test looks to see if the crictl config file is configured to use docker, but if that file doesn't exist then it fails. I believe docker is the default if this hasn't been configured at all so if that file doesn't exist then use docker. Fixes #12850 Signed-off-by: Nathan Bird <njbird@infiniteenergy.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for gotten around it!
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for a detailed explanation in the comments.
CI passed, mark this one as ready to merge. PS: Add backport for 1.6, 1.7 and 1.8. I made mistake selected backport-done/1.8 😓 grrrr |
This script has several tests for what the container runtime situation
looks like to determine how best to restart the underlying containers
(going around the kubelet) so that the new networking configuration
can take effect.
The first test looks to see if the crictl config file is configured to use
docker, but if that file doesn't exist then it fails. I believe docker
is the default if this hasn't been configured at all so if that file
doesn't exist then use docker.
Fixes #12850
Signed-off-by: Nathan Bird njbird@infiniteenergy.com
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXX
line if the commit addresses a particularGitHub issue.