kubeadm: add an upgrade health check that deploys a Job #81319
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: neolit123. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
Thanks @neolit123 !
I am not sure if this is going to pull us through. What if the pods are in a crash loop?
They may be reported as all running, but restart in a few seconds. Hence, the cluster won't be operational.
In fact, what is an operational cluster? If all of the control plane pods are running, but there is no CNI plugin installed, the cluster is still as good as useless.
The closest we can get to answering the original question is to use the liveness probe where available (but that may pose its own set of security risks if not done properly).
Having a check that spans more than the API server and etcd means that a misconfigured scheduler, controller manager, proxy, etc. will be more difficult to fix than just executing kubeadm upgrade apply --config=fixed-config.yaml.
we can hold for N seconds and see if the Pods are still running after this period, but then the crash loop period is unknown, so this is yet another arbitrary constant.
IMO, CNI demands a separate check that ensures that the Pods have networking.
then again, if the liveness probes are working as expected, the Pods should be in a Running state.
yet, i must admit my biggest concern is not that this change is checking a Running state, but rather that it's checking the Running state on all CP Pods, which can break potential parallel CP node upgrades.
@rosti do you have a proposal on how to better approach the security audit ticket, or are your comments suggesting that you do not agree with it and we should take no action?
/retest
if this PR merges, we can do a health check on the local components' /health* endpoints:
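For illustration only, a minimal sketch of what such a local /healthz probe could look like; the insecure local ports (10251 for the scheduler, 10252 for the controller manager) are assumptions based on the defaults of that era, not something this PR implements:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Assumed insecure local ports; the API server is omitted because its
	// /healthz endpoint requires TLS and credentials.
	endpoints := map[string]string{
		"kube-scheduler":          "http://127.0.0.1:10251/healthz",
		"kube-controller-manager": "http://127.0.0.1:10252/healthz",
	}
	client := &http.Client{Timeout: 5 * time.Second}
	for name, url := range endpoints {
		resp, err := client.Get(url)
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", name, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: HTTP %d\n", name, resp.StatusCode)
	}
}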
That sounds a bit like reimplementing the liveness probes in a different way.
That's my point. We should see what we need to check and what not. Obviously, we cannot ensure a fully operational cluster and it's not our job to verify that.
If a liveness probe fails, it will restart the container in question according to the restart policy. The pod will be in Running state if there is at least one container that is starting, running, or restarting.
+1 to that.
Indeed, only a single instance of every component is required, at a minimum, to claim that we are operational.
Actually, I deliberated quite a lot on this one and I couldn't find a solution that seemed reliable enough to me.
this PR should be closed as the approach is not ideal, but instead i'm going to refactor it to use another approach as discussed during the office hours.
force-pushed from 6064dca to c1160de
@rosti @kubernetes/sig-cluster-lifecycle-pr-reviews
/retest
Thanks @neolit123 !
I don't like the idea of using a daemon set; a job would be much better in my opinion. We just want to verify that we can start something on the cluster.
A simple Job, like this one, should do the trick:
apiVersion: batch/v1
kind: Job
metadata:
  name: health-check
  namespace: kube-system
spec:
  backoffLimit: 0
  activeDeadlineSeconds: 30
  template:
    spec:
      restartPolicy: Never
      terminationGracePeriodSeconds: 0
      securityContext:
        runAsUser: 999
        runAsGroup: 999
        runAsNonRoot: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: health-check
        image: k8s.gcr.io/pause:3.1
        args: ["-v"]
We may also want to give the health check a random name suffix, to ensure uniqueness.
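A minimal sketch of one way to get that uniqueness, using metav1.ObjectMeta.GenerateName so the API server appends a random suffix; the prefix and the trimmed-down spec below are illustrative, not the PR's actual code:

package healthcheck

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// healthCheckJob returns a Job whose final name gets a random suffix appended
// by the API server, because GenerateName is set instead of Name. The pod spec
// is trimmed down compared to the full manifest above.
func healthCheckJob() *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "upgrade-health-check-", // illustrative prefix
			Namespace:    metav1.NamespaceSystem,
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{
							Name:  "health-check",
							Image: "k8s.gcr.io/pause:3.1",
							Args:  []string{"-v"},
						},
					},
				},
			},
		},
	}
}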
/hold
wish you had expressed your comments during our discussion with @fabriziopandini during the office hours last week (was it?).
@neolit123 we can go with a daemon set for 1.16, but let's copy over the spec and s/prepull/health-check over it.
i will try to experiment with a Job later this week.
/remove-kind feature
force-pushed from c1160de to d777c91
updated to use a Job for the preflight check.
force-pushed from 2adc722 to 29bad3c
Thanks @neolit123 !
Looks good and I have just non-blocking nits.
I would have used the TTL controller (even though it's still in alpha), but it's fine like this too.
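For reference, a minimal sketch of what the TTL-controller alternative could look like, assuming the alpha TTLAfterFinished feature gate were enabled on the cluster; the helper name is illustrative:

package healthcheck

import batchv1 "k8s.io/api/batch/v1"

// withTTL sets TTLSecondsAfterFinished so the Job would be garbage collected
// automatically after it finishes, instead of kubeadm deleting it manually.
// This only takes effect if the (alpha) TTLAfterFinished feature gate is on.
func withTTL(job *batchv1.Job, seconds int32) *batchv1.Job {
	job.Spec.TTLSecondsAfterFinished = &seconds
	return job
}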
/lgtm
- Add a new preflight check for upgrade that runs the pause container with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller enabled in kubeadm yet (it's still alpha).
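A rough sketch of that create/wait/delete flow using client-go (assuming the context-free client signatures of that release); the function and variable names are illustrative, not the PR's exact implementation:

package healthcheck

import (
	"fmt"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// runHealthCheckJob creates the Job, waits until it reports a succeeded Pod or
// the timeout expires, and always deletes the Job afterwards because the TTL
// controller cannot be assumed to be enabled.
func runHealthCheckJob(client clientset.Interface, job *batchv1.Job, timeout time.Duration) error {
	created, err := client.BatchV1().Jobs(job.Namespace).Create(job)
	if err != nil {
		return fmt.Errorf("could not create health check Job: %v", err)
	}

	defer func() {
		prop := metav1.DeletePropagationForeground
		_ = client.BatchV1().Jobs(created.Namespace).Delete(created.Name, &metav1.DeleteOptions{PropagationPolicy: &prop})
	}()

	return wait.PollImmediate(time.Second, timeout, func() (bool, error) {
		j, err := client.BatchV1().Jobs(created.Namespace).Get(created.Name, metav1.GetOptions{})
		if err != nil {
			// tolerate transient errors and keep polling until the timeout
			return false, nil
		}
		return j.Status.Succeeded > 0, nil
	})
}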
force-pushed from 29bad3c to 906d315
- // If client.Discovery().RESTClient() is nil, the fake client is used, and that means we are dry-running. Just proceed
+ // If client.Discovery().RESTClient() is nil, the fake client is used.
+ // Return early because the kubeadm dryrun dynamic client only handles the core/v1 GroupVersion.
had to add this.
the dynamic client fails when trying to get batch/v1 Job here:
unstructuredObj, err := clg.dynamicClient.Resource(action.GetResource()).Namespace(action.GetNamespace()).Get(action.GetName(), metav1.GetOptions{})
i think we should stop using the dynamic client for dryrun, but this is a much larger refactor.
until then we have to skip the logic in this preflight check on dryrun.
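A minimal sketch of that guard, assuming the usual kubeadm pattern of detecting the fake client via a nil Discovery REST client; the function name is illustrative, not the PR's exact code:

package healthcheck

import (
	"fmt"

	clientset "k8s.io/client-go/kubernetes"
)

// isDryRun reports whether the Job-based check should be skipped: on dry-run
// the fake client is in use, its Discovery().RESTClient() is nil, and the
// kubeadm dryrun dynamic client only handles the core/v1 GroupVersion.
func isDryRun(client clientset.Interface) bool {
	if client.Discovery().RESTClient() == nil {
		fmt.Println("[dryrun] would create a health check Job and wait for it to complete")
		return true
	}
	return false
}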
+1 we can have a backlog issue for this so an eager contributor can try it.
@rosti updated. see latest comment above.
/hold cancel
Thanks @neolit123 !
/lgtm
> +1 we can have a backlog issue for this so an eager contributor can try it.
I logged the issue a couple of days ago, I think.
Getting a weird issue related to this health-check: FATAL: [preflight] Some fatal errors occurred:
What this PR does / why we need it:
- Add a new preflight check for upgrade that runs the pause container with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller enabled in kubeadm yet (it's still alpha).
Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1717
Special notes for your reviewer:
"fixes" a security audit item, although not super critical as the upgrade process will catch further problems in failing components.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
@kubernetes/sig-cluster-lifecycle-pr-reviews
/kind feature
/kind cleanup
/area security
/priority backlog