
Conversation

ScheererJ
Member

How to categorize this PR?

/area networking
/area ops-productivity
/area robustness
/kind enhancement

What this PR does / why we need it:

Use /livez endpoint of kube-proxy instead of /healthz endpoint for probes.

`kube-proxy` serves two paths on its health endpoint (port 10256): `/livez` and `/healthz`. Both check whether `kube-proxy` itself is healthy, but the latter also takes the node into account: it reports failure if the node either has a deletion timestamp or carries a taint set by `cluster-autoscaler` indicating that the node is about to be deleted.
Previously, we used the `/healthz` endpoint for the readiness probe, which worked fine for indicating that `kube-proxy` as a critical component was ready. However, it could lead to prolonged readiness failures when a node was about to be deleted by `cluster-autoscaler` but a `PodDisruptionBudget` or `terminationGracePeriod` delayed the removal. The result was failures in the system components health check. With this change, these failures should no longer occur.
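For illustration, a minimal sketch of such a readiness probe expressed with the Kubernetes Go API types (not the exact Gardener code; the timing values are made up, only the path and port matter here):

```go
package kubeproxy

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// readinessProbe targets kube-proxy's node-local health server.
// Since Kubernetes 1.28, /livez reflects only kube-proxy's own health,
// while /healthz additionally fails when the node has a deletion
// timestamp or the cluster-autoscaler deletion taint.
func readinessProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/livez", // previously "/healthz"
				Port: intstr.FromInt32(10256),
			},
		},
		// Illustrative values, not the ones used by Gardener.
		InitialDelaySeconds: 15,
		TimeoutSeconds:      15,
	}
}
```

The essential difference to the previous configuration is the probed path.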

Which issue(s) this PR fixes:
Fixes #11858

Special notes for your reviewer:

kube-proxy health check: https://github.com/kubernetes/kubernetes/blob/288b044e0f3617c7ec3c832edf925bfe7014e392/pkg/proxy/healthcheck/proxy_health.go#L204

Release note:

`kube-proxy` no longer fails its readiness probe in case the node is about to be deleted by `cluster-autoscaler`.

/cc @rfranzke @dguendisch @kon-angelo @LucaBernstein

@gardener-prow gardener-prow bot requested a review from rfranzke May 6, 2025 13:45
@gardener-prow gardener-prow bot added the area/networking Networking related label May 6, 2025
@gardener-prow gardener-prow bot requested a review from dguendisch May 6, 2025 13:45
@gardener-prow gardener-prow bot added the area/ops-productivity Operator productivity related (how to improve operations) label May 6, 2025
@gardener-prow gardener-prow bot requested a review from LucaBernstein May 6, 2025 13:45
@gardener-prow gardener-prow bot added the area/robustness Robustness, reliability, resilience related label May 6, 2025
Contributor

gardener-prow bot commented May 6, 2025

@ScheererJ: GitHub didn't allow me to request PR reviews from the following users: kon-angelo.

Note that only gardener members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

(quoted PR description, identical to the text above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 6, 2025
Use `/livez` endpoint of `kube-proxy` instead of `/healthz` endpoint for probes.

`kube-proxy` serves two paths on its health endpoint (port 10256): `/livez` and `/healthz`.
Both check whether `kube-proxy` is healthy, but the latter also takes
the node into account. It reports failure if the node either has a
deletion timestamp or a taint set by `cluster-autoscaler` indicating that
the node is about to be deleted.
Previously, we used the `/healthz` endpoint for the readiness probe,
which worked fine for indicating that `kube-proxy` as a critical component
was ready. However, it could lead to prolonged readiness failures in
case a node was about to be deleted by `cluster-autoscaler`, but some
`PodDisruptionBudget` or `terminationGracePeriod` delayed the removal.
The result was failures in the system components health check.
Now, these failures should not be present anymore.
@ScheererJ ScheererJ force-pushed the fix/kube-proxy-readiness-with-cluster-autoscaler branch from 8994042 to 27ccc87 Compare May 7, 2025 07:01
Member

@tobschli tobschli left a comment


Nice!
Thank you very much :)
/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 7, 2025
Contributor

gardener-prow bot commented May 7, 2025

LGTM label has been added.

Git tree hash: 8fe4372bb013b719a2c7f43be4c2f107b76fb659

Contributor

@plkokanov plkokanov left a comment


I think /livez was added when the /healthz behaviour was modified in kubernetes/kubernetes#116470, which seems to be from k8s 1.28.0.
Should we keep the old probe for k8s 1.27.0?

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2025
@gardener-prow gardener-prow bot requested a review from tobschli May 8, 2025 10:56
@gardener-prow gardener-prow bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 8, 2025
@ScheererJ
Member Author

I think /livez was added when the /healthz behaviour was modified in kubernetes/kubernetes#116470, which seems to be from k8s 1.28.0. Should we keep the old probe for k8s 1.27.0?

I added logic to keep it as it was before for 1.27.x and only use the new endpoint starting with 1.28.0.
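Roughly, the gating could look like the following sketch (hypothetical helper name, using github.com/Masterminds/semver/v3 for the comparison; the actual code may differ):

```go
package kubeproxy

import "github.com/Masterminds/semver/v3"

// healthCheckPath returns the path for the kube-proxy readiness probe.
// /livez only exists from Kubernetes 1.28.0 on, so older versions (1.27.x)
// keep the previous /healthz path.
func healthCheckPath(kubernetesVersion *semver.Version) string {
	if kubernetesVersion.LessThan(semver.MustParse("1.28.0")) {
		return "/healthz"
	}
	return "/livez"
}
```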

PTAL.

@plkokanov
Contributor

The pull-gardener-e2e-kind-ha-multi-zone job failed due to a timeout while waiting for the shoot's system components to become healthy. One of the reasons was that the kube-proxy DaemonSets did not become healthy quickly enough:

  2025-05-08T11:23:47.611Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Labeling resources after modification of encryption config or to encrypt them with new ETCD encryption key, Waiting until shoot worker nodes have been reconciled,LastUpdateTime:2025-05-08 11:23:42 +0000 UTC,Progress:85,State:Processing,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message DaemonSet \"kube-system/kube-proxy-manual-v1.32.0\" is unhealthy: too many unavailable pods found (1/1) with reason DaemonSetUnhealthy"}
  2025-05-08T11:24:17.620Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Waiting until shoot worker nodes have been reconciled,LastUpdateTime:2025-05-08 11:24:12 +0000 UTC,Progress:88,State:Processing,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message DaemonSet \"kube-system/kube-proxy-auto-v1.32.0\" is unhealthy: too many unavailable pods found (1/2) with reason DaemonSetUnhealthy"}
  2025-05-08T11:24:47.627Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Waiting until shoot worker nodes have been reconciled,LastUpdateTime:2025-05-08 11:24:12 +0000 UTC,Progress:88,State:Processing,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message DaemonSet \"kube-system/kube-proxy-auto-v1.32.0\" is unhealthy: too many unavailable pods found (1/2) with reason DaemonSetUnhealthy"}
  2025-05-08T11:25:17.640Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Waiting until shoot worker nodes have been reconciled,LastUpdateTime:2025-05-08 11:24:12 +0000 UTC,Progress:88,State:Processing,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message Deployment \"kube-system/metrics-server\" is unhealthy: condition \"Available\" has invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment does not have minimum availability. with reason DeploymentUnhealthy"}
  2025-05-08T11:25:47.658Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Waiting until operating system configurations for worker nodes have been reconciled,LastUpdateTime:2025-05-08 11:25:44 +0000 UTC,Progress:70,State:Processing,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message DaemonSet \"kube-system/kube-proxy-auto-v1.32.0\" is unhealthy: too many unavailable pods found (1/2) with reason DaemonSetUnhealthy"}
  2025-05-08T11:26:17.664Z	INFO	Waiting for reconciliation and healthiness	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}, "lastOperation": "&LastOperation{Description:Shoot cluster has been successfully reconciled.,LastUpdateTime:2025-05-08 11:26:05 +0000 UTC,Progress:100,State:Succeeded,Type:Reconcile,}", "reason": "condition type SystemComponentsHealthy is not true yet, had message DaemonSet \"kube-system/kube-proxy-manual-v1.32.0\" is unhealthy: too many unavailable pods found (1/1) with reason DaemonSetUnhealthy"}
  2025-05-08T11:26:47.682Z	INFO	Shoot has been reconciled and is healthy	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}}
  < Exit [It] Wait for Shoot to be reconciled @ 05/08/25 11:26:47.682 (10m30.389s)
  << Timeline
------------------------------
• [FAILED] [0.001 seconds]
Shoot Tests Create Shoot, Rotate Credentials and Delete Shoot Shoot with workers with workers rollout, in-place update strategy [It] ca rotation phase should be 'Prepared' [Shoot, default, credentials-rotation, basic, with-workers-rollout, in-place]
/home/prow/go/src/github.com/gardener/gardener/test/e2e/gardener/shoot/internal/rotation/certificate_authorities.go:148

  Timeline >>
  > Enter [BeforeEach] TOP-LEVEL @ 05/08/25 11:26:47.683
  < Exit [BeforeEach] TOP-LEVEL @ 05/08/25 11:26:47.683 (0s)
  > Enter [It] ca rotation phase should be 'Prepared' @ 05/08/25 11:26:47.683
  [FAILED] in [It] - /home/prow/go/src/github.com/gardener/gardener/test/e2e/gardener/shoot/internal/rotation/certificate_authorities.go:149 @ 05/08/25 11:26:47.683
  < Exit [It] ca rotation phase should be 'Prepared' @ 05/08/25 11:26:47.683 (0s)
  << Timeline

  [FAILED] Expected
      <v1beta1.CredentialsRotationPhase>: Preparing
  to equal
      <v1beta1.CredentialsRotationPhase>: Prepared

@ScheererJ could you check if it's somehow related to this PR (I don't immediately see how) or just a flake?

@ScheererJ
Member Author

@ScheererJ could you check if it's somehow related to this PR (I don't immediately see how) or just a flake?

As the test reported the following before failing, I think the reconciliation succeeded. Therefore, I think this is just a flake.

  2025-05-08T11:26:47.682Z	INFO	Shoot has been reconciled and is healthy	{"shoot": {"name":"e2e-rot-ip","namespace":"garden-local"}}

Contributor

@plkokanov plkokanov left a comment


/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2025
Contributor

gardener-prow bot commented May 8, 2025

LGTM label has been added.

Git tree hash: 02cfa0ed4570a9c68fd82ee2264e0c36a7a60c8c

Contributor

gardener-prow bot commented May 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: plkokanov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2025
@ialidzhikov
Member

ialidzhikov commented May 8, 2025

Isn't there a /readyz endpoint of kube-proxy? My feeling so far was that /livez gets used for the liveness probe and /readyz for the readiness probe.

@ScheererJ
Member Author

Isn't there a /readyz endpoint of kube-proxy? My feeling so far was that /livez gets used for the liveness probe and /readyz for the readiness probe.

As far as I can tell from the kube-proxy source code, there does not seem to be any /ready or /readyz endpoint. Previously, there was only /healthz, but with 1.28.0 /livez was added.
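For illustration, both paths can be checked manually against a node's health port, e.g. with a small sketch like this (the node IP is a placeholder):

```go
package main

import (
	"fmt"
	"net/http"
)

// Probe kube-proxy's health server on a node. /healthz exists on all
// supported versions; /livez only answers on Kubernetes >= 1.28.
func main() {
	for _, path := range []string{"/healthz", "/livez"} {
		resp, err := http.Get("http://10.250.0.2:10256" + path) // hypothetical node IP
		if err != nil {
			fmt.Println(path, "error:", err)
			continue
		}
		fmt.Println(path, resp.Status)
		resp.Body.Close()
	}
}
```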

@plkokanov
Contributor

plkokanov commented May 8, 2025

Isn't there a /readyz endpoint of kube-proxy? My feeling so far was that /livez gets used for the liveness probe and /readyz for the readiness probe.

As far as I can tell from the kube-proxy source code, there does not seem to be any /ready or /readyz endpoint. Previously, there was only /healthz, but with 1.28.0 /livez was added.

Additionally, according to the corresponding KEP the /livez endpoint was added to replace the old /healthz semantics 😅

@ialidzhikov
Member

ialidzhikov commented May 8, 2025

Thanks for explaining! I just found it confusing that the /livez endpoint is used for the readiness probe while there is no liveness probe and no /readyz endpoint. But it looks like the confusion is caused by how things are designed and implemented upstream. 🤷

@gardener-prow gardener-prow bot merged commit ceebdbf into gardener:master May 9, 2025
19 checks passed
@timuthy
Member

timuthy commented May 12, 2025

@ScheererJ the change makes me wonder why we think it's fine to work around the improvement proposed and implemented in the mentioned KEP, esp. see motivation.

Can you share more information?

Have you considered ignoring unready kube-proxy pods for to-be-deleted nodes, instead?

@ScheererJ
Member Author

@ScheererJ the change makes me wonder why we think it's fine to work around the improvement proposed and implemented in the mentioned KEP, esp. see motivation.

Can you share more information?

Have you considered ignoring unready kube-proxy pods for to-be-deleted nodes, instead?

We do not work around the improvement of KEP-3836 at all. Cloud controller managers can still use the /healthz endpoint of kube-proxy, and they will get a failing result if the node is about to be terminated. Therefore, infrastructures that support this will be able to drain connections before cluster-autoscaler removes the node. (As far as I can tell, only GCP supports this.)

What we do differently with this pull request is only to change the semantics of the readiness probe of the kube-proxy pods, which is not used by cloud-controller-manager anyway. We introduced the readiness probe so that the critical component check Gardener performs on node startup does something useful for kube-proxy, i.e. a failing kube-proxy startup can prevent a node from reaching the critical-components-ready stage.

It is indeed possible to adapt the Gardener logic handling the health check instead of adjusting the readiness probe. However, if you think it through, a lot of specific logic is required: whenever a non-ready Pod is encountered, we would need to check whether it is kube-proxy and, if so, check the corresponding node for the cluster-autoscaler taint (roughly sketched below).
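For illustration, a rough sketch of the special-casing this alternative would need (function name and label selector are hypothetical; it assumes cluster-autoscaler's ToBeDeletedByClusterAutoscaler taint):

```go
package health

import corev1 "k8s.io/api/core/v1"

// Taint added by cluster-autoscaler to nodes scheduled for removal.
const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// ignoreUnreadyKubeProxyPod reports whether an unready kube-proxy pod may be
// tolerated by the system components health check because its node is about
// to go away anyway.
func ignoreUnreadyKubeProxyPod(pod *corev1.Pod, node *corev1.Node) bool {
	if pod.Labels["app"] != "kube-proxy" { // hypothetical label selector
		return false
	}
	if node.DeletionTimestamp != nil {
		return true
	}
	for _, taint := range node.Spec.Taints {
		if taint.Key == toBeDeletedTaint {
			return true
		}
	}
	return false
}
```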

This was always the backup option if there was no other solution, but it is tedious and hard to understand later. The special handling also makes the health checks less clear. Therefore, I prefer the current solution.

Feel free to share your point of view if there is something obvious I missed.

@timuthy
Member

timuthy commented May 14, 2025

Thanks @ScheererJ for the detailed information. I'm fine with the current solution as you could clarify that KEP-3836 is still valid 🙂

Successfully merging this pull request may close these issues.

Adapt to changed kube-proxy readiness behaviour since K8s v1.31