
Conversation

rfranzke
Member

How to categorize this PR?

/area scalability
/kind enhancement

What this PR does / why we need it:
This PR exposes the `--goaway-chance` flag of gardener-apiserver via the `Garden` API. The flag can help resolve an imbalanced distribution of API requests across gardener-apiserver instances.

Special notes for your reviewer:
FYI @hendrikKahl @voelzmo

Release note:

You can use `.spec.virtualCluster.gardener.gardenerAPIServer.goAwayChance` in the `Garden` API to specify the probability for randomly closing a connection (GOAWAY) in order to prevent HTTP/2 clients from getting stuck on a single `gardener-apiserver`.
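
For illustration, a minimal sketch of a `Garden` resource using the new field (the field path comes from the release note above; the `apiVersion`, resource name, and the chosen value are assumptions/placeholders):

apiVersion: operator.gardener.cloud/v1alpha1  # assumed API group/version of the Garden resource
kind: Garden
metadata:
  name: garden  # placeholder name
spec:
  virtualCluster:
    gardener:
      gardenerAPIServer:
        # Probability of sending a GOAWAY to a randomly chosen request's connection so that
        # HTTP/2 clients re-establish their long-lived connections from time to time.
        goAwayChance: 0.001  # value purely illustrative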

@gardener-prow gardener-prow bot added area/scalability Scalability related kind/enhancement Enhancement, improvement, extension labels Feb 28, 2025
@gardener-prow gardener-prow bot requested a review from ary1992 February 28, 2025 14:34
@gardener-prow gardener-prow bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label Feb 28, 2025
@gardener-prow gardener-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 28, 2025
Member

@timuthy timuthy left a comment


The implementation part looks good to me; however, I'm wondering about the following points:

  • How is #11085 related to this change, and isn't that rather the intended approach for distributing requests?
  • Why do we expose this flag for the Gardener API server instead of the Kube API server, where we would cover requests for both the Kubernetes and Gardener APIs?

Member

@oliver-goetz oliver-goetz left a comment


This change alone might not have an impact on the load distribution of gardener-apiservers.

We often have 3-4 gardener-apiserver instances distributed across 3 zones.

We enable topology-aware routing, so the EndpointSlices look like this:

endpoints:
- addresses:
  - 10.36.3.155
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-d
  nodeName: node-xyz
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-6th95
    namespace: garden
    uid: 7053b996-1024-4ef0-ae2d-53e592ee6a43
  zone: europe-west1-d
- addresses:
  - 10.36.1.111
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-c
  nodeName: node-abc
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-qsplt
    namespace: garden
    uid: 6903aff0-2f89-4e61-9517-a858d8e63570
  zone: europe-west1-c
- addresses:
  - 10.36.2.180
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-b
  nodeName: node-fsk
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-9rxqp
    namespace: garden
    uid: a4ef6ff7-2ff2-4159-aa53-996597b1721c
  zone: europe-west1-b

Thus, it is very likely that a client connects to the same gardener-apiserver pod again after it has been sent away.

If we want to go with the goAwayChance, we should deactivate topology-aware routing in this case too.
We could also test the impact of Layer 7 Load Balancing on the load of gardener-apiserver first, since PR #11085 has been merged recently.
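
For reference, a minimal sketch of how topology-aware routing is typically enabled on a Service so that the EndpointSlice controller adds zone hints like the ones shown above (the Service name, namespace, selector, and ports are illustrative; the annotation key assumes Kubernetes v1.27+, where it superseded the older topology-aware-hints annotation):

apiVersion: v1
kind: Service
metadata:
  name: gardener-apiserver  # illustrative name
  namespace: garden         # illustrative namespace
  annotations:
    # Lets the EndpointSlice controller add "forZones" hints, which kube-proxy
    # uses to keep traffic within the client's zone.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: gardener-apiserver  # illustrative selector
  ports:
  - port: 443
    targetPort: 10443        # illustrative target port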

@voelzmo
Member

voelzmo commented Mar 3, 2025

@oliver-goetz

This change alone might not have an impact on the load distribution of gardener-apiservers.

It does. We did several experiments on our landscapes showing that an imbalance we created by killing e.g. 2 of the 3 GAPI pods (causing all virtual-kube-apiservers to connect to the same GAPI) was successfully resolved after a few minutes (depending on the value you end up choosing for the goaway-chance).

Thus, it is very likely that a client connects to the same gardener-apiserver pod again after it has been sent away.

Can you elaborate a bit on that? I'm curious why – after being told to GOAWAY – a virtual-kube-apiserver would not pick the GAPI in its own AZ, according to the routing hints?
Maybe the missing context is that the situation we're trying to resolve is an imbalance caused by the virtual-kube-apiservers not connecting to the GAPI in their AZ.

If we want to go with the goAwayChance, we should deactivate topology-aware routing in this case too.

I don't agree; we want traffic to stay within a single AZ, if possible.

@timuthy
regarding GOAWAY and balancing the load for the KAPIs in general: In the original issue I tried to elaborate a bit on the fact that we often see single clients creating a huge amount of load (e.g. single controllers going wild) and that the flag wouldn't solve that problem.
You're correct that a load imbalance caused by multiple clients having long-lasting connections to the same KAPI instance would also be resolved by this flag.

Member

@voelzmo voelzmo left a comment


Thanks for the change!
Should we add some validation for the range of allowed values, as it is quite specific for this setting?

@oliver-goetz
Member

@oliver-goetz

This change alone might not have an impact on the load distribution of gardener-apiservers.

It does. We did several experiments on our landscapes showing that an imbalance we created by killing e.g. 2 of the 3 GAPI pods (causing all virtual-kube-apiservers to connect to the same GAPI) was successfully resolved after a few minutes (depending on the value you end up choosing for the goaway-chance).

I was not aware of the issue. I thought this should solve an imbalanced load of gardener-apiservers caused by a few clients creating many or expensive requests.

Thus, it is very likely that a client connects to the same gardener-apiserver pod again after it has been sent away.

Can you elaborate a bit on that? I'm curious why – after being told to GOAWAY – a virtual-kube-apiserver would not pick the GAPI in its own AZ, according to the routing hints? Maybe the missing context is that the situation we're trying to resolve is an imbalance caused by the virtual-kube-apiservers not connecting to the GAPI in their AZ.

No, I can't, because that's not what I meant. In case the virtual-kube-apiserver is already connected to a gardener-apiserver in the same zone, it will most likely connect to the same gardener-apiserver again.

@hendrikKahl
Contributor

Let me try to add a bit more context. Configuring a goaway-chance helps to resolve a specific situation where one or two gardener-apiserver pods receive no requests at all, independent of the overall amount or size of requests.

Consider the following scenario:

In a runtime cluster with 3 zones there are 3 replicas of the virtual-garden-kube-apiserver (1 per zone) and 3 replicas of the gardener-apiserver (1 per zone).

Every virtual-garden-kube-apiserver pod will connect to the gardener-apiserver running in the same zone because of the EndpointSlice configuration, which aims to keep traffic within a zone.

When a rolling update is triggered for the gardener-apiserver, everything works as expected. With new pods getting ready before the previous ones are terminated, traffic stays within a zone. More importantly though, any gardener-apiserver pod will receive requests only from one virtual-garden-kube-apiserver pod.

In case of node draining or VPA scaling, the update strategy does not matter, though: pods are simply evicted and get recreated. In the worst case, all 3 virtual-garden-kube-apiserver pods talk to a single gardener-apiserver instance. This instance will consume a significant amount of resources (if any are available), while the other 2 instances are idling to a degree where the readiness probe can be the most frequently occurring request.

Currently, there is no way out other than triggering a rolling update. The goaway-chance helps here, as it causes connections to be re-established. Eventually, virtual-garden-kube-apiserver pods will reconnect and end up at the "correct"/intended pod.
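
To make this concrete, a sketch of what the resulting container configuration might look like once `goAwayChance` is set via the `Garden` API (the deployment excerpt, image placeholder, and the chosen value are illustrative; only the `--goaway-chance` flag itself is what this PR exposes):

# Illustrative excerpt of the gardener-apiserver Deployment's pod template
containers:
- name: gardener-apiserver
  image: <gardener-apiserver-image>  # placeholder
  args:
  - --goaway-chance=0.001  # probability of sending GOAWAY on a random request's connection; value illustrative
  # ...other flags omitted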

@rfranzke
Member Author

rfranzke commented Mar 4, 2025

Thanks for the discussion! Then I guess we can go ahead with the PR, or are there any remaining concerns? I will address the feedback and hand it off for another review iteration.

@rfranzke rfranzke requested a review from timuthy March 5, 2025 11:01
Member

@timuthy timuthy left a comment


/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2025
Contributor

gardener-prow bot commented Mar 5, 2025

LGTM label has been added.

Git tree hash: e0a9bac4b49c0fc15b9e61e8e318fd216d4e718e

Contributor

gardener-prow bot commented Mar 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timuthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2025
@gardener-prow gardener-prow bot merged commit 599e75a into gardener:master Mar 5, 2025
19 checks passed
@rfranzke rfranzke deleted the goaway-chance branch March 5, 2025 16:07