Support configuring GOAWAY chance for gardener-apiserver via Garden API #11551
Conversation
The implementation part looks good to me; however, I'm wondering about the following points:
- How is #11085 related to this change, and isn't that rather the intended approach to distribute requests?
- Why do we expose this flag for Gardener API server instead of the Kube API server where we would cover both, requests for Kubernetes and Gardener APIs?
This change alone might not have an impact on the load distribution of gardener-apiservers.
We often have the situation that there are 3-4 instances of gardener-apiserver which are distributed across 3 zones.
We enable topology aware routing, so the EndpointSlices look like this:
```yaml
endpoints:
- addresses:
  - 10.36.3.155
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-d
  nodeName: node-xyz
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-6th95
    namespace: garden
    uid: 7053b996-1024-4ef0-ae2d-53e592ee6a43
  zone: europe-west1-d
- addresses:
  - 10.36.1.111
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-c
  nodeName: node-abc
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-qsplt
    namespace: garden
    uid: 6903aff0-2f89-4e61-9517-a858d8e63570
  zone: europe-west1-c
- addresses:
  - 10.36.2.180
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: europe-west1-b
  nodeName: node-fsk
  targetRef:
    kind: Pod
    name: gardener-apiserver-5cfbf59cbd-9rxqp
    namespace: garden
    uid: a4ef6ff7-2ff2-4159-aa53-996597b1721c
  zone: europe-west1-b
```
Thus, it is very likely that a client connects to the same gardener-apiserver pod again after it has been sent away.
If we want to go with the goAwayChance, we should deactivate topology aware routing in this case too.
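For reference, topology aware routing is toggled via an annotation on the Service. The following is only a minimal sketch; the Service name, namespace, selector, and ports are assumptions, and the annotation key differs between Kubernetes versions (older clusters use service.kubernetes.io/topology-aware-hints instead):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gardener-apiserver   # illustrative name/namespace, not taken from the actual deployment
  namespace: garden
  annotations:
    # "Auto" populates the forZones hints shown above; removing the annotation
    # (or setting it to "Disabled") deactivates topology aware routing.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: gardener-apiserver
  ports:
  - port: 443
    targetPort: 8443
```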
We could also test the impact of Layer 7 Load Balancing on the load of gardener-apiserver first since PR #11085 has been merged recently.
It does. We did several experiments on our landscapes showing that an imbalance we created by killing e.g. 2 of the 3 GAPI pods (causing all virtual-kube-apiservers to connect to the same GAPI) was successfully resolved after a few minutes (depending on the value you end up choosing for the goAwayChance).
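For context on what the goAwayChance does server-side, here is a simplified sketch of the mechanism (modeled on, but not identical to, the upstream kube-apiserver GOAWAY filter, which additionally exempts long-running requests): each HTTP/2 request has a small probability of being answered with a connection teardown, forcing the client to open a new connection that the load balancer may route to a different instance.

```go
package goawayexample

import (
	"math/rand"
	"net/http"
)

// withProbabilisticGoaway wraps a handler so that, with the given chance,
// an HTTP/2 response carries "Connection: close", which makes Go's HTTP/2
// server send a GOAWAY frame and gracefully tear down the connection.
func withProbabilisticGoaway(next http.Handler, chance float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && rand.Float64() < chance {
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}
```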
Can you elaborate a bit on that? I'm curious why – after being told to go away – a client would reconnect to the same gardener-apiserver pod.
I don't agree; we want traffic to stay within a single AZ, if possible. @timuthy
Thanks for the change!
Should we add some validation for the range of allowed values, as it is pretty special for this setting?
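Regarding range validation: kube-apiserver itself rejects --goaway-chance values outside [0, 0.02], so mirroring that interval would be a natural choice. A rough sketch of such a check follows; the function name and field path are made up for illustration and are not taken from this PR:

```go
package validation

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateGoAwayChance is a hypothetical helper; it only illustrates the
// range check discussed above ([0, 0.02], matching kube-apiserver's own
// limit of at most 1 in 50 requests).
func validateGoAwayChance(goAwayChance *float64, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if goAwayChance == nil {
		return allErrs // field is optional; nil means the flag is not set
	}
	if *goAwayChance < 0 || *goAwayChance > 0.02 {
		allErrs = append(allErrs, field.Invalid(fldPath, *goAwayChance, "must be in the interval [0, 0.02]"))
	}
	return allErrs
}
```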
I was not aware of the issue. I thought this should solve an imbalanced load of gardener-apiservers caused by a few clients creating many/expensive requests.
No, I can't, because that's not what I meant. In case the virtual-kube-apiserver is already connected to a gardener-apiserver in the same zone, it will most likely connect to the same gardener-apiserver again.
Let me try to add a bit more context. Configuring a goAwayChance makes the gardener-apiserver occasionally tell HTTP/2 clients to re-establish their connections.

Consider the following scenario: In a runtime cluster with 3 zones there are 3 replicas of the gardener-apiserver. Every virtual-kube-apiserver keeps long-lived HTTP/2 connections to one of them.

When a rolling update is triggered for the gardener-apiserver, the connections get redistributed as the pods are replaced one after another. In case of node draining or VPA scaling, the update strategy does not matter though - pods are simply evicted and get recreated. In the worst case, all 3 virtual-kube-apiservers end up connected to the same gardener-apiserver pod.

Currently, there is no way out other than triggering a rolling update. The goAwayChance gives us a means to resolve such an imbalance over time without touching the deployment.
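To make the scenario concrete, the exposed setting might be used roughly like this in a Garden resource; the exact field name and path introduced by this PR are assumptions based on the discussion:

```yaml
apiVersion: operator.gardener.cloud/v1alpha1
kind: Garden
metadata:
  name: garden
spec:
  virtualCluster:
    gardener:
      gardenerAPIServer:
        # Assumed field name: roughly 1 in 1000 HTTP/2 requests is answered
        # with a GOAWAY so that clients periodically re-balance.
        goAwayChance: 0.001
```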
Thanks for the discussion! Then I guess we can go ahead with the PR, or are there any remaining concerns?
I will address the feedback and hand it off for another review iteration.
/lgtm
/approve
LGTM label has been added. Git tree hash: e0a9bac4b49c0fc15b9e61e8e318fd216d4e718e
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: timuthy
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
How to categorize this PR?
/area scalability
/kind enhancement
What this PR does / why we need it:
This PR exposes the --goaway-chance flag of gardener-apiserver via the Garden API. The flag can help to resolve imbalanced API requests.
Special notes for your reviewer:
FYI @hendrikKahl @voelzmo
Release note: