[GEP-28] Add support for `NodeAgentAuthorizer` feature to gardenadm init #12169
42986da
to
466795a
Compare
/assign
1f644c8
to
86b18c8
Compare
Thanks for bringing autonomous shoot clusters closer to ordinary clusters by supporting `NodeAgentAuthorizer`.
156ed78
to
ae5735e
Compare
See gardener#12169 (comment) for more details

- requires temporary cluster-admin permissions since node-agent authorizer is not running yet
- GNA now runs before `kubelet` is bootstrapped
- this allows removing the manual kubelet bootstrapping (GNA does it already)
- at the end of the flow, the temporary cluster-admin permissions are removed, and it is ensured that GNA is still active/ready

In addition, deploying the control plane deployments now also updates the OSC and the Secret reconciled by node-agent.
/hold (see #12169 (comment))
Thanks for looking into this 😄
In general we could continue with this approach. I have a couple of minor remarks though.
Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>
This is done by csr-approver now.
This step verifies that gardener-node-agent is really running and not stuck in a crash loop. Additionally, it ensures that kube-apiserver has been rolled and the node-agent-authorizer webhook is accessible. Otherwise, gardener-node-agent would not be able to start.
Machine name must be set correctly to make this work. Additionally, the last-applied-osc.yaml file must not be written when the GNA Reconciler applies the init OSC. Otherwise, it would delete the `machine-name` file right away because it is in the init OSC but not in the original one.

While gardener-node-agent is bootstrapping, there is no node in the `gardenadm join` scenario. Thus, it requires access to all OSC secrets because node-agent-authorizer does not know which worker pool the new GNA is using. If there were one OSC per node, we could restrict the access to the OSC secrets which are not used on any node.
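The access rule for OSC secrets described above can be sketched roughly as follows. This is only an illustration of the reasoning, not the webhook's actual code: the `nodeInfo` type, the function `allowOSCSecretRead`, and the secret names are hypothetical.

```go
package main

import "fmt"

// nodeInfo is a hypothetical stand-in for what the authorizer knows about
// the Node that a gardener-node-agent instance runs on.
type nodeInfo struct {
	Registered bool   // false while the agent is still bootstrapping
	OSCSecret  string // OSC secret of the node's worker pool, if known
}

// allowOSCSecretRead decides whether the agent may read the named secret.
func allowOSCSecretRead(node nodeInfo, oscSecrets map[string]bool, secretName string) bool {
	if !oscSecrets[secretName] {
		return false // not an OSC secret at all
	}
	if !node.Registered {
		// No Node exists yet, so the worker pool is unknown:
		// access to every OSC secret has to be allowed.
		return true
	}
	// Once the Node is known, only its own worker pool's OSC secret is allowed.
	return secretName == node.OSCSecret
}

func main() {
	// Placeholder secret names (assumption).
	oscSecrets := map[string]bool{"osc-pool-a": true, "osc-pool-b": true}

	bootstrapping := nodeInfo{Registered: false}
	joined := nodeInfo{Registered: true, OSCSecret: "osc-pool-a"}

	fmt.Println(allowOSCSecretRead(bootstrapping, oscSecrets, "osc-pool-b"))
	fmt.Println(allowOSCSecretRead(joined, oscSecrets, "osc-pool-b"))
}
```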
07f3ea5
to
5be5846
Compare
/unhold
/lgtm
LGTM label has been added. Git tree hash: 77c0c67c095975c9b273140cb40ba85e97247253
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here
After gardener#12169, there is another restart of the control plane to activate the NodeAgentAuthorizer webhook (this changes configuration in `kube-apiserver`). Additionally, we have to ensure that `gardener-node-agent` is still running afterwards, and this can take some additional time.

In some e2e runs, the current `5m` timeout is simply too short, see for example:

- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12344/pull-gardener-e2e-kind-gardenadm/1934980463915438080
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12335/pull-gardener-e2e-kind-gardenadm/1934970509573754880
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12318/pull-gardener-e2e-kind-gardenadm/1934964955145048064

or generally https://prow.gardener.cloud/?repo=gardener%2Fgardener&job=pull-gardener-e2e-kind-gardenadm&state=failure, if you are fast enough
How to categorize this PR?
/area ipcei
/kind enhancement
What this PR does / why we need it:
This PR adds support for the `NodeAgentAuthorizer` feature gate to `gardenadm init`.

In the gardenadm high-touch scenario there are no `Machine`s. Thus, this PR extends the `csr-approver` controller and the `node-agent-authorizer` webhook so that they also support a scenario without machines.

Now, `csr-approver` and `node-agent-authorizer` can be enabled in `gardener-resource-manager`, and `gardener-node-agent` is able to use client certificates for authorization.

Additionally, this PR introduces a mutate function for the static pod translator (thanks to @rfranzke 😄). With this mutator function we can add a host alias for the service IP of GRM to kube-apiserver. This is required so that kube-apiserver is able to reach the node-agent-authorizer webhook.
The initial client certificate for `gardener-node-agent` needs to be created in the gardenadm init flow before kubelet starts bootstrapping. We use the kubelet bootstrap token for this case, which is invalidated once kubelet has been bootstrapped successfully. In the shoot scenario gardener-node-agent deploys the kubelet. In the gardenadm init flow the node-agent is deployed at the very end.

Which issue(s) this PR fixes:
Part of #2906
Special notes for your reviewer:
Release note: