[GEP-28] Add support for `NodeAgentAuthorizer` feature to `gardenadm init` #12169

oliver-goetz · 2025-05-22T08:03:44Z

How to categorize this PR?

/area ipcei
/kind enhancement

What this PR does / why we need it:
This PR adds support for the NodeAgentAuthorizer feature gate to gardenadm init.
In the gardenadm high-touch scenario, there are no Machines. Thus, this PR adds a new feature to the csr-approver controller and the node-agent-authorizer webhook so that they support a scenario without machines, too.
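The following minimal Go sketch shows what supporting a scenario without machines could mean for the authorizer. It is not the gardener-resource-manager implementation. `clusterState`, `allowedOSCSecrets`, and the lookup maps are hypothetical. The sketch assumes that a node-agent is identified by its Machine name when Machines exist and by its Node name otherwise. The fallback of granting all OSC secrets while no Node exists follows the reasoning in one of this PR's commit messages.

```go
package main

import (
	"fmt"
	"sort"
)

// clusterState is a hypothetical, simplified view of what the authorizer can look up.
type clusterState struct {
	machineNamespace *string           // nil when no Machines exist (gardenadm)
	machineToPool    map[string]string // Machine name -> worker pool
	nodeToPool       map[string]string // Node name -> worker pool
	poolToSecret     map[string]string // worker pool -> OSC secret name
}

// allowedOSCSecrets returns the OSC secrets the node-agent identified by name may read.
// With Machines, name is a Machine name; without Machines, it is a Node name.
func allowedOSCSecrets(s clusterState, name string) []string {
	var pool string
	var found bool
	if s.machineNamespace != nil {
		pool, found = s.machineToPool[name]
		if !found {
			return nil // unknown Machine: deny
		}
	} else {
		pool, found = s.nodeToPool[name]
	}

	if found {
		if secret, ok := s.poolToSecret[pool]; ok {
			return []string{secret}
		}
		return nil
	}

	// No Node registered yet: the worker pool is unknown, so every OSC
	// secret must be readable while the node-agent bootstraps.
	all := make([]string, 0, len(s.poolToSecret))
	for _, secret := range s.poolToSecret {
		all = append(all, secret)
	}
	sort.Strings(all)
	return all
}

func main() {
	s := clusterState{
		nodeToPool:   map[string]string{"node-a": "pool-1"},
		poolToSecret: map[string]string{"pool-1": "osc-pool-1", "pool-2": "osc-pool-2"},
	}
	fmt.Println(allowedOSCSecrets(s, "node-a")) // [osc-pool-1]
	fmt.Println(allowedOSCSecrets(s, "node-b")) // [osc-pool-1 osc-pool-2]
}
```

The broad fallback is the price of not knowing the worker pool before a Node exists. Once the Node is registered, access narrows to a single secret.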

Now, csr-approver and node-agent-authorizer can be enabled in gardener-resource-manager, and gardener-node-agent is able to use client certificates for authorization.

Additionally, this PR introduces a mutate function for the static pod translator (thanks to @rfranzke 😄). With this mutate function, we can add a host alias for the service IP of GRM to kube-apiserver. This is required so that kube-apiserver can reach the node-agent-authorizer webhook.
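The following minimal sketch shows how such a mutate function might plug into a static pod translator. `Pod`, `HostAlias`, `podMutator`, `addHostAlias`, and `translate` are simplified, hypothetical stand-ins (the real code works with Kubernetes API types). The IP and hostname are placeholders. The sketch assumes that kube-apiserver runs as a host-network static pod without cluster DNS, so the GRM service name has to resolve through its `/etc/hosts` entries.

```go
package main

import (
	"fmt"
	"slices"
)

// Simplified stand-ins for the Kubernetes HostAlias and Pod types.
type HostAlias struct {
	IP        string
	Hostnames []string
}

type Pod struct {
	Name        string
	HostAliases []HostAlias
}

// podMutator is a hypothetical signature for a mutate function that the
// static pod translator calls on every pod before writing its manifest.
type podMutator func(*Pod)

// addHostAlias returns a mutator that maps hostname to ip in the named pod.
// It is idempotent, so repeated translations do not add duplicates.
func addHostAlias(podName, ip, hostname string) podMutator {
	return func(p *Pod) {
		if p.Name != podName {
			return
		}
		for _, a := range p.HostAliases {
			if a.IP == ip && slices.Contains(a.Hostnames, hostname) {
				return
			}
		}
		p.HostAliases = append(p.HostAliases, HostAlias{IP: ip, Hostnames: []string{hostname}})
	}
}

// translate applies all mutators to each pod (writing the manifests is omitted).
func translate(pods []*Pod, mutators ...podMutator) {
	for _, p := range pods {
		for _, m := range mutators {
			m(p)
		}
	}
}

func main() {
	pods := []*Pod{{Name: "kube-apiserver"}, {Name: "etcd-main"}}
	grm := addHostAlias("kube-apiserver", "10.2.0.10", "gardener-resource-manager.kube-system.svc")
	translate(pods, grm)
	translate(pods, grm) // idempotent: no duplicate alias
	fmt.Println(len(pods[0].HostAliases), len(pods[1].HostAliases)) // 1 0
}
```

Idempotency matters when the translation runs on every reconciliation.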

The initial client certificate for gardener-node-agent needs to be created in the gardenadm init flow before kubelet starts bootstrapping. We use the kubelet bootstrap token for this, which is invalidated once kubelet has been bootstrapped successfully. In the shoot scenario, gardener-node-agent deploys the kubelet, whereas in the gardenadm init flow, the node-agent is deployed at the very end.

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:

Release note:

NONE

rfranzke · 2025-05-22T11:01:21Z

/assign

pkg/resourcemanager/apis/config/v1alpha1/types.go

pkg/resourcemanager/apis/config/v1alpha1/validation/validation.go

pkg/resourcemanager/webhook/nodeagentauthorizer/authorizer.go

pkg/gardenadm/botanist/operatingsystemconfig.go

pkg/gardenadm/cmd/init/init.go

pkg/gardenadm/botanist/operatingsystemconfig.go

ScheererJ · 2025-05-23T08:36:04Z

/assign

ScheererJ

Thanks for bringing autonomous shoot clusters closer to ordinary clusters by supporting NodeAgentAuthorizer.

pkg/resourcemanager/webhook/nodeagentauthorizer/authorizer.go

pkg/gardenadm/botanist/controlplane.go

pkg/gardenadm/cmd/init/init.go

pkg/nodeagent/controller/operatingsystemconfig/reconciler.go

See gardener#12169 (comment) for more details:

- requires temporary cluster-admin permissions since node-agent authorizer is not running yet
- GNA now runs before `kubelet` is bootstrapped
- this allows removing the manual kubelet bootstrapping (GNA does it already)
- at the end of the flow, the temporary cluster-admin permissions are removed, and it is ensured that GNA is still active/ready

In addition, deploying the control plane deployments now also updates the OSC and the Secret reconciled by node-agent.

rfranzke · 2025-06-03T15:22:02Z

/hold (see #12169 (comment))

oliver-goetz

Thanks for looking into this 😄

In general we could continue with this approach. I have a couple of minor remarks though.

pkg/gardenadm/botanist/nodeagent.go


This step verifies that gardener-node-agent is really running and not stuck in a crash loop. Additionally, it ensures that kube-apiserver has been rolled and the node-agent-authorizer webhook is accessible; otherwise, gardener-node-agent would not be able to start.

The machine name must be set correctly to make this work. Additionally, the last-applied-osc.yaml file must not be written when the GNA reconciler applies the init OSC. Otherwise, it would delete the `machine-name` file right away, because the file is in the init OSC but not in the original one.

While gardener-node-agent is bootstrapping, there is no node yet in the `gardenadm join` scenario. Thus, it requires access to all OSC secrets, because node-agent-authorizer does not know which worker pool the new GNA is using. If there were one OSC per node, we could restrict access to the OSC secrets that are not used by any node.
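The deletion problem arises because removed files are computed as the difference between the last-applied OSC and the new one. The following sketch shows that rule. `filesToDelete` and the file paths are hypothetical. They only illustrate why recording the init OSC as last-applied would remove `machine-name`.

```go
package main

import (
	"fmt"
	"sort"
)

// filesToDelete returns paths present in the last-applied OSC but absent from
// the new OSC; a reconciler following this rule removes them from disk.
func filesToDelete(lastApplied, current []string) []string {
	keep := make(map[string]struct{}, len(current))
	for _, f := range current {
		keep[f] = struct{}{}
	}
	var del []string
	for _, f := range lastApplied {
		if _, ok := keep[f]; !ok {
			del = append(del, f)
		}
	}
	sort.Strings(del)
	return del
}

func main() {
	// Hypothetical paths for illustration only.
	initOSC := []string{"/var/lib/gardener-node-agent/machine-name", "/opt/bin/gardener-node-agent"}
	originalOSC := []string{"/opt/bin/gardener-node-agent", "/etc/systemd/system/kubelet.service"}

	// If the init OSC were recorded as last-applied, reconciling the
	// original OSC would delete the machine-name file.
	fmt.Println(filesToDelete(initOSC, originalOSC)) // [/var/lib/gardener-node-agent/machine-name]
	// Without a last-applied record from the init OSC, nothing is deleted.
	fmt.Println(filesToDelete(nil, originalOSC)) // []
}
```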


rfranzke · 2025-06-11T11:40:51Z

/unhold

rfranzke · 2025-06-11T13:27:56Z

/lgtm
/approve

gardener-prow · 2025-06-11T13:28:00Z

LGTM label has been added.

Git tree hash: 77c0c67c095975c9b273140cb40ba85e97247253

gardener-prow · 2025-06-11T13:28:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rfranzke]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


After #12169, there is another restart of the control plane to activate the NodeAgentAuthorizer webhook (this changes configuration in `kube-apiserver`). Additionally, we have to ensure that `gardener-node-agent` is still running afterwards, and this can take some additional time. In some e2e runs, the current `5m` timeout is simply too short, see for example:

- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12344/pull-gardener-e2e-kind-gardenadm/1934980463915438080
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12335/pull-gardener-e2e-kind-gardenadm/1934970509573754880
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12318/pull-gardener-e2e-kind-gardenadm/1934964955145048064

or, generally, https://prow.gardener.cloud/?repo=gardener%2Fgardener&job=pull-gardener-e2e-kind-gardenadm&state=failure, if you are fast enough.

gardener-prow bot added area/ipcei IPCEI (Important Project of Common European Interest) kind/enhancement Enhancement, improvement, extension labels May 22, 2025

gardener-prow bot requested review from Kostov6 and LucaBernstein May 22, 2025 08:03

gardener-prow bot added cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 22, 2025

oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch 2 times, most recently from 42986da to 466795a, May 22, 2025 08:39

oliver-goetz mentioned this pull request May 22, 2025

☂️ [GEP-28] Autonomous Shoot Clusters #2906

Open

gardener-prow bot assigned rfranzke May 22, 2025

rfranzke requested changes May 22, 2025


ScheererJ changed the title ~~Add support for NodeAgentAuthorizer feature to gardenadm init~~ [GEP-28] Add support for NodeAgentAuthorizer feature to gardenadm init May 22, 2025

oliver-goetz requested a review from rfranzke May 22, 2025 17:59

gardener-prow bot assigned ScheererJ May 23, 2025

oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch from 1f644c8 to 86b18c8, May 23, 2025 09:05

ScheererJ reviewed May 23, 2025


timebertt mentioned this pull request May 23, 2025

[GEP-28] gardenadm bootstrap: Deploy machine-controller-manager #12152

Merged

oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch from 156ed78 to ae5735e, May 28, 2025 16:26

gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2025

oliver-goetz commented Jun 5, 2025


pkg/gardenadm/botanist/nodeagent.go (3 review threads, outdated and resolved)

oliver-goetz added 2 commits June 11, 2025 12:47

Make machineNamespaces optional

855d79a

Adapt node-agent-authorizer webhook that it supports a scenario without machines

c797076

oliver-goetz and others added 12 commits June 11, 2025 12:47

Adapt csr-approver that it supports a scenario without machines

e79ab38

Support mutating static pods during translation

c78dbdf

Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>

Support NodeAgentAuthorizer feature gate in gardenadm

6bedb38

Drop manual approval of kubelet server certificate in gardenadm flow

6b7884a

This is done by csr-approver now.

Address PR review feedback

d9ce127

Enable node-agent-authorizer webhook in two steps

08b1360

Address PR review feedback v2

26130af

Address PR review feedback

2c34a80

Adapt unit tests

5be5846

rfranzke force-pushed the enh/autonomous-node-agent-authorizer branch from 07f3ea5 to 5be5846, June 11, 2025 11:40

gardener-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2025

gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2025

gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2025

gardener-prow bot merged commit 47ec494 into gardener:master Jun 11, 2025
19 checks passed

oliver-goetz deleted the enh/autonomous-node-agent-authorizer branch June 11, 2025 21:39

rfranzke mentioned this pull request Jun 17, 2025

Increase timeout for gardenadm init to 10m #12346

Merged
