oliver-goetz
Member

How to categorize this PR?

/area ipcei
/kind enhancement

What this PR does / why we need it:
This PR adds support for the NodeAgentAuthorizer feature gate to gardenadm init.
In the gardenadm high-touch scenario there are no Machines. Thus, this PR extends the csr-approver controller and the node-agent-authorizer webhook so that they also support a scenario without machines.

Now, csr-approver and node-agent-authorizer can be enabled in gardener-resource-manager and gardener-node-agent is able to use client certificates for authorization.

Additionally, this PR introduces a mutate function for the static pod translator (thanks to @rfranzke 😄). With this mutator function, we can add a host alias for the service IP of GRM to kube-apiserver. This is required so that kube-apiserver can reach the node-agent-authorizer webhook.
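
To illustrate the host alias idea, here is a minimal sketch. It is not the code from this PR: `Pod`, `PodSpec`, and `HostAlias` are simplified stand-ins for the corev1 types of the same name, and `MutateFunc`, `addHostAlias`, the IP address, and the hostname are hypothetical names and values chosen for illustration only.

```go
package main

import "fmt"

// HostAlias, PodSpec and Pod are simplified stand-ins for the corev1 types.
type HostAlias struct {
	IP        string
	Hostnames []string
}

type PodSpec struct {
	HostAliases []HostAlias
}

type Pod struct {
	Name string
	Spec PodSpec
}

// MutateFunc is a hypothetical hook signature: it may change a translated
// static pod before it is written to disk.
type MutateFunc func(pod *Pod)

// addHostAlias returns a mutator that maps hostname to ip for the pod with
// the given name. Calling the mutator repeatedly does not add duplicates.
func addHostAlias(podName, ip, hostname string) MutateFunc {
	return func(pod *Pod) {
		if pod.Name != podName {
			return
		}
		for i := range pod.Spec.HostAliases {
			alias := &pod.Spec.HostAliases[i]
			if alias.IP != ip {
				continue
			}
			for _, h := range alias.Hostnames {
				if h == hostname {
					return // already present
				}
			}
			alias.Hostnames = append(alias.Hostnames, hostname)
			return
		}
		pod.Spec.HostAliases = append(pod.Spec.HostAliases, HostAlias{
			IP:        ip,
			Hostnames: []string{hostname},
		})
	}
}

func main() {
	pod := &Pod{Name: "kube-apiserver"}
	// Assumed values: the real service IP and hostname come from the cluster.
	mutate := addHostAlias("kube-apiserver", "10.2.0.10", "gardener-resource-manager")
	mutate(pod)
	mutate(pod) // second call is a no-op
	fmt.Printf("%+v\n", pod.Spec.HostAliases)
}
```

Keeping the mutator idempotent matters in this sketch because a translator would typically run on every reconciliation.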

The initial client certificate for gardener-node-agent needs to be created in the gardenadm init flow before kubelet starts bootstrapping. We use the kubelet bootstrap token for this, which is invalidated once kubelet has been bootstrapped successfully. In the shoot scenario, gardener-node-agent deploys the kubelet; in the gardenadm init flow, the node-agent is deployed at the very end.

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:

Release note:

NONE

@gardener-prow gardener-prow bot added area/ipcei IPCEI (Important Project of Common European Interest) kind/enhancement Enhancement, improvement, extension labels May 22, 2025
@gardener-prow gardener-prow bot requested review from Kostov6 and LucaBernstein May 22, 2025 08:03
@gardener-prow gardener-prow bot added cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 22, 2025
@oliver-goetz oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch 2 times, most recently from 42986da to 466795a Compare May 22, 2025 08:39
@gardener-prow gardener-prow bot added cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. and removed cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels May 22, 2025
@rfranzke
Member

/assign

@ScheererJ ScheererJ changed the title Add support for NodeAgentAuthorizer feature to gardenadm init [GEP-28] Add support for NodeAgentAuthorizer feature to gardenadm init May 22, 2025
@oliver-goetz oliver-goetz requested a review from rfranzke May 22, 2025 17:59
@ScheererJ
Member

/assign

@oliver-goetz oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch from 1f644c8 to 86b18c8 Compare May 23, 2025 09:05
Member

@ScheererJ ScheererJ left a comment

Thanks for bringing autonomous shoot clusters closer to ordinary clusters by supporting NodeAgentAuthorizer.

@oliver-goetz oliver-goetz force-pushed the enh/autonomous-node-agent-authorizer branch from 156ed78 to ae5735e Compare May 28, 2025 16:26
rfranzke added a commit to oliver-goetz/gardener that referenced this pull request Jun 3, 2025
See gardener#12169 (comment) for more details

- requires temporary cluster-admin permissions since node-agent
  authorizer is not running yet
- GNA now runs before `kubelet` is bootstrapped
- this allows removing the manual kubelet bootstrapping (GNA does it
  already)
- at the end of the flow, the temporary cluster-admin permissions are
  removed, and it is ensured that GNA is still active/ready

In addition, deploying the control plane deployments now also updates
the OSC and the Secret reconciled by node-agent.
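
The ordering described in the commit message above can be sketched roughly as follows. This is a toy model, not the gardenadm flow itself: the `cluster` struct, the `step` type, and the step names are assumptions made for illustration, and the real flow contains many more steps.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a toy model of the state the flow cares about.
type cluster struct {
	tempClusterAdmin    bool
	authorizerActive    bool
	nodeAgentRunning    bool
	kubeletBootstrapped bool
}

// step is a hypothetical unit of work in the flow.
type step struct {
	name string
	run  func(c *cluster) error
}

// initSteps lists the steps in the order given in the commit message.
func initSteps() []step {
	return []step{
		{"grant temporary cluster-admin", func(c *cluster) error {
			c.tempClusterAdmin = true
			return nil
		}},
		{"start gardener-node-agent", func(c *cluster) error {
			if !c.tempClusterAdmin && !c.authorizerActive {
				return errors.New("node-agent cannot be authorized yet")
			}
			c.nodeAgentRunning = true
			c.kubeletBootstrapped = true // GNA bootstraps kubelet itself
			return nil
		}},
		{"activate node-agent-authorizer", func(c *cluster) error {
			c.authorizerActive = true
			return nil
		}},
		{"remove temporary cluster-admin", func(c *cluster) error {
			if !c.authorizerActive {
				return errors.New("authorizer must be active before revoking")
			}
			c.tempClusterAdmin = false
			return nil
		}},
		{"ensure gardener-node-agent is still ready", func(c *cluster) error {
			if !c.nodeAgentRunning || c.tempClusterAdmin {
				return errors.New("node-agent not ready with final permissions")
			}
			return nil
		}},
	}
}

// runFlow executes the steps in order and stops at the first failure.
func runFlow(c *cluster, steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(c); err != nil {
			return done, fmt.Errorf("step %q failed: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	c := &cluster{}
	done, err := runFlow(c, initSteps())
	fmt.Println(done, err)
}
```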
@rfranzke
Member

rfranzke commented Jun 3, 2025

/hold (see #12169 (comment))

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2025
Member Author

@oliver-goetz oliver-goetz left a comment

Thanks for looking into this 😄

In general we could continue with this approach. I have a couple of minor remarks though.

oliver-goetz and others added 12 commits June 11, 2025 12:47
Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>
This step verifies that gardener-node-agent is really running and not in some kind of crash loop.
Additionally, this ensures that kube-apiserver has been rolled and the node-agent-authorizer webhook is accessible. Otherwise, gardener-node-agent would not be able to start.
Machine name must be set correctly to make this work. Additionally, the last-applied-osc.yaml file must not be written when GNA Reconciler applies the init OSC. Otherwise, it would delete the `machine-name` file right away because it is in the init OSC but not in the original one.
While gardener-node-agent is bootstrapping, there is no node in the `gardenadm join` scenario. Thus, it requires access to all OSC secrets because node-agent-authorizer does not know which worker pool the new GNA is using. If there were one OSC per node, we could restrict access to the OSC secrets that are not used on any node.
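
The access rule from the previous commit message could look roughly like the sketch below. It is a simplified assumption, not the node-agent-authorizer implementation: `authorizeOSCSecret`, its parameters, and all names in `main` are hypothetical.

```go
package main

import "fmt"

// authorizeOSCSecret decides whether the node-agent identified by
// machineName may read secretName.
//   - nodeByMachine maps a machine name to its node (empty while bootstrapping).
//   - oscSecretByNode maps a node to the OSC secret of its worker pool.
//   - oscSecrets is the set of all known OSC secret names.
func authorizeOSCSecret(
	machineName, secretName string,
	nodeByMachine, oscSecretByNode map[string]string,
	oscSecrets map[string]bool,
) bool {
	if !oscSecrets[secretName] {
		return false // not an OSC secret at all
	}
	node, ok := nodeByMachine[machineName]
	if !ok {
		// No node yet: the worker pool is unknown, so any OSC secret is allowed.
		return true
	}
	// Node exists: only the OSC secret of its own worker pool is allowed.
	return oscSecretByNode[node] == secretName
}

func main() {
	oscSecrets := map[string]bool{"osc-pool-a": true, "osc-pool-b": true}
	nodeByMachine := map[string]string{"machine-1": "node-1"}
	oscSecretByNode := map[string]string{"node-1": "osc-pool-a"}

	// Joined node: restricted to its own pool.
	fmt.Println(authorizeOSCSecret("machine-1", "osc-pool-a", nodeByMachine, oscSecretByNode, oscSecrets))
	fmt.Println(authorizeOSCSecret("machine-1", "osc-pool-b", nodeByMachine, oscSecretByNode, oscSecrets))
	// Bootstrapping agent without a node: any OSC secret.
	fmt.Println(authorizeOSCSecret("machine-2", "osc-pool-b", nodeByMachine, oscSecretByNode, oscSecrets))
}
```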
@rfranzke rfranzke force-pushed the enh/autonomous-node-agent-authorizer branch from 07f3ea5 to 5be5846 Compare June 11, 2025 11:40
@rfranzke
Member

/unhold

@gardener-prow gardener-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2025
@rfranzke
Member

/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2025
Contributor

gardener-prow bot commented Jun 11, 2025

LGTM label has been added.

Git tree hash: 77c0c67c095975c9b273140cb40ba85e97247253

Contributor

gardener-prow bot commented Jun 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2025
@gardener-prow gardener-prow bot merged commit 47ec494 into gardener:master Jun 11, 2025
19 checks passed
@oliver-goetz oliver-goetz deleted the enh/autonomous-node-agent-authorizer branch June 11, 2025 21:39
rfranzke added a commit to rfranzke/gardener that referenced this pull request Jun 17, 2025
After gardener#12169, there is another
restart of the control plane to activate the NodeAgentAuthorizer webhook
(this changes configuration in `kube-apiserver`). Additionally, we have
to ensure that `gardener-node-agent` is still running afterwards, and
this can take some additional time.

In some e2e runs, the current `5m` timeout is simply too short, see for
example:

- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12344/pull-gardener-e2e-kind-gardenadm/1934980463915438080
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12335/pull-gardener-e2e-kind-gardenadm/1934970509573754880
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12318/pull-gardener-e2e-kind-gardenadm/1934964955145048064

or generally
https://prow.gardener.cloud/?repo=gardener%2Fgardener&job=pull-gardener-e2e-kind-gardenadm&state=failure,
if you are fast enough
gardener-prow bot pushed a commit that referenced this pull request Jun 18, 2025
After #12169, there is another
restart of the control plane to activate the NodeAgentAuthorizer webhook
(this changes configuration in `kube-apiserver`). Additionally, we have
to ensure that `gardener-node-agent` is still running afterwards, and
this can take some additional time.

In some e2e runs, the current `5m` timeout is simply too short, see for
example:

- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12344/pull-gardener-e2e-kind-gardenadm/1934980463915438080
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12335/pull-gardener-e2e-kind-gardenadm/1934970509573754880
- https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12318/pull-gardener-e2e-kind-gardenadm/1934964955145048064

or generally
https://prow.gardener.cloud/?repo=gardener%2Fgardener&job=pull-gardener-e2e-kind-gardenadm&state=failure,
if you are fast enough
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipcei IPCEI (Important Project of Common European Interest) cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
3 participants