rfranzke

How to categorize this PR?

/area ops-productivity dev-productivity
/kind enhancement

What this PR does / why we need it:
This PR integrates gardener-node-agent into gardenlet's Shoot controller. Health checks in its shoot-care reconciler are adapted as well.

The overall flow is similar to before with cloud-config-downloader: gardenlet generates a ManagedResource which has multiple Secret references:

  • one for the RBAC resources needed for gardener-node-agent
  • one for each worker pool containing the OperatingSystemConfig secret that is later reconciled by gardener-node-agent
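
The set of Secret references described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `secretRefsForShoot` and the Secret names are hypothetical and not taken from the gardenlet code.

```go
package main

import (
	"fmt"
	"sort"
)

// secretRefsForShoot returns the names of the Secrets a ManagedResource would
// reference: one for the RBAC resources of gardener-node-agent plus one per
// worker pool. All names below are hypothetical and only serve the example.
func secretRefsForShoot(workerPools []string) []string {
	refs := []string{"gardener-node-agent-rbac"} // hypothetical Secret name

	// Copy and sort the pool names so the result does not depend on input order.
	pools := append([]string(nil), workerPools...)
	sort.Strings(pools)

	for _, pool := range pools {
		refs = append(refs, "osc-"+pool) // hypothetical per-pool Secret name
	}
	return refs
}

func main() {
	for _, ref := range secretRefsForShoot([]string{"worker-b", "worker-a"}) {
		fmt.Println(ref)
	}
}
```

Sorting the pool names keeps the generated list stable across reconciliations, which avoids needless updates of the referencing object.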

Once gardener-node-agent has been rolled out successfully, all leftover files of cloud-config-downloader are cleaned up. On existing nodes, cloud-config-downloader fetches the updated OperatingSystemConfig (which now contains the gardener-node-init and gardener-node-agent units) and deletes its own systemd unit file, so gardener-node-agent only has to clean up a few leftovers:

Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: cloud-config-downloader.service: Scheduled restart job, restart counter is at 14.
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: Stopped Downloads the actual cloud config from the Shoot API server and executes it.
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: Started Downloads the actual cloud config from the Shoot API server and executes it.
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21731]: Checksum of cloud config script has changed compared to what I had downloaded earlier (new: 4285af521e03bfc8b60ffcc2a0d90a7fb09c61f0ef2da0c11a33bf4faea6c994, old: 50a08949422791d415d6d655534ee622840b9d3344bb7dedf722c49a3dae53e5). Fetching new script...
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21767]: Checking whether we need to preload a new hyperkube image...
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21767]: No need to preload new hyperkube image because binaries for eu.gcr.io/gardener-project/hyperkube:v1.28.2 were found in /var/lib/cloud-config-downloader/downloads/hyperkube
Nov 21 17:11:15 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21767]: Seen newer cloud config or cloud config downloader version or hyperkube image
[...]
Nov 21 17:11:18 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22302]: Created symlink /etc/systemd/system/multi-user.target.wants/gardener-node-agent.service → /etc/systemd/system/gardener-node-agent.service.
Nov 21 17:11:18 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22331]: Created symlink /etc/systemd/system/multi-user.target.wants/gardener-node-init.service → /etc/systemd/system/gardener-node-init.service.
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21767]: Successfully restarted all units referenced in the cloud config.
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22601]: removed '/etc/systemd/system/cloud-config-downloader.service'
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22607]: removed '/var/lib/cloud-config-downloader/credentials/ca.crt'
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22608]: removed '/var/lib/cloud-config-downloader/credentials/server'
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22609]: removed '/var/lib/cloud-config-downloader/download-cloud-config.sh'
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22610]: removed '/var/lib/valitail/scripts/fetch-token.sh'
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[21767]: Cloud config is up to date.
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22613]: node/machine-shoot--local--local-local-bfc8c-bsxtt not labeled
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt download-cloud-config.sh[22651]: node/machine-shoot--local--local-local-bfc8c-bsxtt annotated
Nov 21 17:11:19 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: cloud-config-downloader.service: Succeeded.
Nov 21 17:11:20 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: cloud-config-downloader.service: Failed to schedule restart job: Unit cloud-config-downloader.service not found.
Nov 21 17:11:20 machine-shoot--local--local-local-bfc8c-bsxtt systemd[1]: cloud-config-downloader.service: Failed with result 'resources'.

The PR also adds handling of the kubelet's bootstrap kubeconfig generation/cleanup to gardener-node-agent.

Which issue(s) this PR fixes:
Part of #8023

Special notes for your reviewer:
/cc @oliver-goetz @ScheererJ

Release note:

The `UseGardenerNodeAgent` feature gate is now enabled for the local development scenario. You can read more about `gardener-node-agent` [here](https://github.com/gardener/gardener/blob/master/docs/concepts/node-agent.md).

@gardener-prow gardener-prow bot added area/ops-productivity Operator productivity related (how to improve operations) area/dev-productivity Developer productivity related (how to improve development) kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 21, 2023

acumino commented Nov 24, 2023

/assign


@acumino acumino left a comment

Thanks!
/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 24, 2023

gardener-prow bot commented Nov 24, 2023

LGTM label has been added.

Git tree hash: 2f7a7b4cc8eb2befb2f56020ea04ceddff358662


@oliver-goetz oliver-goetz left a comment

Very nice PR. The explanations in the commit messages help a lot 🚀
I found one tiny thing only.

Previously (with `cloud-config-downloader`), this was done as part of the `executor.Script` function (see https://github.com/gardener/gardener/blob/67a049a3f66ce489002a67dd59ac7b95e8d2573b/pkg/operation/botanist/operatingsystemconfig.go#L181-L189).
Now (with `gardener-node-agent`), the `hyperkube` image is added as a file with type `imageRef` to the OSC's `.spec.files`, see https://github.com/gardener/gardener/blob/67a049a3f66ce489002a67dd59ac7b95e8d2573b/pkg/component/extensions/operatingsystemconfig/original/components/kubelet/component.go#L192-L218. Hence, the original components context must have the correct `hyperkube` image for the Kubernetes version of the worker pool.
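
A minimal sketch of why the image must match the worker pool's Kubernetes version. The types `imageRef`, `file`, and `workerPool` are simplified stand-ins and not the real Gardener API structs; the paths and the helper names are assumptions. Only the repository and tag format are taken from the log output above.

```go
package main

import "fmt"

// imageRef and file are simplified stand-ins for an OSC file entry whose
// content comes from a container image. They are NOT the real API types.
type imageRef struct {
	Image           string
	FilePathInImage string
}

type file struct {
	Path     string
	ImageRef *imageRef
}

// workerPool is a hypothetical, reduced view of a worker pool.
type workerPool struct {
	Name              string
	KubernetesVersion string
}

// hyperkubeImage builds the image reference for a given Kubernetes version.
func hyperkubeImage(repository, kubernetesVersion string) string {
	return repository + ":v" + kubernetesVersion
}

// kubeletFileFor returns the file entry for one worker pool. Because each pool
// may run its own Kubernetes version, the image is resolved per pool.
func kubeletFileFor(pool workerPool, repository string) file {
	return file{
		Path: "/opt/bin/kubelet", // assumed target path
		ImageRef: &imageRef{
			Image:           hyperkubeImage(repository, pool.KubernetesVersion),
			FilePathInImage: "/kubelet", // assumed path inside the image
		},
	}
}

func main() {
	repo := "eu.gcr.io/gardener-project/hyperkube"
	pools := []workerPool{
		{Name: "pool-a", KubernetesVersion: "1.28.2"},
		{Name: "pool-b", KubernetesVersion: "1.27.5"},
	}
	for _, p := range pools {
		f := kubeletFileFor(p, repo)
		fmt.Println(p.Name, f.Path, f.ImageRef.Image)
	}
}
```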
- For backwards compatibility, we use the same annotation keys as before (`checksum/cloud-config-data` and `checksum/data-script`), even if these names are not fully accurate. Maybe we can change them later
- `gardenlet` computes the checksum of the OSC secret and adds the result as an annotation
- `gardener-node-agent` reads this checksum from the OSC secret and adds it as an annotation to the `Node` after successful reconciliation
- This allows `gardenlet` to check whether GNA applied the most recent OSC on the nodes (health checks, adapted in a subsequent commit)
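
The checksum handshake in the list above can be sketched like this. It is an illustration only: the hashing scheme, the data key `osc.yaml`, and the helper names are assumptions, and which of the two annotation keys ends up on which object is not stated in the commit message, so the sketch simply uses `checksum/cloud-config-data` on both sides.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Annotation key named in the commit message; its use on both the Secret and
// the Node is an assumption made for this sketch.
const checksumAnnotation = "checksum/cloud-config-data"

// computeChecksum hashes the secret data in sorted key order so that the
// result is deterministic. The real gardenlet may hash differently.
func computeChecksum(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between key and value
		h.Write(data[k])
		h.Write([]byte{0}) // separator between entries
	}
	return hex.EncodeToString(h.Sum(nil))
}

// nodeIsUpToDate reports whether the Node carries the same checksum that was
// annotated on the OSC secret, i.e. whether the latest OSC was applied.
func nodeIsUpToDate(secretAnnotations, nodeAnnotations map[string]string) bool {
	want, ok := secretAnnotations[checksumAnnotation]
	return ok && nodeAnnotations[checksumAnnotation] == want
}

func main() {
	// "osc.yaml" is a hypothetical data key.
	secretData := map[string][]byte{"osc.yaml": []byte("units: [kubelet.service]")}
	secretAnnotations := map[string]string{checksumAnnotation: computeChecksum(secretData)}

	nodeAnnotations := map[string]string{}
	fmt.Println("before reconciliation:", nodeIsUpToDate(secretAnnotations, nodeAnnotations))

	// After a successful reconciliation the agent copies the checksum to the Node.
	nodeAnnotations[checksumAnnotation] = secretAnnotations[checksumAnnotation]
	fmt.Println("after reconciliation:", nodeIsUpToDate(secretAnnotations, nodeAnnotations))
}
```
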
These `Secret`s get reconciled later by `gardener-node-agent`'s `OperatingSystemConfig` controller.
They will be deployed as part of a `ManagedResource` in a subsequent commit.
Earlier, this function deployed a `ManagedResource` containing the RBAC rules of `cloud-config-downloader` as well as the secrets containing the bash scripts that get executed by `cloud-config-downloader`.

In the next commit, we introduce a function that deploys a `ManagedResource` containing the RBAC rules for `gardener-node-agent` as well as the secrets containing the OSCs that get reconciled by `gardener-node-agent`
- For backwards compatibility, we use the same annotation keys as before (`checksum/cloud-config-data` and `checksum/data-script`), even if these names are not fully accurate. Maybe we can change them later
- Secrets are considered for the health checks only when a secret with the label `gardener.cloud/role=operating-system-config` is found in the shoot. This keeps the checks compatible with both (a) shoots that haven't been reconciled yet, i.e. don't have gardener-node-agent yet, and (b) shoots that have been reconciled and have gardener-node-agent
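
The label-based switch between the old and new health checks might look roughly like this. The `secret` type and the function names are hypothetical stand-ins; only the label key and value come from the commit message.

```go
package main

import "fmt"

// Label key and value as given in the commit message.
const (
	roleLabel = "gardener.cloud/role"
	roleOSC   = "operating-system-config"
)

// secret is a reduced, hypothetical stand-in for a Kubernetes Secret.
type secret struct {
	Name   string
	Labels map[string]string
}

// oscSecrets returns only those secrets carrying the OSC role label.
func oscSecrets(secrets []secret) []secret {
	var out []secret
	for _, s := range secrets {
		if s.Labels[roleLabel] == roleOSC {
			out = append(out, s)
		}
	}
	return out
}

// useNodeAgentChecks reports whether the health check should evaluate the
// gardener-node-agent based secrets. It is false for shoots that were not
// reconciled yet and therefore have no labeled secret.
func useNodeAgentChecks(secrets []secret) bool {
	return len(oscSecrets(secrets)) > 0
}

func main() {
	notReconciled := []secret{{Name: "legacy-script"}}
	reconciled := []secret{
		{Name: "legacy-script"},
		{Name: "osc-pool-a", Labels: map[string]string{roleLabel: roleOSC}},
	}
	fmt.Println(useNodeAgentChecks(notReconciled))
	fmt.Println(useNodeAgentChecks(reconciled))
}
```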
- `gardener-node-agent` uses the `cloud-config-downloader` token to download its own access token when deployed on an existing node
- It deletes the `cloud-config-downloader` directory and systemd files on the node after start-up
- After the OSC on all nodes was updated, `gardenlet` deletes the `cloud-config-downloader` access secret from both seed and shoot, and the no longer needed `ManagedResource` and `Secret`s for the cloud config execution bash scripts
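
The ordering constraint in the last bullet (clean up only after every node runs the new OSC) can be sketched as below. The `node` type, the per-pool checksum map, and the step descriptions are assumptions made for the illustration, not gardenlet's actual implementation.

```go
package main

import "fmt"

// node is a reduced, hypothetical view of a shoot node.
type node struct {
	Name            string
	Pool            string
	AppliedChecksum string
}

// allNodesMigrated reports whether every node carries the checksum expected
// for its worker pool. A node from an unknown pool counts as not migrated.
func allNodesMigrated(nodes []node, expectedByPool map[string]string) bool {
	for _, n := range nodes {
		want, ok := expectedByPool[n.Pool]
		if !ok || n.AppliedChecksum != want {
			return false
		}
	}
	return true
}

// legacyCleanupSteps returns the cleanup steps for the cloud-config-downloader
// resources, or nil while at least one node has not applied the new OSC.
func legacyCleanupSteps(nodes []node, expectedByPool map[string]string) []string {
	if !allNodesMigrated(nodes, expectedByPool) {
		return nil
	}
	return []string{
		"delete cloud-config-downloader access secret in the seed",
		"delete cloud-config-downloader access secret in the shoot",
		"delete ManagedResource and Secrets of the cloud config execution scripts",
	}
}

func main() {
	expected := map[string]string{"pool-a": "abc", "pool-b": "def"}
	nodes := []node{
		{Name: "n1", Pool: "pool-a", AppliedChecksum: "abc"},
		{Name: "n2", Pool: "pool-b", AppliedChecksum: "old"},
	}
	fmt.Println(len(legacyCleanupSteps(nodes, expected))) // n2 is outdated

	nodes[1].AppliedChecksum = "def"
	for _, step := range legacyCleanupSteps(nodes, expected) {
		fmt.Println(step)
	}
}
```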
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2023
@gardener-prow gardener-prow bot requested a review from acumino November 27, 2023 08:57
@oliver-goetz

/approve
/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2023

gardener-prow bot commented Nov 27, 2023

LGTM label has been added.

Git tree hash: 3933c3207ce45c7cf57d7d61e583942ea634bd14


gardener-prow bot commented Nov 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliver-goetz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2023
@gardener-prow gardener-prow bot merged commit 3d3887c into gardener:master Nov 27, 2023
@rfranzke rfranzke deleted the gna/gardenlet branch November 27, 2023 12:57
rfranzke added a commit to rfranzke/gardener that referenced this pull request May 31, 2024
rfranzke added a commit to rfranzke/gardener that referenced this pull request May 31, 2024
gardener-prow bot pushed a commit that referenced this pull request Jun 4, 2024
* Remove deprecated fields from `OperatingSystemConfig`

(from #9477, released with `v1.92.0`)

* Remove cleanup of old `kube-apiserver` `Ingress` resource

(from #9300, released with `v1.91.0`)

* Remove Istio zone migration code

(from #9304 and #9457, released with `v1.91.0` and `v1.92.0`)

* Increase removal period of `<name>.ca-cluster` `Secret`

To give users more time to adapt

* Remove PVC migration for `garden` Prometheus

(from #9543, released with `v1.93.0`)

* Remove PVC migration for `longterm` Prometheus

(from #9606, released with `v1.94.0`)

* Drop migration code in `skaffold.yaml` for `core.gardener.cloud/v1` API

(from #9771, released with `v1.96.0`)

* Remove migration code for e2e upgrade tests after `provider-local` VPN fix

(from #9752, released with `v1.96.0`)

* Remove cleanup of old `vali` `VerticalPodAutoscaler`s

(from #9681, released with `v1.94.0`)

* Remove cleanup code after making `Secret`s of `ManagedResource`s immutable

(from #8116, released with `v1.77.0`)

* Remove cleanup code of resources of legacy `cloud-config-downloader`

(from #8847, released with `v1.85.0`)

* Revert "Remove Istio zone migration code"

This reverts commit 8850346.

* Increase removal period of Istio zone migration code