
rfranzke
Member

@rfranzke rfranzke commented Mar 19, 2025

How to categorize this PR?

/area ipcei
/kind enhancement

What this PR does / why we need it:
This PR is the next increment for `gardenadm init`. It brings up the control plane components as static pods. Most notable changes:

  • The `kubeapiserver` component can again provision a static token kubeconfig, yet this is only enabled for the bootstrap phase of `gardenadm init`, not for regular shoots or the virtual garden
  • `AutonomousBotanist` struct introduced in the `gardenadm` package

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:
Still in draft due to missing unit tests, cleanup, and open TODOs.

/cc @ScheererJ

Release note:

NONE

Contributor

gardener-prow bot commented Mar 19, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot requested a review from ScheererJ March 19, 2025 15:05
@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/ipcei IPCEI (Important Project of Common European Interest) kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 19, 2025
@rfranzke rfranzke force-pushed the gep28/init-controlplane branch from 70165f4 to 2247f37 Compare March 19, 2025 16:20
@ScheererJ
Member

/assign

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2025
Member

@ScheererJ ScheererJ left a comment

Thanks for the well-structured, awesome change bringing Gardener closer to supporting autonomous shoot clusters.

It already looks quite good; I found only a few minor nits.

@rfranzke rfranzke force-pushed the gep28/init-controlplane branch from 2247f37 to b9f9ef8 Compare March 21, 2025 15:25
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 21, 2025
@rfranzke rfranzke requested a review from ScheererJ March 21, 2025 15:25
@rfranzke rfranzke marked this pull request as ready for review March 21, 2025 15:25
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2025
This makes Skaffold consider a digest of the input (source code) when
constructing a tag for the built `gardenadm` image. This way, a new
image is built and extracted even when there was no new git commit but
only changes in the code.
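To illustrate why an input digest yields a fresh tag even without a new commit, here is a minimal Go sketch of content-based tagging. It is not Skaffold's implementation (Skaffold configures this through its tag policy in `skaffold.yaml`); `inputDigestTag` and the example file are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// inputDigestTag derives an image tag from the build inputs
// (file path -> file content). It is deterministic for identical inputs and
// changes whenever any file's content changes, regardless of git commits.
func inputDigestTag(inputs map[string][]byte) string {
	paths := make([]string, 0, len(inputs))
	for p := range inputs {
		paths = append(paths, p)
	}
	// Sort so that Go's randomized map iteration order does not affect the digest.
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		// Length-prefix path and content so that different inputs cannot
		// produce the same byte stream by shifting bytes between fields.
		fmt.Fprintf(h, "%d:%s%d:", len(p), p, len(inputs[p]))
		h.Write(inputs[p])
	}
	// Use a short prefix of the hex-encoded SHA-256 digest as the tag.
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	v1 := map[string][]byte{"main.go": []byte("package main // v1")}
	v2 := map[string][]byte{"main.go": []byte("package main // v2")}
	// Same commit, different source content: the tags differ.
	fmt.Println(inputDigestTag(v1) != inputDigestTag(v2))
}
```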
This makes `make gardenadm-high-touch-up` idempotent. Without this,
multiple invocations result in:

```
root@machine-0:/# ls -al /gardenadm/resources/
total 12
drwxr-xr-x 2 root root  4096 Mar 17 14:04 .
drwxrwxrwx 3 root root  4096 Mar 17 14:04 ..
-rw-r--r-- 1  501 staff 2584 Mar 17 14:04 manifests.yaml
```

Now run `make gardenadm-high-touch-up` again. Result:

```
root@machine-0:/# ls -al /gardenadm/resources/
total 16
drwxr-xr-x 3 root root  4096 Mar 17 14:12 .
drwxrwxrwx 3 root root  4096 Mar 17 14:04 ..
-rw-r--r-- 1  501 staff 2584 Mar 17 14:04 manifests.yaml
drwxr-xr-x 2 root root  4096 Mar 17 14:12 resources
```
@rfranzke rfranzke force-pushed the gep28/init-controlplane branch from b9f9ef8 to 6302d26 Compare March 24, 2025 09:10
We don't want to print usage information on errors, so let's silence it.

We were already doing this for `garden.local.gardener.cloud:5001`, but
now we also need the other mirrors. Hence, let's add all of them to
the `Dockerfile` directly (this way, we can drop the webhook in
`provider-local` that mutates the `OperatingSystemConfig` and
configures the mirrors there).

- we deploy two etcds (main and events) even for bootstrapping to
  stay as close to the target picture as possible
- along the way, we fixed a few minor issues that prevented etcd from
  starting up

This partially reverts gardener@41c09fe.

This will only be used for bootstrapping the autonomous shoot cluster
control plane (i.e., only short-term, and only for bringing up the
components initially).
- plus a helper func for fetching the control-plane worker pool
- specify network settings (now that we have a control-plane worker pool, `botanist.DefaultNetwork` is called).
  Without this, it fails with:

  ```
  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x19abfb4]

  goroutine 1 [running]:
  github.com/gardener/gardener/pkg/gardenlet/operation/botanist.(*Botanist).DefaultNetwork(0x400080aef8)
  	github.com/gardener/gardener/pkg/gardenlet/operation/botanist/network.go:27 +0x304
  ...
  ```

- set `failSwapOn=false` for the local shoot (for regular shoots, provider-local usually injects `failSwapOn`, but it is not running here)
- also use the other kubelet configs from https://github.com/gardener/gardener/blob/be88429ebd3288c3a17f5e9351d883e8a6ed2650/example/provider-local/shoot.yaml#L29-L36

- in addition, make sure shoot networks are set

  This is usually called separately in the reconciliation flows: https://github.com/gardener/gardener/blob/e5b022d637bbf52dcb427e52fbce45e1eb18e220/pkg/gardenlet/controller/shoot/shoot/reconciler_reconcile.go#L127-L131
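A minimal sketch of what a helper for fetching the control-plane worker pool could look like. The trimmed-down `Worker`/`WorkerControlPlane` types and `controlPlaneWorkerPool` are hypothetical stand-ins, not the actual Gardener API:

```go
package main

import "fmt"

// WorkerControlPlane is a hypothetical marker type; a non-nil value marks the
// pool that hosts the control plane.
type WorkerControlPlane struct{}

// Worker is a hypothetical, trimmed-down worker pool definition.
type Worker struct {
	Name         string
	ControlPlane *WorkerControlPlane
}

// controlPlaneWorkerPool returns the first worker pool marked as control-plane
// pool, or nil if there is none (callers must handle the nil case).
func controlPlaneWorkerPool(workers []Worker) *Worker {
	for i := range workers {
		if workers[i].ControlPlane != nil {
			return &workers[i]
		}
	}
	return nil
}

func main() {
	workers := []Worker{
		{Name: "worker"},
		{Name: "control-plane", ControlPlane: &WorkerControlPlane{}},
	}
	if pool := controlPlaneWorkerPool(workers); pool != nil {
		fmt.Println(pool.Name) // control-plane
	}
}
```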
- embeds shoot's `Botanist` struct
- will carry methods for bootstrapping an autonomous shoot
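Embedding the shoot `Botanist` means all of its methods are promoted to `AutonomousBotanist`, while bootstrap-specific methods can be added on top. A simplified Go sketch; the fields, method names, and signatures below are hypothetical and do not match the real `Botanist`:

```go
package main

import "fmt"

// Botanist is a hypothetical, heavily simplified stand-in for the shoot Botanist.
type Botanist struct {
	ShootName string
}

// DeployKubeAPIServer is a simplified stand-in; the real method has a
// different signature.
func (b *Botanist) DeployKubeAPIServer() string {
	return "deploying kube-apiserver for " + b.ShootName
}

// AutonomousBotanist embeds *Botanist, so all Botanist methods are promoted and
// callable directly, while bootstrap-specific methods are added here.
type AutonomousBotanist struct {
	*Botanist
}

// BootstrapControlPlane is a hypothetical bootstrap-only method that reuses a
// promoted Botanist method.
func (a *AutonomousBotanist) BootstrapControlPlane() string {
	return a.DeployKubeAPIServer() + " as static pod"
}

func main() {
	a := &AutonomousBotanist{Botanist: &Botanist{ShootName: "root"}}
	fmt.Println(a.BootstrapControlPlane())
}
```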
Earlier, this was only populated in `Wait`, but we can move the static
data up to `Deploy`/`Restore`. In the `gardenadm init` flow, there is no OSC
controller, i.e., we will not call `Wait` anyway.

In high-touch, the machines are already set up and Gardener does not control
the operating system. Hence, specifying it in the `ShootSpec` should
not be mandatory.
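A hedged sketch of how validation could make the OS image optional for high-touch setups; `Worker`, `MachineImage`, the `highTouch` switch, and `validateMachineImages` are hypothetical and not Gardener's actual validation code:

```go
package main

import (
	"errors"
	"fmt"
)

// MachineImage and Worker are hypothetical stand-ins for the Shoot API types.
type MachineImage struct {
	Name    string
	Version string
}

type Worker struct {
	Name         string
	MachineImage *MachineImage
}

// validateMachineImages requires an OS image for every worker pool unless the
// machines are set up externally (high-touch), where Gardener does not control
// the operating system. Returns nil if there are no errors.
func validateMachineImages(workers []Worker, highTouch bool) error {
	var errs []error
	for _, w := range workers {
		if w.MachineImage == nil && !highTouch {
			errs = append(errs, fmt.Errorf("worker %q: machine image is required", w.Name))
		}
	}
	// errors.Join (Go 1.20+) returns nil for an empty slice.
	return errors.Join(errs...)
}

func main() {
	workers := []Worker{{Name: "control-plane"}}
	fmt.Println(validateMachineImages(workers, true))  // <nil>
	fmt.Println(validateMachineImages(workers, false)) // worker "control-plane": machine image is required
}
```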
rfranzke added 10 commits March 24, 2025 18:09
- OSC carries additional files for static control plane pods
- without this, the containers cannot read the files from the host
  (mounted via `hostPath`)
Without this:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x197e774]

goroutine 13 [running]:
github.com/gardener/gardener/pkg/gardenlet/operation/shoot.(*Shoot).ComputeOutOfClusterAPIServerAddress(0x0?, 0x0?)
	github.com/gardener/gardener/pkg/gardenlet/operation/shoot/shoot.go:511 +0x194
github.com/gardener/gardener/pkg/gardenlet/operation/botanist.(*Botanist).DeployKubeAPIServer(0x40005dac90, {0x2932cf8, 0x400090d090}, 0x0)
	github.com/gardener/gardener/pkg/gardenlet/operation/botanist/kubeapiserver.go:165 +0x40
...
```

For static pods, we inject this as a host alias resolving to localhost
(they cannot resolve `kubernetes.default.svc` directly because they don't
talk to the cluster DNS, i.e., CoreDNS).
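For illustration, a dependency-free Go sketch of such a host alias. `HostAlias` mirrors the shape of the Kubernetes `core/v1` type; using `127.0.0.1` for "localhost" and the helper names are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// HostAlias mirrors the fields of the Kubernetes core/v1 HostAlias type
// (IP and Hostnames) so that this sketch needs no external modules.
type HostAlias struct {
	IP        string
	Hostnames []string
}

// apiServerHostAliases resolves the in-cluster API server name to the loopback
// address, since static pods do not use the cluster DNS (CoreDNS).
func apiServerHostAliases() []HostAlias {
	return []HostAlias{{IP: "127.0.0.1", Hostnames: []string{"kubernetes.default.svc"}}}
}

// hostsLine renders an alias as an /etc/hosts-style line: IP, then hostnames.
func hostsLine(a HostAlias) string {
	return a.IP + "\t" + strings.Join(a.Hostnames, "\t")
}

func main() {
	for _, a := range apiServerHostAliases() {
		fmt.Println(hostsLine(a))
	}
}
```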
For regular shoots, provider-local's webhook on OperatingSystemConfig
injects the `/ko-app` prefix. However, in the gardenadm scenario, we
don't have provider-local's webhook, so the only feasible option is to
handle it ourselves. Not nice, but perhaps the most pragmatic solution.

We might have to do the same for the gardener-node-init script later
when we implement `gardenadm join` (then we can drop the webhook in
provider-local for good).
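ko-built images place the application binary under `/ko-app`, so a command has to be rewritten to point there. A minimal sketch with a hypothetical helper and binary name (`withKoAppPrefix`, `my-binary`); the PR's actual code may differ:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// koAppDir is the directory where ko-built images place the application binary.
const koAppDir = "/ko-app"

// withKoAppPrefix rewrites a container command so that its binary is looked up
// under /ko-app. Commands whose binary is already an absolute path (including
// ones under /ko-app) are returned unchanged. The input slice is not modified.
func withKoAppPrefix(command []string) []string {
	if len(command) == 0 || strings.HasPrefix(command[0], "/") {
		return command
	}
	return append([]string{path.Join(koAppDir, command[0])}, command[1:]...)
}

func main() {
	fmt.Println(withKoAppPrefix([]string{"my-binary", "--v=2"})) // [/ko-app/my-binary --v=2]
}
```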
Member

@ScheererJ ScheererJ left a comment

Thanks for all the changes.

/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2025
Contributor

gardener-prow bot commented Mar 26, 2025

LGTM label has been added.

Git tree hash: ce415d850203720a8fe3351341f9869d23767e15

Contributor

gardener-prow bot commented Mar 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ScheererJ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2025
@gardener-prow gardener-prow bot merged commit ee8b992 into gardener:master Mar 26, 2025
19 checks passed
@rfranzke rfranzke deleted the gep28/init-controlplane branch March 27, 2025 10:05
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipcei IPCEI (Important Project of Common European Interest) cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

2 participants