rfranzke
Member

How to categorize this PR?

/area documentation
/kind enhancement

What this PR does / why we need it:
This PR proposes GEP-28, which extends Gardener so that it can provision and manage autonomous shoot clusters (i.e., clusters with dedicated nodes that run their own control plane instead of running it in a separate seed cluster).

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:
cc @ScheererJ @vlerenc

Release note:

NONE

Co-Authored-By: Johannes Scheerer <johannes.scheerer@sap.com>
Co-Authored-By: Vedran Lerenc <vlerenc@gmail.com>
@gardener-prow gardener-prow bot added area/documentation Documentation related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Sep 17, 2024
@gardener-prow gardener-prow bot requested review from acumino and ary1992 September 17, 2024 08:45
@gardener-prow gardener-prow bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 17, 2024
@plkokanov
Contributor

/assign

@acumino
Member

acumino commented Sep 17, 2024

/assign

@oliver-goetz
Member

/assign

@Kostov6
Contributor

Kostov6 commented Sep 19, 2024

/assign

Member

@acumino acumino left a comment

Thanks for the GEP and the clear explanation. Loved reading this!

Member

@vpnachev vpnachev left a comment

Thanks for the detailed proposal!

I have some inline comments and some general questions:

  1. Will the nodes of an autonomous cluster be managed by MCM at some point, and if so, how will the manually created ones be handled (ignored/deleted)?
  2. While Kubernetes minor version updates are expected to be performed in place, is it planned to support OS version updates as well?
  3. How will the agents on the node (gardenadm, gardener-node-agent, gardener-resource-manager, etc.) be updated?

Contributor

@Kostov6 Kostov6 left a comment

Thanks for the GEP! A couple of questions and a suggestion from my side.

@ScheererJ
Member

> 1. Will the nodes of an autonomous cluster be managed by MCM at some point, and if so, how will the manually created ones be handled (ignored/deleted)?

This depends heavily on the scenario (see https://github.com/gardener/gardener/blob/68ccf134a903568bb724f0926d5c3dfa2e470379/docs/proposals/28-autonomous-shoot-clusters.md#scenarios).

> 2. While Kubernetes minor version updates are expected to be performed in place, is it planned to support OS version updates as well?

It is assumed that this will be tackled as part of #10219 (see https://github.com/gardener/gardener/blob/68ccf134a903568bb724f0926d5c3dfa2e470379/docs/proposals/28-autonomous-shoot-clusters.md#high-touch), but yes, the plan is to be able to do as much as possible also in-place, potentially with a restart of the node if required.

> 3. How will the agents on the node (gardenadm, gardener-node-agent, gardener-resource-manager, etc.) be updated?

gardenadm is currently not envisioned to run as an agent, but rather as a command-line program. The user brings it to the node by any viable means, ranging from a download to a USB stick. After the initial setup, gardenadm should no longer be necessary. Day-2 operations are expected to be handled by a gardenlet variant in conjunction with a garden.

That said, updates of gardener-node-agent and of the control plane's static pods are handled via existing means, i.e., the OperatingSystemConfig and gardener-node-agent. The gardenlet that eventually manages the autonomous cluster will create an OperatingSystemConfig that includes the changes to gardener-node-agent. In addition, the in-cluster dummy resources for the static pods (see https://github.com/gardener/gardener/blob/68ccf134a903568bb724f0926d5c3dfa2e470379/docs/proposals/28-autonomous-shoot-clusters.md#dealing-with-webhooks-for-control-plane-components) will be updated accordingly. This causes the changed files to be added to the OperatingSystemConfig, which gardener-node-agent then rolls out.

Member

@timuthy timuthy left a comment

Nice GEP and very interesting concepts to align the current architecture with the autonomous goals 👏

@gardener-prow gardener-prow bot requested a review from ScheererJ October 7, 2024 11:51
@rfranzke
Member Author

rfranzke commented Oct 8, 2024

> It is assumed that this will be tackled as part of #10219 (see https://github.com/gardener/gardener/blob/68ccf134a903568bb724f0926d5c3dfa2e470379/docs/proposals/28-autonomous-shoot-clusters.md#high-touch), but yes, the plan is to be able to do as much as possible also in-place, potentially with a restart of the node if required.

Actually, at least for now, we would still go with regular rolling updates of worker or control plane nodes if they are managed by machine-controller-manager. In-place updates are only needed on bare-metal machines, i.e., in the high-touch scenario.

Member

@timuthy timuthy left a comment

/lgtm

Member

@timebertt timebertt left a comment

Very nice GEP, great work! I love that the reasoning behind all the ideas is explained and easy to understand. Overall, the proposal has a great level of detail.

I'm sold on the general approach and I envision that it will fit my team's strategy and help us simplify the management of our initial clusters.

My comments mainly focus on sharpening some of the proposal's details.

Member

@oliver-goetz oliver-goetz left a comment

Nice GEP, thanks 🚀
I have only a few remarks.

Member

@timebertt timebertt left a comment

Thanks for addressing my comments.
As far as I can tell, the only open comments left are ones that need to be considered during the implementation. They could already be added to #2906, but we don't need to add those details to the GEP itself.

Hence, I'd say this PR
/lgtm
😄

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2024
Contributor

gardener-prow bot commented Oct 16, 2024

LGTM label has been added.

Git tree hash: ae0737f0972720936ba58244f5d77929d9b5393f

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2024
@gardener-prow gardener-prow bot requested a review from timebertt October 16, 2024 15:14
Member

@timebertt timebertt left a comment

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2024
Contributor

gardener-prow bot commented Oct 17, 2024

LGTM label has been added.

Git tree hash: 5c243010d5a73d95e47ad02a7f77743025e2abf9

@plkokanov
Contributor

Thanks for answering my questions. Looking forward to seeing the implementation :)
/lgtm

Contributor

@Kostov6 Kostov6 left a comment

/lgtm

@oliver-goetz
Member

/lgtm

@ScheererJ
Member

/approve

Contributor

gardener-prow bot commented Oct 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ScheererJ

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2024
@rfranzke
Member Author

/retest

@LucaBernstein
Member

/test all

@gardener-prow gardener-prow bot merged commit 255cb6f into gardener:master Oct 29, 2024
19 checks passed
@rfranzke rfranzke deleted the gep/autonomous-shoot branch November 6, 2024 03:36