Skip to content

Conversation

shafeeqes
Copy link
Contributor

@shafeeqes shafeeqes commented Feb 14, 2025

Co-Authored-By: Ashish Ranjan Yadav ashish.ranjan.yadav@sap.com
Co-Authored-By: Sonu Kumar Singh sonu.kumar.singh02@sap.com

How to categorize this PR?

/kind enhancement

What this PR does / why we need it:
This PR prepares the OSC extension deployment and the OSC reconciler to populate fields necessary for in-place update in the OSC spec and status. Once released, we can vendor the OS extensions with this version.
These fields are only populated if the update strategy is in-place.

Along the way, we felt that it'd be better to populate the OSC with the relevant fields of kubelet config instead of just a hash.
This is required to execute some special steps in case of some kubelet config changes, eg: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#changing-the-cpu-manager-policy

And since the InPlaceUpdates is not present for Rolling update, let's make it a separate struct in the OSC with in-place related config.

Which issue(s) this PR fixes:
Part of #10219

Special notes for your reviewer:
/cc @acumino @ary1992

Release note:

NONE

@gardener-prow gardener-prow bot added the kind/enhancement Enhancement, improvement, extension label Feb 14, 2025
@gardener-prow gardener-prow bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 14, 2025
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the way, we felt that it'd be better to populate the OSC with the relevant fields of kubelet config instead of just a hash.
This is required to execute some special steps in case of some kubelet config changes, eg: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#changing-the-cpu-manager-policy

Please elaborate. Which kind of special steps do you talk about/would be relevant to the OSC controllers/node-agent?

@shafeeqes
Copy link
Contributor Author

Along the way, we felt that it'd be better to populate the OSC with the relevant fields of kubelet config instead of just a hash.
This is required to execute some special steps in case of some kubelet config changes, eg: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#changing-the-cpu-manager-policy

Please elaborate. Which kind of special steps do you talk about/would be relevant to the OSC controllers/node-agent?

In case of CPU Manager policy change, just draining and restarting the kubelet isn't enough, we need to remove the cpu_manager_state file as well.

@gardener-prow gardener-prow bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label Feb 14, 2025
@shafeeqes shafeeqes requested a review from rfranzke February 14, 2025 12:10
@rfranzke
Copy link
Member

Can't we parse the kubelet config file from the OSC instead of duplicating some values in the OSC spec?

@shafeeqes
Copy link
Contributor Author

Can't we parse the kubelet config file from the OSC instead of duplicating some values in the OSC spec?

We did consider this, but there are some challenges with this, the config file in the node is applied after some defaulting here,

func setConfigDefaults(c *components.ConfigurableKubeletConfigParameters, kubernetesVersion *semver.Version) {
if c.CpuCFSQuota == nil {
c.CpuCFSQuota = ptr.To(true)
}
if c.CpuManagerPolicy == nil {
c.CpuManagerPolicy = ptr.To(kubeletconfigv1beta1.NoneTopologyManagerPolicy)
}
if c.EvictionHard == nil {
c.EvictionHard = make(map[string]string, 5)
}
for k, v := range evictionHardDefaults {
if c.EvictionHard[k] == "" {
c.EvictionHard[k] = v
}
}
if c.EvictionSoft == nil {
c.EvictionSoft = make(map[string]string, 5)
}
for k, v := range evictionSoftDefaults {
if c.EvictionSoft[k] == "" {
c.EvictionSoft[k] = v
}
}
if c.EvictionSoftGracePeriod == nil {
c.EvictionSoftGracePeriod = make(map[string]string, 5)
}
for k, v := range evictionSoftGracePeriodDefaults {
if c.EvictionSoftGracePeriod[k] == "" {
c.EvictionSoftGracePeriod[k] = v
}
}
if c.EvictionMinimumReclaim == nil {
c.EvictionMinimumReclaim = make(map[string]string, 5)
}
for k, v := range evictionMinimumReclaimDefaults {
if c.EvictionMinimumReclaim[k] == "" {
c.EvictionMinimumReclaim[k] = v
}
}
if c.EvictionPressureTransitionPeriod == nil {
c.EvictionPressureTransitionPeriod = &metav1.Duration{Duration: 4 * time.Minute}
}
if c.EvictionMaxPodGracePeriod == nil {
c.EvictionMaxPodGracePeriod = ptr.To[int32](90)
}
if c.FailSwapOn == nil {
c.FailSwapOn = ptr.To(true)
}
if c.ImageGCHighThresholdPercent == nil {
c.ImageGCHighThresholdPercent = ptr.To[int32](50)
}
if c.ImageGCLowThresholdPercent == nil {
c.ImageGCLowThresholdPercent = ptr.To[int32](40)
}
if c.SerializeImagePulls == nil {
c.SerializeImagePulls = ptr.To(true)
}
if c.KubeReserved == nil {
c.KubeReserved = make(map[string]string, 2)
}
for k, v := range kubeReservedDefaults {
if c.KubeReserved[k] == "" {
c.KubeReserved[k] = v
}
}
if c.MaxPods == nil {
c.MaxPods = ptr.To[int32](110)
}
if c.ContainerLogMaxSize == nil {
c.ContainerLogMaxSize = ptr.To("100Mi")
}
c.ProtectKernelDefaults = ptr.To(ShouldProtectKernelDefaultsBeEnabled(c, kubernetesVersion))
if c.StreamingConnectionIdleTimeout == nil {
if version.ConstraintK8sGreaterEqual126.Check(kubernetesVersion) {
c.StreamingConnectionIdleTimeout = &metav1.Duration{Duration: time.Minute * 5}
} else {
// this is also the kubernetes default
c.StreamingConnectionIdleTimeout = &metav1.Duration{Duration: time.Hour * 4}
}
}
}

So in case if the Shoot spec is some but the default is changed here for some reason (ideally, we wouldn't), node-agent would detect it as an in-place update and wait for the Node to be ready.
I agree this would also happen today, if some defaults are changed here, that would not roll the nodes since the hash doesn't change.

Should we add a comment here about this and go on with reading the config file approach?

@rfranzke
Copy link
Member

I see. Given that this also happens with rolling update strategy as of today, perhaps it's better to address this concern (if it is one) separately out of the scope of GEP-31?

@shafeeqes
Copy link
Contributor Author

Today, with rolling, even if the defaulting changes, the config file is applied, and kubelet is restarted (It can have other side affects since there is no drain), but the action is not blocking as such.
Probably we should move the defaulting to Shoot API level rather than at the kubelet config file.
I will open a separate issue for this.

@shafeeqes
Copy link
Contributor Author

Opened #11416

@shafeeqes
Copy link
Contributor Author

/retest

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2025
@shafeeqes shafeeqes force-pushed the inplace-updates-osc-api branch from a33445a to 0a1eb18 Compare February 18, 2025 16:21
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2025
@shafeeqes
Copy link
Contributor Author

/retest

shafeeqes and others added 9 commits February 20, 2025 18:02
Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
NB: Hide whitespaces for this commit

Co-Authored-By: Ashish Ranjan Yadav <ashish.ranjan.yadav@sap.com>
Co-Authored-By: Sonu Kumar Singh <sonu.kumar.singh02@sap.com>
@shafeeqes shafeeqes force-pushed the inplace-updates-osc-api branch from 0a1eb18 to 52d7e1d Compare February 20, 2025 12:36
@shafeeqes shafeeqes requested a review from rfranzke February 21, 2025 12:57
@shafeeqes shafeeqes force-pushed the inplace-updates-osc-api branch from 8247d42 to 4f3e783 Compare February 21, 2025 13:00
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Feb 21, 2025
Copy link
Contributor

gardener-prow bot commented Feb 21, 2025

LGTM label has been added.

Git tree hash: 28da5b9f0c802eae78fc3835621231837fa177de

Copy link
Contributor

gardener-prow bot commented Feb 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. and removed cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. cla: no Indicates the PR's author has not signed the cla-assistant.io CLA. labels Feb 21, 2025
@plkokanov
Copy link
Contributor

/cherry-pick release-v1.113

@gardener-ci-robot
Copy link
Contributor

@plkokanov: once the present PR merges, I will cherry-pick it on top of release-v1.113 in a new PR and assign it to you.

In response to this:

/cherry-pick release-v1.113

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@plkokanov
Copy link
Contributor

/test pull-gardener-e2e-kind-ha-multi-zone

@gardener-prow gardener-prow bot merged commit 0261410 into gardener:master Feb 21, 2025
18 of 19 checks passed
@gardener-ci-robot
Copy link
Contributor

@plkokanov: new pull request created: #11488

In response to this:

/cherry-pick release-v1.113

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants