Member

@rfranzke rfranzke commented May 14, 2025

How to categorize this PR?

/area ipcei
/kind flake

What this PR does / why we need it:

  1. Do not label machine StatefulSet with app=machine
    After #12052 ([GEP-28] gardenadm bootstrap: Deploy gardener-resource-manager and provider extension), provider-local is deployed into the kind cluster. It runs the dnsconfig webhook, which reacts to pods labeled app=machine:

    {Key: "app", Operator: metav1.LabelSelectorOpIn, Values: []string{
    "machine",

    This provider-local is responsible for the medium-touch scenario of gardenadm.

    Hence, we don't want it to react to the pods of the machine StatefulSet, since these belong to the high-touch scenario of gardenadm. Let's simply change the labels here to prevent the webhook from interfering.

    Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664
    Here, the kube-controller-manager of the kind cluster fails to roll out the new machine pods.

    From https://gcsweb.prow.gardener.cloud/gcs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664/artifacts/gardener-local/gardener-local-control-plane/pods/kube-system_kube-controller-manager-gardener-local-control-plane_e4d0c8f9c71fba7c8d1b9da5e2da23b7/kube-controller-manager/0.log

    2025-05-13T14:26:54.433492015Z stderr F E0513 14:26:54.433359       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
    2025-05-13T14:26:54.440101864Z stderr F E0513 14:26:54.439984       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
    2025-05-13T14:26:54.444946334Z stderr F E0513 14:26:54.444823       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
    ...
    
  2. Increase timeout for building gardenadm binary
    Since the complexity of the binary has increased in the past weeks, and since running many e2e tests in the Prow cluster can lead to CPU shortage, let's increase the timeout a bit to avoid running into flakes.

    Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922537540246573056

  3. Remove gardener.cloud/role=shoot label from gardenadm-high-touch namespace
    This prevents NetworkPolicies from being applied.

    Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12085/pull-gardener-e2e-kind-gardenadm/1922568377612636160
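
To illustrate item 1, here is a minimal sketch of how an `In` label selector requirement decides whether a pod is picked up. It is not the actual webhook code: `inSelector` is a hypothetical, simplified stand-in for the Kubernetes selector types, the selector only mirrors the excerpt quoted above (the real one may list more values), and `high-touch-machine` is an assumed replacement label value used purely for illustration.

```go
package main

import "fmt"

// inSelector is a hypothetical, simplified stand-in for a Kubernetes
// label selector requirement that uses the "In" operator.
type inSelector struct {
	Key    string
	Values []string
}

// matches reports whether labels contain Key with one of the allowed Values.
func (s inSelector) matches(labels map[string]string) bool {
	v, ok := labels[s.Key]
	if !ok {
		return false
	}
	for _, allowed := range s.Values {
		if v == allowed {
			return true
		}
	}
	return false
}

func main() {
	// Mirrors the selector excerpt from item 1; the real selector may hold more values.
	webhookSelector := inSelector{Key: "app", Values: []string{"machine"}}

	// Labels of the machine pods before the change.
	oldLabels := map[string]string{"app": "machine"}
	// Assumed replacement label, for illustration only.
	newLabels := map[string]string{"app": "high-touch-machine"}

	fmt.Println("old labels matched:", webhookSelector.matches(oldLabels))
	fmt.Println("new labels matched:", webhookSelector.matches(newLabels))
}
```

Once the pods no longer carry a value listed in the selector, the webhook is not called for them, so it can no longer deny their creation.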

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:
/cc @timebertt

Release note:

NONE

@gardener-prow gardener-prow bot requested a review from timebertt May 14, 2025 08:23
@gardener-prow gardener-prow bot added the labels area/ipcei (IPCEI, Important Project of Common European Interest), kind/flake (Tracking or fixing a flaky test), cla: yes (Indicates the PR's author has signed the cla-assistant.io CLA), and size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files) May 14, 2025
@rfranzke
Member Author

/cherry-pick release-v1.118

@gardener-ci-robot
Contributor

@rfranzke: once the present PR merges, I will cherry-pick it on top of release-v1.118 in a new PR and assign it to you.

In response to this:

/cherry-pick release-v1.118

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2025
Contributor

gardener-prow bot commented May 14, 2025

LGTM label has been added.

Git tree hash: bebf0ce53cb974e636c2489bf930c9baaf9ba4bb

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2025
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 14, 2025
@gardener-prow gardener-prow bot requested a review from timebertt May 14, 2025 09:40
@gardener-prow gardener-prow bot added the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files) and removed the size/XS label (Denotes a PR that changes 0-9 lines, ignoring generated files) May 14, 2025
rfranzke and others added 3 commits May 14, 2025 12:29
After gardener#12052, `provider-local`
is deployed into the kind cluster. It runs the `dnsconfig` webhook, which
reacts to `app=machine` pods: https://github.com/gardener/gardener/blob/f3efe83358ef0a6182d63c759b290bd392d7260f/pkg/provider-local/webhook/dnsconfig/add.go#L63-L64
This `provider-local` is responsible for the medium-touch scenario of
`gardenadm`.

Hence, we don't want it to react to the pods of the
`machine` `StatefulSet`, since these belong to the high-touch
scenario of `gardenadm`. Let's simply change the labels here to prevent
the webhook from interfering.

Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664
Here, the `kube-controller-manager` of the kind cluster fails to roll out
the new `machine` pods.

From https://gcsweb.prow.gardener.cloud/gcs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664/artifacts/gardener-local/gardener-local-control-plane/pods/kube-system_kube-controller-manager-gardener-local-control-plane_e4d0c8f9c71fba7c8d1b9da5e2da23b7/kube-controller-manager/0.log

```
2025-05-13T14:26:54.433492015Z stderr F E0513 14:26:54.433359       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.440101864Z stderr F E0513 14:26:54.439984       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.444946334Z stderr F E0513 14:26:54.444823       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.455228214Z stderr F E0513 14:26:54.455144       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.501962554Z stderr F E0513 14:26:54.501844       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.587370813Z stderr F E0513 14:26:54.587262       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.753085172Z stderr F E0513 14:26:54.752971       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:55.079471078Z stderr F E0513 14:26:55.079345       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:55.726386972Z stderr F E0513 14:26:55.726251       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:57.01175209Z stderr F E0513 14:26:57.011625       1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
```
Since the complexity of the binary has increased in the past weeks, and
since running many e2e tests in the Prow cluster can lead to CPU shortage,
let's increase the timeout a bit to avoid running into flakes.

Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922537540246573056
…namespace

to prevent NetworkPolicies from being applied
@rfranzke rfranzke force-pushed the gep28/e2e-timeouts branch from b3a090d to c6a01a4 Compare May 14, 2025 10:29
@ScheererJ
Member

/assign

Could you please update the pull request description to cover the third case, i.e., preventing the addition of network policies?

Member

@ScheererJ ScheererJ left a comment

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2025
Contributor

gardener-prow bot commented May 14, 2025

LGTM label has been added.

Git tree hash: d2351ed1db74457d63ea03bb73796bad0db3da54

@LucaBernstein
Member

Thank you! :)

Member

@ScheererJ ScheererJ left a comment

/approve

Contributor

gardener-prow bot commented May 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ScheererJ, timebertt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ScheererJ,timebertt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot merged commit a635858 into gardener:master May 14, 2025
19 checks passed
@gardener-ci-robot
Contributor

@rfranzke: new pull request created: #12090

In response to this:

/cherry-pick release-v1.118


@timebertt
Member

Nice, thanks folks!

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/ipcei: IPCEI (Important Project of Common European Interest)
  • cla: yes: Indicates the PR's author has signed the cla-assistant.io CLA.
  • kind/flake: Tracking or fixing a flaky test
  • lgtm: Indicates that a PR is ready to be merged.
  • size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.