[GEP-28] Prevent flaky e2e tests #12085
/cherry-pick release-v1.118
@rfranzke: once the present PR merges, I will cherry-pick it on top of release-v1.118 in a new PR and assign it to you.
LGTM label has been added. Git tree hash: bebf0ce53cb974e636c2489bf930c9baaf9ba4bb
66ce234 to b3a090d
After gardener#12052, `provider-local` is deployed into the kind cluster. It runs the `dnsconfig` webhook, which reacts to `app=machine` pods: https://github.com/gardener/gardener/blob/f3efe83358ef0a6182d63c759b290bd392d7260f/pkg/provider-local/webhook/dnsconfig/add.go#L63-L64
This `provider-local` is responsible for the medium-touch scenario of `gardenadm`. Hence, we don't want it to react to the pods of the `machine` `StatefulSet`, since these belong to the high-touch scenario of `gardenadm`. Let's simply change the labels here to prevent the webhook from interfering.
Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664
Here, the `kube-controller-manager` of the kind cluster fails to roll out the new `machine` pods.
From https://gcsweb.prow.gardener.cloud/gcs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664/artifacts/gardener-local/gardener-local-control-plane/pods/kube-system_kube-controller-manager-gardener-local-control-plane_e4d0c8f9c71fba7c8d1b9da5e2da23b7/kube-controller-manager/0.log
```
2025-05-13T14:26:54.433492015Z stderr F E0513 14:26:54.433359 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.440101864Z stderr F E0513 14:26:54.439984 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.444946334Z stderr F E0513 14:26:54.444823 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.455228214Z stderr F E0513 14:26:54.455144 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.501962554Z stderr F E0513 14:26:54.501844 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.587370813Z stderr F E0513 14:26:54.587262 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:54.753085172Z stderr F E0513 14:26:54.752971 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:55.079471078Z stderr F E0513 14:26:55.079345 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:55.726386972Z stderr F E0513 14:26:55.726251 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
2025-05-13T14:26:57.01175209Z stderr F E0513 14:26:57.011625 1 stateful_set.go:438] "Unhandled Error" err="error syncing StatefulSet gardenadm-high-touch/machine, requeuing: admission webhook \"dnsconfig.local.extensions.gardener.cloud\" denied the request: Service \"coredns\" not found" logger="UnhandledError"
```
Since the complexity of the binary has increased in the past weeks, and since running many e2e tests in the Prow cluster can lead to CPU shortage, let's increase the timeout a bit to avoid flakes.
Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922537540246573056
…namespace to prevent NetworkPolicies to be applied
b3a090d to c6a01a4
/assign
Could you please adapt the pull request description to cover the third case, i.e. preventing the addition of network policies?
/lgtm
LGTM label has been added. Git tree hash: d2351ed1db74457d63ea03bb73796bad0db3da54
Thank you! :)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ScheererJ, timebertt
@rfranzke: new pull request created: #12090
Nice, thanks folks!
How to categorize this PR?
/area ipcei
/kind flake
What this PR does / why we need it:
Do not label `machine` `StatefulSet` with `app=machine`
After [GEP-28] `gardenadm bootstrap`: Deploy gardener-resource-manager and provider extension #12052, `provider-local` is deployed into the kind cluster. It runs the `dnsconfig` webhook, which reacts to `app=machine` pods: gardener/pkg/provider-local/webhook/dnsconfig/add.go, lines 63 to 64 in f3efe83.
This `provider-local` is responsible for the medium-touch scenario of `gardenadm`. Hence, we don't want it to react to the pods of the `machine` `StatefulSet`, since these belong to the high-touch scenario of `gardenadm`. Let's simply change the labels here to prevent the webhook from interfering.
Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664
Here, the `kube-controller-manager` of the kind cluster fails to roll out the new `machine` pods.
From https://gcsweb.prow.gardener.cloud/gcs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922295612275953664/artifacts/gardener-local/gardener-local-control-plane/pods/kube-system_kube-controller-manager-gardener-local-control-plane_e4d0c8f9c71fba7c8d1b9da5e2da23b7/kube-controller-manager/0.log
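The effect of the relabeling can be illustrated with a minimal Go sketch. It assumes the webhook picks up pods by an exact label match on `app=machine`, as the description states; the helper `matchesSelector` and the replacement label value `high-touch-machine` are hypothetical and not taken from this PR or from the `provider-local` code.

```go
package main

import "fmt"

// matchesSelector reports whether every key/value pair of selector is
// present in labels. This is a simplified stand-in for an equality-based
// label selector, not the real webhook implementation.
func matchesSelector(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Selector of the dnsconfig webhook as described above.
	webhookSelector := map[string]string{"app": "machine"}

	// Pod labels before the change: the high-touch pods are matched.
	before := map[string]string{"app": "machine"}
	// Hypothetical labels after the change: the webhook no longer matches.
	after := map[string]string{"app": "high-touch-machine"}

	fmt.Println("matched before relabeling:", matchesSelector(webhookSelector, before))
	fmt.Println("matched after relabeling:", matchesSelector(webhookSelector, after))
}
```

Once the pods no longer carry the label the webhook selects on, the admission request for the `machine` pods is not sent to `dnsconfig` at all, so the missing `coredns` Service can no longer block the rollout.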
Increase timeout for building `gardenadm` binary
Since the complexity of the binary has increased in the past weeks, and since running many e2e tests in the Prow cluster can lead to CPU shortage, let's increase the timeout a bit to avoid flakes.
Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/logs/ci-gardener-e2e-kind-gardenadm/1922537540246573056
Remove `gardener.cloud/role=shoot` label from `gardenadm-high-touch` namespace
To prevent NetworkPolicies from being applied.
Example flake: https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12085/pull-gardener-e2e-kind-gardenadm/1922568377612636160
Which issue(s) this PR fixes:
Part of #2906
Special notes for your reviewer:
/cc @timebertt
Release note: