
rfranzke
Member

How to categorize this PR?

/area ipcei
/kind bug

What this PR does / why we need it:
Currently, provider-local goes into a CrashLoopBackOff after some time in autonomous shoot clusters (ASCs). Note the restarts:

```
root@machine-0:/# k get po -A | grep provider-local
extension-provider-local      gardener-extension-provider-local-56d999bb56-l4kv5     1/1     Running            17 (5m30s ago)   110m
extension-provider-local      gardener-extension-provider-local-56d999bb56-sxfcr     0/1     CrashLoopBackOff   16 (3m6s ago)    110m
```

Logs:

```
{"level":"error","ts":"2025-04-22T11:11:43.069Z","msg":"Error executing the main controller command","error":"error running manager: failed to wait for worker caches to sync kind source: *v1alpha1.Machine: timed out waiting for cache to be synced for Kind *v1alpha1.Machine","stacktrace":"main.main\n\tgithub.com/gardener/gardener/cmd/gardener-extension-provider-local/main.go:24\nruntime.main\n\truntime/proc.go:283"}
```

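This happens because the Machine API is not present in the cluster (at least in the high-touch scenario), so the cache for `Machine` objects can never sync and the manager exits with the error above.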

With this PR, we check whether the Machine API is present when reconciling an ASC. If it is, we are in the medium-touch case, i.e., we can watch `Machine`s. If it is not, we are in the high-touch case, i.e., we cannot watch them. This works because CRDs are always applied before extensions are deployed in the ASC scenario.
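
The decision described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `discoveredResources`, `machineAPIPresent`, and `shouldWatchMachines` are hypothetical names, the discovery result is modelled as a plain map, and the group/version `machine.sapcloud.io/v1alpha1` for the `Machine` API is an assumption. Outside the ASC case the sketch always watches `Machine`s, since the description only applies the check when reconciling an ASC.

```go
package main

import "fmt"

// machineGroupVersion is an assumed group/version for the Machine API.
const machineGroupVersion = "machine.sapcloud.io/v1alpha1"

// discoveredResources is a hypothetical stand-in for an API discovery result:
// group/version -> names of the resources served for it.
type discoveredResources map[string][]string

// machineAPIPresent reports whether the "machines" resource is served.
func machineAPIPresent(d discoveredResources) bool {
	for _, r := range d[machineGroupVersion] {
		if r == "machines" {
			return true
		}
	}
	return false
}

// shouldWatchMachines decides whether to register a watch for Machines.
// Non-ASC: always watch. ASC: watch only if the API exists (medium-touch);
// otherwise skip the watch (high-touch) so the cache sync cannot time out.
func shouldWatchMachines(isAutonomousShoot bool, d discoveredResources) bool {
	if !isAutonomousShoot {
		return true
	}
	return machineAPIPresent(d)
}

func main() {
	highTouch := discoveredResources{} // Machine CRD not installed
	mediumTouch := discoveredResources{machineGroupVersion: {"machines"}}

	fmt.Println(shouldWatchMachines(true, highTouch))   // false
	fmt.Println(shouldWatchMachines(true, mediumTouch)) // true
	fmt.Println(shouldWatchMachines(false, highTouch))  // true
}
```

Because CRDs are applied before extensions are deployed in the ASC scenario, a single check at start-up should be sufficient; in a real manager the map would presumably be filled from the discovery API or a REST-mapper lookup instead.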

Which issue(s) this PR fixes:
Part of #2906

Special notes for your reviewer:
/cc @timebertt @maboehm

Release note:

NONE

gardener-prow bot requested a review from timebertt April 29, 2025 13:07
gardener-prow bot added the area/ipcei label (IPCEI, Important Project of Common European Interest) Apr 29, 2025
Contributor

gardener-prow bot commented Apr 29, 2025

@rfranzke: GitHub didn't allow me to request PR reviews from the following users: maboehm.

Note that only gardener members and repo collaborators can review this PR, and authors cannot review their own PRs.


gardener-prow bot added the kind/bug (Bug), cla: yes (Indicates the PR's author has signed the cla-assistant.io CLA.), and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels Apr 29, 2025
Member

timebertt left a comment


Nice!
/lgtm

gardener-prow bot added the lgtm label (Indicates that a PR is ready to be merged.) Apr 29, 2025
Contributor

gardener-prow bot commented Apr 29, 2025

LGTM label has been added.

Git tree hash: 9d806d2145798cd63b0822b0983c71ef5a4e0ae4

gardener-prow bot removed the lgtm label Apr 29, 2025
gardener-prow bot requested a review from timebertt April 29, 2025 14:51
rfranzke force-pushed the worker-extensions-machine-watch branch from adec676 to eebc61e
gardener-prow bot added the lgtm label
Contributor

gardener-prow bot commented Apr 29, 2025

LGTM label has been added.

Git tree hash: 15885508fe8735f7090246f011005663c5f07091

Contributor

gardener-prow bot commented Apr 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timebertt


gardener-prow bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Apr 29, 2025
gardener-prow bot merged commit 79371ad into gardener:master Apr 30, 2025
19 checks passed
rfranzke deleted the worker-extensions-machine-watch branch April 30, 2025 08:09