How to categorize this issue?
/area control-plane
/kind bug
What happened:
Creating a shoot cluster failed because the Worker could not be reconciled: the MachineClasses could not be created in DeployMachineClasses in the provider extension. This was caused by a missing configuration that should have been caught during Shoot validation, but that specific validation was missing.
The error thrown during worker reconciliation was not propagated to the Shoot level, so the user was not informed about their misconfiguration. Instead, the shoot creation failed with a generic worker status: machineDeployments has not been updated.
The user then deleted the cluster without fixing the wrong configuration. Deleting the cluster fails as well, because DeployMachineClasses is also called when deleting the Worker:
gardener/extensions/pkg/controller/worker/genericactuator/actuator_delete.go
Lines 50 to 54 in 51b88ca
    // Redeploy generated machine classes to update credentials machine-controller-manager used.
    log.Info("Deploying the machine classes")
    if err := workerDelegate.DeployMachineClasses(ctx); err != nil {
        return fmt.Errorf("failed to deploy the machine classes: %w", err)
    }
The shoot is now deadlocked: it can neither be reconciled successfully nor deleted.
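The shape of the deadlock can be sketched in a few lines of Go. This is a simplified model, not the actual genericactuator code: the `workerDelegate` interface here drops the `ctx` parameter, and `brokenDelegate`, `reconcileWorker`, and `deleteWorker` are hypothetical names. It shows that both paths fail on the same step as long as the configuration stays invalid.

```go
package main

import (
	"errors"
	"fmt"
)

// workerDelegate is a hypothetical, simplified stand-in for the provider
// extension's delegate (the real method also takes a context).
type workerDelegate interface {
	DeployMachineClasses() error
}

// brokenDelegate always fails, like a delegate facing an invalid configuration.
type brokenDelegate struct{}

func (brokenDelegate) DeployMachineClasses() error {
	return errors.New("missing configuration")
}

// reconcileWorker deploys the machine classes first.
func reconcileWorker(d workerDelegate) error {
	if err := d.DeployMachineClasses(); err != nil {
		return fmt.Errorf("failed to deploy the machine classes: %w", err)
	}
	return nil // further reconciliation steps would follow here
}

// deleteWorker mirrors the quoted snippet: machine classes are redeployed
// before any cleanup happens.
func deleteWorker(d workerDelegate) error {
	if err := d.DeployMachineClasses(); err != nil {
		return fmt.Errorf("failed to deploy the machine classes: %w", err)
	}
	return nil // cleanup would follow here
}

func main() {
	d := brokenDelegate{}
	fmt.Println("reconcile:", reconcileWorker(d))
	fmt.Println("delete:   ", deleteWorker(d))
}
```

Both calls return an error, so neither direction can make progress until the configuration itself is corrected.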
What you expected to happen:
- Errors during Worker reconciliation should be propagated up to the Shoot level to inform the user why it failed.
- A shoot should not end up in a deadlock when DeployMachineClasses fails. Deleting the cluster should still succeed.
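One possible direction for the second point, sketched under a stated assumption: if no machines exist, there is nothing whose credentials need refreshing, so a failing DeployMachineClasses could be tolerated during deletion. This policy, the `machineCount` parameter, and all type names below are hypothetical and are not the project's chosen fix.

```go
package main

import (
	"errors"
	"fmt"
)

// workerDelegate is a hypothetical, simplified stand-in for the provider
// extension's delegate (the real method also takes a context).
type workerDelegate interface {
	DeployMachineClasses() error
}

// brokenDelegate always fails, like a delegate facing an invalid configuration.
type brokenDelegate struct{}

func (brokenDelegate) DeployMachineClasses() error {
	return errors.New("missing configuration")
}

// deleteWorker tolerates a failing machine class deployment only when no
// machines are left (assumed policy, for illustration only).
func deleteWorker(d workerDelegate, machineCount int) error {
	if err := d.DeployMachineClasses(); err != nil {
		if machineCount > 0 {
			// Machines still exist and may need refreshed credentials.
			return fmt.Errorf("failed to deploy the machine classes: %w", err)
		}
		fmt.Println("ignoring machine class deployment error, no machines exist:", err)
	}
	return nil // cleanup would follow here
}

func main() {
	d := brokenDelegate{}
	fmt.Println("delete with 0 machines:", deleteWorker(d, 0))
	fmt.Println("delete with 2 machines:", deleteWorker(d, 2))
}
```

The error is still returned when machines exist, because skipping the redeploy in that case could leave machine-controller-manager with stale credentials.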
How to reproduce it (as minimally and precisely as possible):
Since this depends on errors in a closed-source extension, there are no easy steps to reproduce. Feel free to reach out to me for details.
Anything else we need to know?:
I already started working on a fix for the error propagation to the shoot. We can focus on the deadlock issue here.
Environment:
- Gardener version: 1.118.3
- Kubernetes version (use kubectl version): 1.32.5
- Cloud provider or hardware configuration:
- Others: