Description
How to categorize this issue?
/area robustness
/kind bug
What happened:
`gardener-node-agent` updates both its own binary and its configuration. Usually the config changes are applied before the binary is updated, because of the order in which they appear in the OperatingSystemConfig.
If a `gardener-node-agent` release contains breaking changes (like adding a feature gate), it can end up in a crash loop in the following case:
- GNA saves its new configuration to disk. This configuration includes the activation of a new feature gate (`NodeAgentAuthorizer` in the concrete case).
- Pulling the new GNA version fails, so the previous version is still on the disk.
- GNA is restarted.
In this case, the GNA configuration already includes the feature gate parameter, but the old GNA binary does not recognize it and refuses to start. Manual intervention is required to solve this problem.
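The failure mode can be illustrated with a minimal, self-contained sketch. It is not the actual GNA code: the `knownGates` type and its `validate` method are hypothetical stand-ins for strict feature gate validation, where a binary rejects any gate name it was not compiled with.

```go
package main

import (
	"fmt"
	"sort"
)

// knownGates is a hypothetical stand-in for the set of feature gates
// compiled into one binary version (gate name -> default value).
type knownGates map[string]bool

// validate returns an error if the configuration enables or disables a
// gate that this binary version does not know about.
func (k knownGates) validate(configured map[string]bool) error {
	var unknown []string
	for name := range configured {
		if _, ok := k[name]; !ok {
			unknown = append(unknown, name)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return fmt.Errorf("unrecognized feature gates: %v", unknown)
	}
	return nil
}

func main() {
	// The old binary predates the gate, the new binary ships it.
	oldBinary := knownGates{}
	newBinary := knownGates{"NodeAgentAuthorizer": false}

	// The configuration written to disk already activates the gate.
	newConfig := map[string]bool{"NodeAgentAuthorizer": true}

	fmt.Println("old binary:", oldBinary.validate(newConfig))
	fmt.Println("new binary:", newBinary.validate(newConfig))
}
```

With the new configuration on disk and the old binary still in place, every restart hits the validation error again, which is what turns a failed image pull into a crash loop.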
What you expected to happen:
`gardener-node-agent` should be resilient in this update case.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The issue could be solved by adding a version suffix to the GNA config files and letting GNA load only the config that matches its own version.
Environment:
- Gardener version: v1.109.0
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- Others: