
gardener-node-agent might end in a crash-loop in case of breaking changes affecting its own configuration #11025

@oliver-goetz

Description


How to categorize this issue?

/area robustness
/kind bug

What happened:
gardener-node-agent updates both its own binary and its configuration. Usually, the configuration changes are applied before the binary is updated, because of the order in which they appear in the OperatingSystemConfig.

If gardener-node-agent (GNA) introduces a breaking change that affects its own configuration (such as a new feature gate), it can end up in a crash loop in the following case:

  1. GNA saves its new configuration to disk. This configuration includes the activation of a new feature gate (NodeAgentAuthorizer in the concrete case).
  2. Pulling the new GNA version fails, so the previous binary is still on disk.
  3. GNA is restarted.

In this case, the GNA configuration already enables the new feature gate, but the old GNA binary does not recognize it and refuses to start. Manual intervention is required to resolve the problem.

What you expected to happen:
gardener-node-agent should be resilient in this update case.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
The issue could be solved by adding a version suffix to the GNA config files and letting GNA load only the config file that matches its own version.

Environment:

  • Gardener version: v1.109.0
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

Metadata


Labels

  • area/robustness (Robustness, reliability, resilience related)
  • kind/bug (Bug)
  • triage/accepted (Indicates an issue or PR is ready to be actively worked on.)
