Provide Operational Emergency Switch to Prevent Propagation of Faulty Updates #12528

@ScheererJ

How to categorize this issue?

/area ops-productivity
/area robustness
/kind enhancement

What would you like to be added:

I would like Gardener to provide an easy way to stop an update from propagating to shoot clusters. For example, setting an annotation on the Garden resource could prevent further shoot reconciliations from starting.

Why is this needed:

Software has bugs. If a faulty update of a component slips through all quality assurance measures, it should be possible to stop the issue from propagating further. In the context of Gardener, this means that no further shoot cluster reconciliations should start. While this can be achieved by scaling down gardenlet on all seed clusters, doing so can require high operational effort in larger landscapes. Furthermore, this approach may have side effects, e.g. gardener-controller-manager marking shoot clusters as Unknown due to missing updates from gardenlet.

Therefore, it might be helpful for landscape operators to be able to disable further shoot reconciliations easily by setting an annotation or a field in the Garden resource. gardenlet could check this flag before trying to reconcile a shoot cluster.
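
To illustrate the idea, here is a minimal sketch of such a check in Go. It is not Gardener code: the `Garden` struct is a pared-down stand-in for the real resource, and the annotation key `example.gardener.cloud/pause-shoot-reconciliations`, as well as the helpers `reconciliationsPaused` and `reconcileShoot`, are hypothetical names chosen only for this example.

```go
package main

import "fmt"

// pauseAnnotation is a hypothetical annotation key used only for this sketch;
// the actual key (or field) would have to be decided as part of this proposal.
const pauseAnnotation = "example.gardener.cloud/pause-shoot-reconciliations"

// Garden is a simplified stand-in for the Garden resource (assumption:
// only the annotations matter for this check).
type Garden struct {
	Annotations map[string]string
}

// reconciliationsPaused reports whether the emergency switch is set.
// A nil Garden or a missing annotation means "not paused".
func reconciliationsPaused(g *Garden) bool {
	if g == nil {
		return false
	}
	return g.Annotations[pauseAnnotation] == "true"
}

// reconcileShoot checks the switch before starting a reconciliation and
// returns whether the reconciliation was started.
func reconcileShoot(g *Garden, shootName string) bool {
	if reconciliationsPaused(g) {
		fmt.Printf("skipping reconciliation of shoot %q: emergency switch is set\n", shootName)
		return false
	}
	// The real reconciliation flow would start here.
	fmt.Printf("reconciling shoot %q\n", shootName)
	return true
}

func main() {
	garden := &Garden{Annotations: map[string]string{}}
	reconcileShoot(garden, "shoot-a") // switch not set: reconciliation starts

	// An operator flips the switch once a faulty update is detected.
	garden.Annotations[pauseAnnotation] = "true"
	reconcileShoot(garden, "shoot-b") // switch set: reconciliation is skipped
}
```

The check only prevents new reconciliations from starting; how reconciliations that are already in progress should be handled is left open in this sketch.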

/cc @rfranzke @ashwani2k @vlerenc @dguendisch @hendrikKahl
