clustermesh: switch to surge upgrade strategy. #32999

marseel · 2024-06-10T08:15:31Z

With introduction of Clustermesh support for HA deployment in #31677 let's change upgrade strategy to make sure that Clustermesh control plane is always available.
This is also configuration that we test against in CI tests - maxSurge=1 and maxUnavailable=0. On top of that change required to preferred antiAffinity to cover case with a single node cluster.

Change default Clustermesh control plane upgrade strategy to use surge strategy

marseel · 2024-06-10T08:15:38Z

/test

marseel · 2024-06-10T08:37:45Z

/test

giorio94

Thanks! The preferred anti-affinity definition needs to be updated to match the expected format.

install/kubernetes/cilium/values.yaml.tmpl

With introduction of Clustermesh support for HA deployment in cilium#31677 let's change upgrade strategy to make sure that Clustermesh control plane is always available. This is also configuration that we test against in CI tests - maxSurge=1 and maxUnavailable=0. On top of that change required to preferred antiAffinity to cover case with a single node cluster. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>

marseel · 2024-06-10T09:21:39Z

/test

squeed · 2024-06-11T07:04:29Z

Does it make sense to add a PodDisruptionBudget as well?

marseel · 2024-06-11T09:49:29Z

Does it make sense to add a PodDisruptionBudget as well?

yes, makes sense. I would probably do it together with switching the default number of replicas from 1 to 2 and setting maxUnavailable=1 on PodDisruptionBudget. There are also some nouances with evicting unhealthy pods:

It is recommended to set AlwaysAllow Unhealthy Pod Eviction Policy to your PodDisruptionBudgets to support eviction of misbehaving applications during a node drain. The default behavior is to wait for the application pods to become healthy before the drain can proceed.

but Unhealthy Pod Eviction Policy is beta in v1.27. So I would prefer to do it after considering all these edge cases and maybe waiting for Unhealthy Pod Eviction Policy to become stable.

marseel requested a review from giorio94 June 10, 2024 08:15

marseel requested review from a team as code owners June 10, 2024 08:15

marseel requested review from nathanjsweet and squeed June 10, 2024 08:15

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 10, 2024

marseel force-pushed the set_maxunavailable_clustermesh branch from 3bc9e67 to 7ac6d79 Compare June 10, 2024 08:30

marseel added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jun 10, 2024

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 10, 2024

giorio94 approved these changes Jun 10, 2024

View reviewed changes

install/kubernetes/cilium/values.yaml.tmpl Show resolved Hide resolved

giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/helm Impacts helm charts and user deployment experience labels Jun 10, 2024

marseel force-pushed the set_maxunavailable_clustermesh branch from 7ac6d79 to c9bad71 Compare June 10, 2024 09:14

squeed approved these changes Jun 11, 2024

View reviewed changes

aanm approved these changes Jun 13, 2024

View reviewed changes

aanm added this pull request to the merge queue Jun 13, 2024

Merged via the queue into cilium:main with commit dc52072 Jun 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

clustermesh: switch to surge upgrade strategy. #32999

clustermesh: switch to surge upgrade strategy. #32999

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

giorio94 left a comment

Uh oh!

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

squeed commented Jun 11, 2024

Uh oh!

marseel commented Jun 11, 2024

Uh oh!

Uh oh!

clustermesh: switch to surge upgrade strategy. #32999

clustermesh: switch to surge upgrade strategy. #32999

Uh oh!

Conversation

marseel commented Jun 10, 2024

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

giorio94 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marseel commented Jun 10, 2024

Uh oh!

squeed commented Jun 11, 2024

Uh oh!

marseel commented Jun 11, 2024

Uh oh!

Uh oh!