Add support for multiple clustermesh-apiserver replicas (ClusterMesh HA) #31677
Looks great to me! Just a bunch of minor comments and nits inline.
Yep, I typically just add them above the entire command, to avoid these kinds of problems.
Thanks!
Marking for backport to v1.15 to address #30964. I'm going to backport a reduced version which only includes the configuration of the unique etcd Cluster ID and the interceptor logic, fixing a bug potentially causing Cilium agents to incorrectly restart an etcd watch against a different clustermesh-apiserver instance.
With the introduction of Clustermesh support for HA deployments in cilium#31677, let's change the upgrade strategy to make sure that the Clustermesh control plane is always available. This is also the configuration that we test against in CI: maxSurge=1 and maxUnavailable=0. On top of that, change the antiAffinity from required to preferred to cover the case of a single-node cluster. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
This adds support for running clustermesh-apiserver deployments with multiple replicas for high availability.
Each clustermesh-apiserver pod runs its own etcd cluster. Depending on configuration, either the Cilium Agent or KVStoreMesh instance watches etcd in a remote cluster. All responses from the remote etcd cluster are intercepted and the header is inspected to retrieve the etcd cluster ID. If a failover event occurs and the cluster ID has changed, the remote connection is restarted to ensure that no events are missed and that no invalid data is retained. See individual commit messages for additional details.