Cilium Feature Proposal
Describe the feature you'd like
This feature proposal is about supporting the Multi-Cluster Services API (MCS), i.e. KEP-1645 (https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api). Supporting it would also address the DNS issues with the current implementation of Cilium global services (e.g. #7070), might help with #19701 in some way, and would give Cilium a standard way of doing multi-cluster services.
(Optional) Describe your proposed solution
There are a number of ways this could be implemented in Cilium. My plan would be to make the minimum change required that works in all the different ways Cilium can be configured (especially with and without kube-proxy), building heavily on the existing construct of Cilium global/shared services. If this standard later becomes irrelevant and Cilium decides to drop it, this approach would also mean less time lost in the process.
The KEP is mainly based on two new CRDs: `ServiceImport` and `ServiceExport`. A `ServiceImport` creates a global service with a local service IP that load-balances to all services in remote clusters with a matching `ServiceExport`. So that implementing `ServiceImport` requires no changes in kube-proxy, upstream decided that a `ServiceImport` would create an underlying Kubernetes Service named `derived-$hash` (see https://github.com/kubernetes-sigs/mcs-api/blob/master/pkg/controllers/common.go#L42 and https://github.com/kubernetes-sigs/mcs-api/blob/master/pkg/controllers/serviceimport.go#L97). Giving the Service a dummy name prevents remote cluster IPs from being returned in the `cluster.local` DNS domain; instead, those IPs are served via the `clusterset.local` domain specified here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/1645-multi-cluster-services-api/specification.md (with a standard implementation that queries `ServiceImports` and the matching `EndpointSlices` here: https://github.com/coredns/multicluster).
So if Cilium were to implement this the way mcs-api suggests (although the KEP does not mandate it), we would need roughly the same controllers as mcs-api (https://github.com/kubernetes-sigs/mcs-api/tree/master/pkg/controllers), creating the dummy `derived-$hash` Kubernetes Services when a `ServiceImport` is created. We would then treat a Service that has a `ServiceImport` or `ServiceExport` as an owner reference as equivalent to a Cilium global service, and treat it as `shared` only if it has a `ServiceExport` as an owner reference. The `ServiceExport` would also create the derived Services with the same selectors as the matching Service. That way we can implement MCS with almost no additional constructs on top of the current Cilium global/shared services.
The second part, which is probably slightly trickier, is that to make use of the CoreDNS multicluster plugin we would also have to sync the `EndpointSlices` from the remote clusters back to Kubernetes (one per cluster, as per the spec). This would probably help outside the scope of MCS support too, as it has also been requested for headless services (#7070). I am not sure to what extent syncing `EndpointSlice` objects back to the Kubernetes API would cause Cilium to mix up local and remote endpoints, but my rough guess is that ignoring endpoints with a certain annotation and/or label (such as the one marking the source cluster of the `EndpointSlice`) and continuing to rely on Cilium's IPCache should do the trick. I am not sure whether we would actually need to do this or how difficult it would be; this needs more investigation.
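A minimal Go sketch of the filtering idea, assuming synced remote `EndpointSlices` carry a source-cluster label. The key `multicluster.kubernetes.io/source-cluster` is an assumption based on the label discussed around KEP-1645 and should be checked against the spec; `EndpointSlice` here is a simplified stand-in and `localSlices` is a hypothetical helper.

```go
package main

import "fmt"

// sourceClusterLabel is assumed to mark the cluster an EndpointSlice was
// synced from; verify the exact key against the KEP-1645 specification.
const sourceClusterLabel = "multicluster.kubernetes.io/source-cluster"

// EndpointSlice is a simplified stand-in for discoveryv1.EndpointSlice.
type EndpointSlice struct {
	Name   string
	Labels map[string]string
}

// localSlices (hypothetical) keeps only slices that belong to the local
// cluster. Slices labeled with another source cluster exist for consumers
// such as the CoreDNS multicluster plugin; Cilium would skip them and keep
// taking remote endpoints from its own clustermesh state.
func localSlices(slices []EndpointSlice, localCluster string) []EndpointSlice {
	var out []EndpointSlice
	for _, s := range slices {
		if src, ok := s.Labels[sourceClusterLabel]; ok && src != localCluster {
			continue // synced from a remote cluster
		}
		out = append(out, s)
	}
	return out
}

func main() {
	slices := []EndpointSlice{
		{Name: "web-abc"},
		{Name: "web-c2", Labels: map[string]string{sourceClusterLabel: "cluster-2"}},
	}
	fmt.Println(len(localSlices(slices, "cluster-1"))) // 1
}
```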
Beyond these two parts there are a few follow-ups: reporting the sync status back in the `ServiceImport`/`ServiceExport` status, automatically creating a `ServiceImport` on other clusters when a `ServiceExport` is created, and probably adding `ServiceImport` support to Cilium's Gateway API implementation (https://gateway-api.sigs.k8s.io/geps/gep-1748/), which could be done simply by replacing the `ServiceImport` name with the derived Service before it reaches Envoy. These are not strictly necessary for the basic feature set, though.
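The Gateway API follow-up amounts to rewriting a backend reference before it is translated into Envoy configuration. A minimal Go sketch, assuming a lookup table from each `ServiceImport` to its derived Service name; `BackendRef` is a simplified stand-in for the Gateway API type and `rewriteBackendRef` is hypothetical.

```go
package main

import "fmt"

// BackendRef is a simplified stand-in for a Gateway API backendRef.
type BackendRef struct {
	Group     string // "" means the core API group
	Kind      string
	Namespace string
	Name      string
	Port      int32
}

const mcsGroup = "multicluster.x-k8s.io"

// rewriteBackendRef (hypothetical) swaps a ServiceImport reference for its
// derived Service so the rest of the pipeline only ever sees Services.
// derived maps "namespace/name" of a ServiceImport to the derived Service name.
func rewriteBackendRef(ref BackendRef, derived map[string]string) (BackendRef, error) {
	if ref.Group != mcsGroup || ref.Kind != "ServiceImport" {
		return ref, nil // not a ServiceImport: leave untouched
	}
	svc, ok := derived[ref.Namespace+"/"+ref.Name]
	if !ok {
		return ref, fmt.Errorf("no derived Service for ServiceImport %s/%s", ref.Namespace, ref.Name)
	}
	ref.Group, ref.Kind, ref.Name = "", "Service", svc
	return ref, nil
}

func main() {
	ref := BackendRef{Group: mcsGroup, Kind: "ServiceImport", Namespace: "default", Name: "web", Port: 80}
	out, err := rewriteBackendRef(ref, map[string]string{"default/web": "derived-abc123"})
	fmt.Println(out.Kind, out.Name, err) // Service derived-abc123 <nil>
}
```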
Note that I am willing to help and/or implement this.
Alternatives
The KEP does not mandate a specific implementation, so there are a few things Cilium could choose not to do. For instance, there is no obligation to create the derived Kubernetes Service: Cilium's kube-proxy replacement could treat a `ServiceImport` almost exactly like a `Service` without creating a derived `Service`. However, doing only this would lose compatibility with clusters relying on kube-proxy, and it would most likely require more involved changes to the Cilium code.
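To illustrate this alternative, a kube-proxy replacement that understands `ServiceImport` directly would expand its spec into load-balancer frontends much as it does for a `Service`. The Go sketch below uses simplified stand-ins for the mcs-api `ServiceImportSpec` (its `ClusterSetIP`/`Headless` types, IPs and ports) and a hypothetical `Frontend` type; it is not Cilium's actual load-balancer model.

```go
package main

import "fmt"

// Simplified stand-ins for the mcs-api ServiceImport spec.
type ServiceImportType string

const (
	ClusterSetIP ServiceImportType = "ClusterSetIP"
	Headless     ServiceImportType = "Headless"
)

type ServicePort struct {
	Name     string
	Protocol string
	Port     int32
}

type ServiceImportSpec struct {
	Type  ServiceImportType
	IPs   []string
	Ports []ServicePort
}

// Frontend is a hypothetical load-balancer frontend (VIP + port + protocol).
type Frontend struct {
	IP       string
	Port     int32
	Protocol string
}

// frontendsFor expands a ServiceImport into one frontend per IP/port pair.
// Headless imports have no cluster-set IP, so they get no frontends.
func frontendsFor(spec ServiceImportSpec) []Frontend {
	if spec.Type == Headless {
		return nil
	}
	var fes []Frontend
	for _, ip := range spec.IPs {
		for _, p := range spec.Ports {
			fes = append(fes, Frontend{IP: ip, Port: p.Port, Protocol: p.Protocol})
		}
	}
	return fes
}

func main() {
	spec := ServiceImportSpec{
		Type:  ClusterSetIP,
		IPs:   []string{"10.96.100.1"},
		Ports: []ServicePort{{Name: "http", Protocol: "TCP", Port: 80}},
	}
	fmt.Println(frontendsFor(spec)) // [{10.96.100.1 80 TCP}]
}
```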
`EndpointSlice` creation might also become optional in the KEP (kubernetes/enhancements#3950). However, not creating `EndpointSlices` would make the reference implementation of the multicluster CoreDNS plugin useless and might cause problems for other Gateway API implementations that want to support `ServiceImport`. Technically, the DNS issue could be solved by writing our own CoreDNS plugin that connects to Cilium's clustermesh-apiserver or something similar, but that might be slightly trickier.
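For a sense of what such a custom DNS plugin would have to handle, here is a Go sketch that parses the `<service>.<namespace>.svc.clusterset.local` names defined by the MCS DNS specification. It covers only that form (not per-pod names of headless services), and `parseClusterSetName` is a hypothetical helper rather than part of CoreDNS.

```go
package main

import (
	"fmt"
	"strings"
)

const clusterSetSuffix = ".svc.clusterset.local"

// parseClusterSetName (hypothetical) extracts the service and namespace from a
// query name of the form <service>.<namespace>.svc.clusterset.local, with or
// without the trailing root dot. Other forms are rejected.
func parseClusterSetName(qname string) (service, namespace string, ok bool) {
	name := strings.TrimSuffix(strings.ToLower(qname), ".")
	if !strings.HasSuffix(name, clusterSetSuffix) {
		return "", "", false
	}
	labels := strings.Split(strings.TrimSuffix(name, clusterSetSuffix), ".")
	if len(labels) != 2 || labels[0] == "" || labels[1] == "" {
		return "", "", false
	}
	return labels[0], labels[1], true
}

func main() {
	svc, ns, ok := parseClusterSetName("my-svc.default.svc.clusterset.local.")
	fmt.Println(svc, ns, ok) // my-svc default true
}
```

A real plugin would then look up the matching `ServiceImport` (or equivalent clustermesh state) to build the answer.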