CFP: Supporting Multi-Cluster Services API #27902

@MrFreezeex

Cilium Feature Proposal

Describe the feature you'd like

This proposal is about supporting the Multi-Cluster Services API (MCS), also known as KEP-1645 (https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api). Supporting it would also answer the DNS issues with the current implementation of Cilium Global Services (e.g. #7070), might address #19701 to some extent, and would adopt a standard way of doing multi-cluster services.

(Optional) Describe your proposed solution

There are a number of ways this could be implemented in Cilium. My plan would be to make the minimum changes required, in a way that works across all the ways Cilium can be configured (especially with or without kube-proxy), and to build heavily on the existing Cilium global/shared services constructs. If this standard becomes irrelevant later on and Cilium decides to drop it, that also means less time lost in the process.

The KEP is mainly based on two new CRDs, ServiceImport and ServiceExport. A ServiceImport creates a global service with a local service IP that balances to all the services in remote clusters that have a matching ServiceExport. To require zero changes in kube-proxy, upstream decided that a ServiceImport would create an underlying Kubernetes Service named derived-$hash (see https://github.com/kubernetes-sigs/mcs-api/blob/master/pkg/controllers/common.go#L42 and https://github.com/kubernetes-sigs/mcs-api/blob/master/pkg/controllers/serviceimport.go#L97). Giving the Service a dummy name prevents remote cluster IPs from being answered in the cluster.local DNS domain. Those IPs are instead answered via the clusterset.local domain specified here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/1645-multi-cluster-services-api/specification.md (a standard implementation that queries ServiceImports and the matching EndpointSlices is available here: https://github.com/coredns/multicluster).
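To make the derived-$hash naming concrete, here is a minimal Go sketch of how such a name could be computed. It is modelled on my reading of the mcs-api helper linked above, but the exact hash input, encoding and truncation length are assumptions on my side and should be checked against upstream before relying on them.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// derivedName returns a deterministic placeholder Service name for the
// ServiceImport identified by namespace/name.
// Assumption: SHA-256 over "namespace/name", base32-hex encoded, lowercased
// and truncated to 10 characters. Upstream may differ in the details.
func derivedName(namespace, name string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	enc := base32.HexEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return "derived-" + strings.ToLower(enc)[:10]
}

func main() {
	// The same ServiceImport always maps to the same derived Service name.
	fmt.Println(derivedName("default", "my-svc"))
}
```

Because the name is a pure function of the namespace and name, a controller can recompute it on every reconcile instead of storing it, and the result is always a valid DNS label.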

If Cilium were to implement this the way mcs-api suggests (it is not enforced by the KEP though), we would need roughly the same controllers as mcs-api (https://github.com/kubernetes-sigs/mcs-api/tree/master/pkg/controllers), creating those dummy derived-$hash Kubernetes Services when a ServiceImport is created. We would then treat a Service that has a ServiceImport or ServiceExport as owner reference as equivalent to a Cilium global service. It would be shared only if it has a ServiceExport as owner reference. The ServiceExport would also create the derived Service, with the same selectors as the matching Service. That way we can implement MCS with almost no additional constructs on top of the current Cilium global/shared services.
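The owner reference rule described above could look roughly like the sketch below. The `OwnerRef` type and the `classify` function are hypothetical names for illustration only (real code would use `metav1.OwnerReference`), and I am assuming the MCS CRDs live in the `multicluster.x-k8s.io` API group.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerRef is a hypothetical, trimmed-down stand-in for metav1.OwnerReference.
type OwnerRef struct {
	APIVersion string
	Kind       string
}

// mcsGroupPrefix is an assumption: the API group of the MCS CRDs.
const mcsGroupPrefix = "multicluster.x-k8s.io/"

// classify reports whether a Service with the given owners should be treated
// as a Cilium global service, and whether it should also be shared with
// remote clusters. Only a ServiceExport owner makes the Service shared.
func classify(owners []OwnerRef) (global, shared bool) {
	for _, o := range owners {
		if !strings.HasPrefix(o.APIVersion, mcsGroupPrefix) {
			continue
		}
		switch o.Kind {
		case "ServiceImport":
			global = true
		case "ServiceExport":
			global, shared = true, true
		}
	}
	return global, shared
}

func main() {
	owners := []OwnerRef{{APIVersion: mcsGroupPrefix + "v1alpha1", Kind: "ServiceExport"}}
	global, shared := classify(owners)
	fmt.Println(global, shared)
}
```

The idea is that this check would sit next to wherever Cilium currently decides whether a Service is global or shared, so the rest of the clustermesh code would not need to know about MCS at all.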

The second thing, which is probably slightly trickier, is that in order to make use of the CoreDNS multicluster plugin we would also have to sync the EndpointSlices from the remote clusters (one per cluster, as per the spec) back to Kubernetes. This would probably help outside the scope of MCS support as well, since it has also been requested for headless services (#7070). I am not sure to what extent syncing EndpointSlices back to the Kubernetes API would make Cilium mix up local and remote endpoints. My wild guess is that ignoring endpoints carrying a certain annotation and/or label (like the one marking the source cluster of the EndpointSlice) and continuing to rely on Cilium's IPCache should do the trick. I am not sure whether we would actually need to do this or how difficult it would be, so this needs a bit more investigation.
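As a rough illustration of the "ignore synced EndpointSlices" idea, here is a small Go sketch. The `EndpointSlice` type and the `localSlices` helper are hypothetical, and I am assuming the source cluster is marked with the `multicluster.kubernetes.io/source-cluster` label from the MCS API. Whether this filter is sufficient on its own is exactly the part that needs investigation.

```go
package main

import "fmt"

// labelSourceCluster is assumed to be the MCS label that marks the cluster
// an EndpointSlice was synced from.
const labelSourceCluster = "multicluster.kubernetes.io/source-cluster"

// EndpointSlice is a hypothetical, trimmed-down stand-in for
// discoveryv1.EndpointSlice.
type EndpointSlice struct {
	Name   string
	Labels map[string]string
}

// localSlices drops every EndpointSlice that carries the source-cluster
// label, so that only slices produced by the local cluster are considered
// by the regular Kubernetes watcher.
func localSlices(all []EndpointSlice) []EndpointSlice {
	var out []EndpointSlice
	for _, s := range all {
		if _, synced := s.Labels[labelSourceCluster]; synced {
			continue // synced back for DNS only; backends come from clustermesh
		}
		out = append(out, s)
	}
	return out
}

func main() {
	slices := []EndpointSlice{
		{Name: "my-svc-abcde"},
		{Name: "derived-remote", Labels: map[string]string{labelSourceCluster: "cluster-b"}},
	}
	for _, s := range localSlices(slices) {
		fmt.Println(s.Name)
	}
}
```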

Outside of these two things there are a few follow-ups, like reporting the status of the sync back in ServiceImport/ServiceExport, automatically creating a ServiceImport on other clusters when a ServiceExport is created, and probably adding support for ServiceImport in Cilium's Gateway API implementation (https://gateway-api.sigs.k8s.io/geps/gep-1748/), which could be implemented simply by swapping the name of the ServiceImport for that of the derived Service before it reaches Envoy. Those are not strictly necessary for the basic features though.
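The Gateway API follow-up could be sketched as below. `BackendRef` and `resolveBackend` are hypothetical names, the lookup map stands in for whatever index Cilium would keep from ServiceImport names to derived Service names, and the group/kind values are my assumption based on GEP-1748.

```go
package main

import "fmt"

// BackendRef is a hypothetical, trimmed-down stand-in for a Gateway API
// backend reference.
type BackendRef struct {
	Group string
	Kind  string
	Name  string
}

// resolveBackend rewrites a ServiceImport backend to its derived Service.
// Any other backend is returned unchanged. The boolean is false when the
// ServiceImport has no known derived Service yet.
func resolveBackend(ref BackendRef, derived map[string]string) (BackendRef, bool) {
	if ref.Group != "multicluster.x-k8s.io" || ref.Kind != "ServiceImport" {
		return ref, true
	}
	name, ok := derived[ref.Name]
	if !ok {
		return ref, false
	}
	// The derived Service is a plain core-group Service.
	return BackendRef{Group: "", Kind: "Service", Name: name}, true
}

func main() {
	derived := map[string]string{"my-svc": "derived-0123456789"}
	ref := BackendRef{Group: "multicluster.x-k8s.io", Kind: "ServiceImport", Name: "my-svc"}
	resolved, ok := resolveBackend(ref, derived)
	fmt.Println(resolved.Kind, resolved.Name, ok)
}
```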

Note that I am willing to help and/or implement this.

Alternatives

The KEP does not enforce the implementation, so there are a few things that Cilium could skip. For instance, there is no obligation to create the derived Kubernetes Service. Cilium's kube-proxy replacement could treat a ServiceImport almost exactly like a Service without creating a derived Service. However, doing only this means that we lose compatibility with clusters relying on kube-proxy, and it would most likely be a more involved process in terms of Cilium code changes.

Also, the EndpointSlice creation might become optional in the KEP (kubernetes/enhancements#3950). However, not creating EndpointSlices would make the reference implementation of the multicluster CoreDNS plugin useless and might cause problems in other Gateway API implementations that want to support ServiceImport. Technically, the DNS issue could be solved by creating our own CoreDNS plugin that would connect to the Cilium clustermesh-apiserver or something similar, but that might be slightly trickier.

Labels: area/agent, area/clustermesh, area/k8s, kind/cfp, kind/feature, stale