clustermesh511 upgrade testing #27674

thorn3r · 2023-08-23T20:46:39Z

this PR is for testing the new CI workflow being introduced in #27232 against the #27520. This branch is the feature branch rebased on this branch which fixes a failure for the tests that exists in main.

This should not be merged and will be closed after testing.

Propagate extra options from the IPIdentityWatcher to the underlying WatchStore, to allow configuring additional callback handlers executed after the first synchronization, or register a metric tracking the number of elements. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Add the possibility to specify an additional callback handler executed when the initial list of identities has been retrieved from a given remote kvstore. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Generalize the logic currently used to wait until the initial list of shared services has been received from all remote clusters, and synchronized with the BPF datapath, to prepare for the subsequent introduction of similar methods for other resources. While being at it, let's also rename the function to clarify its scope (as it only waits for the synchronization of services), and fix the possibility of a deadlock in case a remote cluster is disconnected before that the synchronization has completed. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Add a new IPIdentitiesSynced method to the clustermesh object to allow waiting until the initial list of ipcache entries and identities has been received from all known remote clusters. This mimics the ServicesSynced() method, and enables to postpone disruptive operations to prevent dropping valid long-running connections on agent restart. Differently from the services case, we don't need to separately track the propagation of the single entries to the datapath, as performed synchronously. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Add a new NodesSynced method to the clustermesh object to allow waiting until the initial list of nodes has been received from all known remote clusters. This mimics the ServicesSynced() method, and enables to postpone disruptive operations to prevent dropping valid long-running connections on agent restart. Differently from the services case, we don't need to separately track the propagation of the single entries to the datapath, as performed synchronously. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

A new ipcache map is recreated from scratch and populated by each cilium agent upon restart, and the existing endpoints are switched to use it during the regeneration process of the endpoint itself. Before performing this operation, the new ipcache map needs to have already been fully synchronized, otherwise we might drop long running connections. Currently we wait for synchronization from k8s (when in CRD mode) and the kvstore (when in kvstore mode). Yet, when clustermesh is enabled, we don't wait for synchronization from all the remote clusters, possibly disrupting existing cross-cluster connections. The same also applies to identities in case network policies targeting the given endpoint are present. This commit introduces additional logic to wait for ipcache and identities synchronization from remote clusters before triggering endpoint regeneration. The wait period is bound by a user-configurable timeout, defaulting to one minute. The idea being that we don't block the local endpoints regeneration for an extended period of time if a given remote cluster is not ready. This also prevents possible inter dependency problems, with the regeneration in one cluster being required to allow other clusters to successfully connect to it. Nonetheless, the timeout can be configured through a dedicated --clustermesh-ip-identities-sync-timeout flag, to select the desired trade-off between possible endpoints regeneration delay and likelihood of connection drops if a given remote clusters is not ready (e.g., because the corresponding clustermesh-apiserver is restarting). A special case is represented by the host endpoint if IPSec is enabled, because it is currently restored synchronously to ensure ordering. Given that blocking this operation prevents the agent from becoming ready, and in turn new pods from being scheduled, we forcefully cap the wait timeout to 5 seconds. Empirically, this value has proved to be sufficient to retrieve a high number of entries if the remote kvstore is ready (e.g., the clustermesh-apiserver pod is running, and already synchronized with Kubernetes). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Currently, the WireGuard subsystem triggers the deletion of obsolete peers when the daemon restoration process has completed. Yet, at this point the information about nodes belonging to remote clusters might not yet have been received, causing the disruption of cross-cluster connections. Hence, let's first wait for the synchronization of all remote nodes before kicking off the removal process. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

This commit is to change to kube_codegen.sh as per the suggestion in the output. ``` $ make generate-k8s-api ... WARNING: generate-internal-groups.sh is deprecated. WARNING: Please use k8s.io/code-generator/kube_codegen.sh instead. .. ``` Relates: kubernetes/code-generator@1305626 Signed-off-by: Tam Mach <tam.mach@cilium.io>

This is to avoid the below error if proxy is disabled as part of installation, but the visibility annotation is added to pods later. ``` 2023-08-11T12:42:41.390575371Z goroutine 1522 [running]: 2023-08-11T12:42:41.390581994Z github.com/cilium/cilium/pkg/proxy.(*Proxy).CreateOrUpdateRedirect(0x0, {0x3ae74b8, 0xc000650280}, {0x3aeb440?, 0xc0016e4560?}, {0xc0010ba1b0, 0x12}, {0x3afd260, 0xc000982000}, 0xc001c9c980) 2023-08-11T12:42:41.390589198Z github.com/cilium/cilium/pkg/proxy/proxy.go:459 +0xb7 2023-08-11T12:42:41.390596081Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).addVisibilityRedirects(0xc000982000, 0x1, 0xc001874ba0?, 0xc001c9c980?) 2023-08-11T12:42:41.390602613Z github.com/cilium/cilium/pkg/endpoint/bpf.go:343 +0x443 2023-08-11T12:42:41.390614345Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).addNewRedirects(0xc000982000, 0xc001874870?) 2023-08-11T12:42:41.390651325Z github.com/cilium/cilium/pkg/endpoint/bpf.go:424 +0x3c5 2023-08-11T12:42:41.390658068Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc000982000, 0xc000e31800, 0x0) 2023-08-11T12:42:41.390663999Z github.com/cilium/cilium/pkg/endpoint/bpf.go:802 +0x4fe 2023-08-11T12:42:41.390669690Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc000982000, 0xc000e31800) 2023-08-11T12:42:41.390675451Z github.com/cilium/cilium/pkg/endpoint/bpf.go:542 +0x1a5 2023-08-11T12:42:41.390681112Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc000982000, 0xc000e31800) 2023-08-11T12:42:41.390686893Z github.com/cilium/cilium/pkg/endpoint/policy.go:467 +0x9c6 2023-08-11T12:42:41.390692564Z github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc000131b50, 0x0?) 2023-08-11T12:42:41.390698385Z github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x325 2023-08-11T12:42:41.390704045Z github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1() 2023-08-11T12:42:41.390709716Z github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x142 ``` Fixes: cilium#27594 Signed-off-by: Tam Mach <tam.mach@cilium.io>

This is to ensure that proxy must be enabled if Envoy L7 Load balancer feature is enabled. Signed-off-by: Tam Mach <tam.mach@cilium.io>

Currently, the envoy accesslog server depends on the xDS server to get information about local endpoints. This leads to the xDS server being initialized and started before the accesslog server. Therefore, the accesslog server might not be ready when Envoy is receiving the xDS resources and starts to initialize Cilium components in envoy. This results in silent errors. This commit introduces the LocalEndPointStore which contains this information and is shared between the two components. The direct dependency between the accesslog server and xDS server can be replaced - and the components start initializing at the same time. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

In addition to introducing the LocalEndpointStore to get rid of the direct dependency between accesslog and xDS server, the dependency from the xDS server to the accesslog server gets introduced. This dependency only serves the purpose of enforcing the accesslog server being initialized first before the xDS server is started up. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

This commit introduces a log message once the Envoy access log server starts to listen for incoming connections. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

Extend the cluster_id field of current LXC map key (struct endpoint_key/EndpointKey) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Extend the cluster_id field of the current tunnel map key (struct tunnel_key/TunnelKey) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Extend the cluster_id field of the current ipcache map key (struct ipcache_key/Key) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Extend the cluster_id field of the current values for lb4_backends and lb6_backends maps (structs lb4_backend/Backend4ValueV3 and lb6_backend/Backend6ValueV4) to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

This commit adds a new flag '--max-connected-clusters' to enable users to configure the bit allocation of cluster ID and identity in a numeric identity to enable support for a clustermesh with more than 255 clusters. Currently the only supported values for this flag are 255 (default) and 511. Increasing this value will increase the maximum number of clusters/maximum cluster ID by reducing the maximum number of allocatable cluster-local identities. This feature changes the way identites are interpreted in the datapath. It should only be used for new clusters, as there is currently no migration path for live clusters. All clusters connected in a clustermesh will require this flag to be set to the same value. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

This commit adds a new flag '--max-connected-clusters' to clustermesh-apiserver. The value will be written to the CiliumClusterConfig and is used by remote-clusters to validate that both clusters are running in the same identity/clusterID allocation mode prior to synchronizing resources. Clusters that are using different values for '--max-connected-clusters' should never be permitted to connect to each other, as this determines how identity is parsed by datapath BPF programs. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

This commit adds a new flag '--max-connected-clusters' to kvstoremesh-operator. The value will be written to the CiliumClusterConfig and is used by remote-clusters to validate that both clusters are running in the same identity/clusterID allocation mode prior to synchronizing resources. Clusters that are using different values for '--max-connected-clusters' should never be permitted to connect to each other, as this determines how identity is parsed by datapath BPF programs. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

With the addition of MaxConnectedClusters to CiliumClusterConfig, the operator's KVStore watchdog must be aware of and set this value to ensure it is not removed on update after updating the heartbeat. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Extended clustermesh changes the bit allocation in skb mark to repurpose bits from identity and grant them to cluster ID. Users can configure this with the agent flag '--max-connected-clusters'. This enables support for larger clustermesh/greater cluster ID but reduces the maximum allocatable identity. This commit introduces a new cluster_config.h header file that is used to set cluster-wide configuration options in the datapath. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Set maxConnectedClusters in ci-multicluster to test extended clustermesh feature. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

Setting Cluster ID in the upgrade-downgrade test will provide test coverage for the updates made to BPF maps and the datapath to provide support for extended clustermesh. These changes include expanding the Cluster ID field on lxc, endpoint, lb4backend, and lb6backend maps, as well as modifications to skb->mark parsing for identity. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

thorn3r · 2023-08-24T15:34:35Z

passing run: https://github.com/cilium/cilium/actions/runs/5956377171

giorio94 and others added 25 commits August 23, 2023 10:22

identities: add on-sync callbacks to remote kvstore watcher

e3c4deb

Add the possibility to specify an additional callback handler executed when the initial list of identities has been retrieved from a given remote kvstore. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

helm: Add validation rule for Envoy L7 Load Balancer

2208639

This is to ensure that proxy must be enabled if Envoy L7 Load balancer feature is enabled. Signed-off-by: Tam Mach <tam.mach@cilium.io>

envoy: log access log server listening start

70522bb

This commit introduces a log message once the Envoy access log server starts to listen for incoming connections. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

lxc map: extend cluster_id to uint16

77a3225

Extend the cluster_id field of current LXC map key (struct endpoint_key/EndpointKey) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

tunnelmap: extend cluster_id to uint16

9ce0c1c

Extend the cluster_id field of the current tunnel map key (struct tunnel_key/TunnelKey) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

ipcache map: extend cluster_id to uint16

4466118

Extend the cluster_id field of the current ipcache map key (struct ipcache_key/Key) from uint8 to uint16 to support larger Clustermesh. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

helm: update charts to include maxConnectedClusters option

15dbc10

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

workflow/clustermesh: set maxConnectedClusters

c93eb32

Set maxConnectedClusters in ci-multicluster to test extended clustermesh feature. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 23, 2023

thorn3r mentioned this pull request Aug 23, 2023

Extended Clustermesh (Clustermesh511) #27520

Merged

thorn3r closed this Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

clustermesh511 upgrade testing #27674

clustermesh511 upgrade testing #27674

Uh oh!

thorn3r commented Aug 23, 2023

Uh oh!

thorn3r commented Aug 24, 2023

Uh oh!

Uh oh!

clustermesh511 upgrade testing #27674

clustermesh511 upgrade testing #27674

Uh oh!

Conversation

thorn3r commented Aug 23, 2023

Uh oh!

thorn3r commented Aug 24, 2023

Uh oh!

Uh oh!