Skip to content

Conversation

thorn3r
Copy link
Contributor

@thorn3r thorn3r commented Aug 23, 2023

this PR is for testing the new CI workflow being introduced in #27232 against the #27520. This branch is the feature branch rebased on this branch which fixes a failure for the tests that exists in main.

This should not be merged and will be closed after testing.

giorio94 and others added 25 commits August 23, 2023 10:22
Propagate extra options from the IPIdentityWatcher to the underlying
WatchStore, to allow configuring additional callback handlers executed
after the first synchronization, or register a metric tracking the
number of elements.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Add the possibility to specify an additional callback handler executed
when the initial list of identities has been retrieved from a given
remote kvstore.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Generalize the logic currently used to wait until the initial list of
shared services has been received from all remote clusters, and
synchronized with the BPF datapath, to prepare for the subsequent
introduction of similar methods for other resources.

While being at it, let's also rename the function to clarify its scope
(as it only waits for the synchronization of services), and fix the
possibility of a deadlock in case a remote cluster is disconnected
before that the synchronization has completed.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Add a new IPIdentitiesSynced method to the clustermesh object to allow
waiting until the initial list of ipcache entries and identities has
been received from all known remote clusters. This mimics the
ServicesSynced() method, and enables to postpone disruptive operations
to prevent dropping valid long-running connections on agent restart.

Differently from the services case, we don't need to separately
track the propagation of the single entries to the datapath, as
performed synchronously.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Add a new NodesSynced method to the clustermesh object to allow
waiting until the initial list of nodes has been received from
all known remote clusters. This mimics the ServicesSynced()
method, and enables to postpone disruptive operations to prevent
dropping valid long-running connections on agent restart.

Differently from the services case, we don't need to separately
track the propagation of the single entries to the datapath, as
performed synchronously.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
A new ipcache map is recreated from scratch and populated by each cilium
agent upon restart, and the existing endpoints are switched to use it
during the regeneration process of the endpoint itself.

Before performing this operation, the new ipcache map needs to have
already been fully synchronized, otherwise we might drop long running
connections. Currently we wait for synchronization from k8s (when in CRD
mode) and the kvstore (when in kvstore mode). Yet, when clustermesh is
enabled, we don't wait for synchronization from all the remote clusters,
possibly disrupting existing cross-cluster connections. The same also
applies to identities in case network policies targeting the given
endpoint are present.

This commit introduces additional logic to wait for ipcache and
identities synchronization from remote clusters before triggering endpoint
regeneration. The wait period is bound by a user-configurable timeout,
defaulting to one minute. The idea being that we don't block the
local endpoints regeneration for an extended period of time if a
given remote cluster is not ready. This also prevents possible inter
dependency problems, with the regeneration in one cluster being
required to allow other clusters to successfully connect to it.

Nonetheless, the timeout can be configured through a dedicated
--clustermesh-ip-identities-sync-timeout flag, to select the desired
trade-off between possible endpoints regeneration delay and likelihood
of connection drops if a given remote clusters is not ready (e.g.,
because the corresponding clustermesh-apiserver is restarting).

A special case is represented by the host endpoint if IPSec is enabled,
because it is currently restored synchronously to ensure ordering.
Given that blocking this operation prevents the agent from becoming
ready, and in turn new pods from being scheduled, we forcefully
cap the wait timeout to 5 seconds. Empirically, this value has
proved to be sufficient to retrieve a high number of entries
if the remote kvstore is ready (e.g., the clustermesh-apiserver
pod is running, and already synchronized with Kubernetes).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the WireGuard subsystem triggers the deletion of obsolete
peers when the daemon restoration process has completed. Yet, at
this point the information about nodes belonging to remote clusters
might not yet have been received, causing the disruption of cross-cluster
connections. Hence, let's first wait for the synchronization of all
remote nodes before kicking off the removal process.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit is to change to kube_codegen.sh as per the suggestion in the
output.

```
$ make generate-k8s-api
...
WARNING: generate-internal-groups.sh is deprecated.
WARNING: Please use k8s.io/code-generator/kube_codegen.sh instead.
..
```

Relates: kubernetes/code-generator@1305626

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This is to avoid the below error if proxy is disabled as part of
installation, but the visibility annotation is added to pods later.

```
2023-08-11T12:42:41.390575371Z goroutine 1522 [running]:
2023-08-11T12:42:41.390581994Z github.com/cilium/cilium/pkg/proxy.(*Proxy).CreateOrUpdateRedirect(0x0, {0x3ae74b8, 0xc000650280}, {0x3aeb440?, 0xc0016e4560?}, {0xc0010ba1b0, 0x12}, {0x3afd260, 0xc000982000}, 0xc001c9c980)
2023-08-11T12:42:41.390589198Z  github.com/cilium/cilium/pkg/proxy/proxy.go:459 +0xb7
2023-08-11T12:42:41.390596081Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).addVisibilityRedirects(0xc000982000, 0x1, 0xc001874ba0?, 0xc001c9c980?)
2023-08-11T12:42:41.390602613Z  github.com/cilium/cilium/pkg/endpoint/bpf.go:343 +0x443
2023-08-11T12:42:41.390614345Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).addNewRedirects(0xc000982000, 0xc001874870?)
2023-08-11T12:42:41.390651325Z  github.com/cilium/cilium/pkg/endpoint/bpf.go:424 +0x3c5
2023-08-11T12:42:41.390658068Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc000982000, 0xc000e31800, 0x0)
2023-08-11T12:42:41.390663999Z  github.com/cilium/cilium/pkg/endpoint/bpf.go:802 +0x4fe
2023-08-11T12:42:41.390669690Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc000982000, 0xc000e31800)
2023-08-11T12:42:41.390675451Z  github.com/cilium/cilium/pkg/endpoint/bpf.go:542 +0x1a5
2023-08-11T12:42:41.390681112Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc000982000, 0xc000e31800)
2023-08-11T12:42:41.390686893Z  github.com/cilium/cilium/pkg/endpoint/policy.go:467 +0x9c6
2023-08-11T12:42:41.390692564Z github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc000131b50, 0x0?)
2023-08-11T12:42:41.390698385Z  github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x325
2023-08-11T12:42:41.390704045Z github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1()
2023-08-11T12:42:41.390709716Z  github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x142
```

Fixes: cilium#27594

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This is to ensure that proxy must be enabled if Envoy L7 Load balancer
feature is enabled.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Currently, the envoy accesslog server depends on the xDS server to get
information about local endpoints. This leads to the xDS server being
initialized and started before the accesslog server. Therefore, the
accesslog server might not be ready when Envoy is receiving the xDS
resources and starts to initialize Cilium components in envoy. This
results in silent errors.

This commit introduces the LocalEndPointStore which contains this
information and is shared between the two components. The direct
dependency between the accesslog server and xDS server can be replaced -
and the components start initializing at the same time.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
In addition to introducing the LocalEndpointStore to get rid of the
direct dependency between accesslog and xDS server, the dependency from
the xDS server to the accesslog server gets introduced.

This dependency only serves the purpose of enforcing the accesslog
server being initialized first before the xDS server is started up.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
This commit introduces a log message once the Envoy access log server
starts to listen for incoming connections.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Extend the cluster_id field of current LXC map key (struct
endpoint_key/EndpointKey) from uint8 to uint16 to support larger
Clustermesh.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Extend the cluster_id field of the current tunnel map key (struct
tunnel_key/TunnelKey) from uint8 to uint16 to support larger
Clustermesh.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Extend the cluster_id field of the current ipcache map key (struct
ipcache_key/Key) from uint8 to uint16 to support larger Clustermesh.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Extend the cluster_id field of the current values for lb4_backends and
lb6_backends maps (structs lb4_backend/Backend4ValueV3 and
lb6_backend/Backend6ValueV4) to support larger Clustermesh.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
This commit adds a new flag '--max-connected-clusters' to enable
users to configure the bit allocation of cluster ID and identity
in a numeric identity to enable support for a clustermesh with
more than 255 clusters.

Currently the only supported values for this flag are 255 (default)
and 511. Increasing this value will increase the maximum number of
clusters/maximum cluster ID by reducing the maximum number of
allocatable cluster-local identities.

This feature changes the way identites are interpreted in the datapath.
It should only be used for new clusters, as there is currently
no migration path for live clusters. All clusters connected in a
clustermesh will require this flag to be set to the same value.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
This commit adds a new flag '--max-connected-clusters' to
clustermesh-apiserver. The value will be written to the
CiliumClusterConfig and is used by remote-clusters to validate that both
clusters are running in the same identity/clusterID allocation mode
prior to synchronizing resources.

Clusters that are using different values for '--max-connected-clusters'
should never be permitted to connect to each other, as this determines
how identity is parsed by datapath BPF programs.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
This commit adds a new flag '--max-connected-clusters' to
kvstoremesh-operator. The value will be written to the
CiliumClusterConfig and is used by remote-clusters to validate that
both clusters are running in the same identity/clusterID allocation
mode prior to synchronizing resources.

Clusters that are using different values for '--max-connected-clusters'
should never be permitted to connect to each other, as this determines
how identity is parsed by datapath BPF programs.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
With the addition of MaxConnectedClusters to CiliumClusterConfig, the
operator's KVStore watchdog must be aware of and set this value to ensure
it is not removed on update after updating the heartbeat.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Extended clustermesh changes the bit allocation in skb mark to repurpose
bits from identity and grant them to cluster ID. Users can configure
this with the agent flag '--max-connected-clusters'.

This enables support for larger clustermesh/greater cluster ID but
reduces the maximum allocatable identity.

This commit introduces a new cluster_config.h header file that is
used to set cluster-wide configuration options in the datapath.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Set maxConnectedClusters in ci-multicluster to test extended clustermesh feature.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Setting Cluster ID in the upgrade-downgrade test will provide test
coverage for the updates made to BPF maps and the datapath to provide
support for extended clustermesh. These changes include expanding the
Cluster ID field on lxc, endpoint, lb4backend, and lb6backend maps, as
well as modifications to skb->mark parsing for identity.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 23, 2023
@thorn3r
Copy link
Contributor Author

thorn3r commented Aug 24, 2023

@thorn3r thorn3r closed this Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/needs-release-note-label The author needs to describe the release impact of these changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants