Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.15.11 and lower than v1.16.0
What happened?
We run Cilium with KVStoreMesh and kvstore identity mode and sync data from remote Kubernetes clusters into etcd. We observed that when syncing data from a cluster with around 1k identities, about 2k keys are synced from the `cilium/state/identities/v1` path in the source cluster. This is because both the identity `id` and `value` keys get synced (`cilium/state/identities/v1/id` and `cilium/state/identities/v1/value`).
The `value` keys track which node uses which identity and appear to be used only for garbage collection, which happens only on the source cluster, since synced clusters rely on TTL-based garbage collection. Hence syncing those `value` keys does not appear to be necessary.
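For reference, the filtering the patch applies essentially boils down to a prefix check on the synced key names. Below is a minimal, self-contained Go sketch of that idea; the constant and function names are illustrative and not the actual kvstoremesh implementation:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	identityPrefix   = "cilium/state/identities/v1/"
	identityIDPrefix = identityPrefix + "id/"
)

// shouldSyncIdentityKey is a hypothetical filter: only keys under the id/
// sub-prefix carry the identity allocations that remote clusters need,
// while the value/ keys only track per-node usage for source-side GC.
func shouldSyncIdentityKey(key string) bool {
	return strings.HasPrefix(key, identityIDPrefix)
}

func main() {
	for _, key := range []string{
		"cilium/state/identities/v1/id/16777216",
		"cilium/state/identities/v1/value/<labels>/<node>", // illustrative value key
	} {
		fmt.Printf("sync=%-5v %s\n", shouldSyncIdentityKey(key), key)
	}
}
```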
We ran some tests with a patched version of the kvstoremesh API that only syncs `id` identity keys and got the following results:
Before applying the patch:
- Source cluster `cluster-a`:
```
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1 2>/dev/null | wc -l
2789
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/value 2>/dev/null | wc -l
1694
```
- Target cluster `cluster-b`:
```
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a 2>/dev/null | wc -l
2788
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/value 2>/dev/null | wc -l
1690
```
After deploying the patch to only sync `id` identity keys:
- Source cluster `cluster-a`:
```
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1 2>/dev/null | wc -l
2787
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/value 2>/dev/null | wc -l
1690
```
- Target cluster `cluster-b`:
```
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/value 2>/dev/null | wc -l
0
```
The following graph shows the `cilium_kvstoremesh_kvstore_sync_queue_size` metric for the `identities/v1` scope with two spikes: the first one, without the patch, peaks at a queue size of about 2500; the second one, after applying the patch, peaks at about 800.
This patch reduces the number of keys synced from a source cluster to a target cluster by about 50%, which adds up to a significant number in large meshes, since every cluster syncs its identities with every other cluster in the mesh. The result is reduced load on etcd and a smaller sync queue.
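For illustration, assuming a hypothetical mesh of 10 clusters with roughly 1k identities each (and thus, as measured above, roughly 2k identity keys each): without the patch, each cluster caches identities from the other 9 clusters, i.e. about 9 × 2k ≈ 18k keys under `cilium/cache/identities/v1/`; with the patch, that drops to about 9 × 1k ≈ 9k keys.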
How can we reproduce the issue?
See section above.
Cilium Version
```
# cilium version
Client: 1.15.10 7aff1a4d87 2024-07-30T13:05:24+00:00 go version go1.22.8 linux/amd64
Daemon: 1.15.10 7aff1a4d87 2024-07-30T13:05:24+00:00 go version go1.22.8 linux/amd64
```
Kernel Version
```
# uname -a
Linux 6.8.0-1019-aws #21~22.04.1-Ubuntu SMP Thu Nov 7 17:33:30 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```
Kubernetes Version
```
$ kubectl version
Client Version: v1.31.3
Kustomize Version: v5.4.2
Server Version: v1.31.2
```
Regression
No response
Sysdump
No response
Relevant log output
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct