KVStoreMesh: only sync identities ID #36425

@HadrienPatte

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.15.11 and lower than v1.16.0

What happened?

We run Cilium with KVStoreMesh in kvstore identity mode and sync data from remote Kubernetes clusters in etcd. When syncing data from a cluster with around 1k identities, we observed about 2k keys synced from the cilium/state/identities/v1 path in the source cluster. This is because both the id keys and the value keys of the identities are synced (cilium/state/identities/v1/id and cilium/state/identities/v1/value).

The value keys track which node uses which identity. This appears to be used only for garbage collection, which happens only on the source cluster, since synced clusters rely on TTL-based garbage collection. Hence syncing the value keys does not appear to be necessary.
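As a rough illustration of the idea (not the actual patch, which lives in the KVStoreMesh Go code), the filter amounts to syncing only keys under the id sub-path. The function name `should_sync` and the example key suffixes below are hypothetical; only the two path prefixes come from this report.

```python
# Minimal sketch: keep identity "id" keys, skip "value" keys.
# The prefix is taken from the report; the key suffixes are made up.
ID_PREFIX = "cilium/state/identities/v1/id/"


def should_sync(key: str) -> bool:
    """Return True only for identity id keys (hypothetical helper)."""
    return key.startswith(ID_PREFIX)


source_keys = [
    "cilium/state/identities/v1/id/1234",                    # illustrative id key
    "cilium/state/identities/v1/value/example-labels/node",  # illustrative value key
]

synced = [k for k in source_keys if should_sync(k)]
print(synced)
```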

We ran some tests with a patched version of the kvstoremesh API that only syncs id identity keys and got the following results:

Before applying the patch:

  • Source cluster cluster-a:
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1 2>/dev/null | wc -l
2789
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/value 2>/dev/null | wc -l
1694
  • Target cluster cluster-b:
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a 2>/dev/null | wc -l
2788
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/value 2>/dev/null | wc -l
1690

After deploying the patch to only sync id identity keys:

  • Source cluster cluster-a:
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1 2>/dev/null | wc -l
2787
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-a:/home/cilium# cilium kvstore get --recursive cilium/state/identities/v1/value 2>/dev/null | wc -l
1690
  • Target cluster cluster-b:
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/id 2>/dev/null | wc -l
1097
root@cilium-agent-cluster-b:/home/cilium# cilium kvstore get --recursive cilium/cache/identities/v1/cluster-a/value 2>/dev/null | wc -l
0
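The measured reduction on the target cluster can be worked out from the counts above. This is only arithmetic on the reported numbers; the helper name `reduction_pct` is made up for the example.

```python
# Key counts reported for the target cluster (cilium/cache/identities/v1/cluster-a).
before_total = 2788  # before the patch: id + value keys
after_total = 1097   # after the patch: id keys only


def reduction_pct(before: int, after: int) -> float:
    """Percentage of keys no longer synced (hypothetical helper)."""
    return 100.0 * (before - after) / before


pct = reduction_pct(before_total, after_total)
print(f"{before_total - after_total} fewer keys, {pct:.1f}% reduction")
```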

The following graph shows the cilium_kvstoremesh_kvstore_sync_queue_size metric for the identities/v1 scope. It has two spikes: the first, without the patch, peaks at a queue size of about 2500; the second, after applying the patch, peaks at about 800.

Image

This patch reduces the number of keys synced from a source cluster to a target cluster by about 50% or more (from 2788 to 1097 keys in the test above). This can add up to a significant number in large meshes with many clusters, as every cluster syncs its identities with every other cluster in the mesh. The result is reduced load on etcd and a smaller sync queue.
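To get a feel for how this adds up, here is a back-of-the-envelope sketch. It assumes a hypothetical mesh of 10 clusters in which every cluster looks like cluster-a and caches the identities of every other cluster; the cluster count and the function name are assumptions, not measurements.

```python
def mesh_cached_keys(clusters: int, keys_per_cluster: int) -> int:
    """Total identity keys cached mesh-wide when each cluster syncs
    from every other cluster (hypothetical model)."""
    return clusters * (clusters - 1) * keys_per_cluster


clusters = 10  # assumed mesh size, not from the report
unpatched = mesh_cached_keys(clusters, 2788)  # id + value keys per source cluster
patched = mesh_cached_keys(clusters, 1097)    # id keys only

print(f"unpatched: {unpatched}, patched: {patched}, saved: {unpatched - patched}")
```

Because the total grows with the square of the cluster count, the per-cluster saving is multiplied quickly as the mesh grows.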

How can we reproduce the issue?

See section above.

Cilium Version

# cilium version
Client: 1.15.10 7aff1a4d87 2024-07-30T13:05:24+00:00 go version go1.22.8 linux/amd64
Daemon: 1.15.10 7aff1a4d87 2024-07-30T13:05:24+00:00 go version go1.22.8 linux/amd64

Kernel Version

# uname -a
Linux 6.8.0-1019-aws #21~22.04.1-Ubuntu SMP Thu Nov  7 17:33:30 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

$ kubectl version
Client Version: v1.31.3
Kustomize Version: v5.4.2
Server Version: v1.31.2

Labels

kind/cfp (Cilium Feature Proposal), kind/community-report (This was reported by a user in the Cilium community, eg via Slack.)