viktor-kurchenko
Contributor

@viktor-kurchenko viktor-kurchenko commented Jun 23, 2025

pchaigno and others added 2 commits June 23, 2025 16:22
[ upstream commit f42e7d8 ]

I'm unsure what "Cilium-managed host traffic" really means, but we
should not give the impression anything other than Cilium-managed
pod-to-pod traffic is encrypted. We haven't encrypted traffic between
pods and hostns for a long time now.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 8103535 ]

IPsec relies on XFRM states and policies to ensure encrypted pod-to-pod
connectivity. In the XFRM state:

* seq is the incoming sequence number counter. It is used by the kernel
  to track and validate received packets. When a packet arrives, its
  sequence number is compared to the expected range to detect replays
  or out-of-order packets. Together with the oseq value, it helps
  implement the anti-replay window, which prevents attackers from
  resending previously captured packets.
* oseq is the outgoing sequence number counter. It is used when sending
  packets protected by IPsec. Each outbound IPsec packet is assigned an
  incrementing oseq value. The oseq ensures unique sequence numbers for
  each packet, which the receiver uses to validate the order and detect replays.
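
The seq/oseq interplay described above can be sketched with a small model. This is a simplified sliding-window check in the spirit of the ESP anti-replay mechanism, not the kernel's implementation; the class names `Sender` and `ReplayWindow` and the 32-packet window size are assumptions made for illustration. The model only shows that packets are dropped, not which kernel counter would be incremented.

```python
class Sender:
    """Outgoing side: hands out a strictly increasing oseq per packet."""

    def __init__(self) -> None:
        self.oseq = 0

    def next_seq(self) -> int:
        self.oseq += 1
        return self.oseq


class ReplayWindow:
    """Incoming side: tracks the highest seq seen plus a bitmap of recent ones."""

    def __init__(self, size: int = 32) -> None:
        self.size = size
        self.seq = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (seq - i) was received

    def accept(self, n: int) -> bool:
        if n == 0:
            return False  # sequence numbers start at 1 in this model
        if n > self.seq:
            # Newer packet: slide the window forward and mark it as seen.
            shift = n - self.seq
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.seq = n
            return True
        offset = self.seq - n
        if offset >= self.size:
            return False  # older than the window: rejected
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset  # late but new: accept and mark
        return True


if __name__ == "__main__":
    receiver = ReplayWindow()
    sender = Sender()
    for _ in range(100):
        receiver.accept(sender.next_seq())

    # A sender whose state was recreated restarts its counter, while the
    # receiver still holds the old state: the low numbers fall outside the window.
    recreated = Sender()
    print(receiver.accept(recreated.next_seq()))
```

In this model, a receiver that keeps an old state rejects every packet from a peer whose counter was reset until the peer's counter passes the receiver's highest seen value, which is one way to picture why both sides need consistent states.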

In an SA between two nodes A and B, the seq/oseq values in the XFRM state
on node A must match the oseq/seq values on node B, and vice versa. If that
is not the case, users will see `XfrmInStateProtoError` errors, with
no IPsec connectivity between the two nodes.
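
As a rough illustration of how the `XfrmInStateProtoError` counter could be inspected, the sketch below parses text in the `name value` layout that Linux exposes in `/proc/net/xfrm_stat` (available when the kernel is built with XFRM statistics). The function names `parse_xfrm_stat` and `proto_errors` are hypothetical helpers, not part of Cilium.

```python
def parse_xfrm_stat(text: str) -> dict[str, int]:
    """Parse 'CounterName <whitespace> value' lines into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by a number.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def proto_errors(text: str) -> int:
    """Return the XfrmInStateProtoError count, or 0 if the counter is absent."""
    return parse_xfrm_stat(text).get("XfrmInStateProtoError", 0)
```

On a node, the input would come from reading `/proc/net/xfrm_stat`; comparing two readings taken a few seconds apart shows whether the counter is still increasing.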

We noticed that a Cilium user might end up in this situation in any of the
following cases, as stated in the doc changes:

1. KVStore Mode (e.g., etcd): if a Cilium agent connects too late to the
   newly created KVStore, it may miss the node delete and create events
   for entries that were restored or reinitialized. This results in stale
   XFRM states, causing permanent network disruption.
2. KVStore Mode: if a Cilium agent is down for a prolonged time, the
   corresponding node entry in the kvstore will be deleted due to lease
   expiration (15m), resulting in stale XFRM states.
3. CRD Mode: a similar issue may occur when a CiliumNode resource is deleted
   and the Cilium agent DaemonSet is restarted. While other agents will
   recreate fresh XFRM states for the new CiliumNode, the restarted agent
   may continue to hold obsolete XFRM states referencing all peer nodes.

The identified mitigation strategy for these scenarios is an IPsec
key rotation, which causes all the states to be consistently
recreated in all Cilium agents.
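
A minimal sketch of preparing the next key for such a rotation follows. It assumes the key line layout `<id>[+] <algorithm> <hex key> <size>` and the 1 to 15 key ID range described in Cilium's IPsec documentation; check both against the documentation for the version in use. The helper name `next_key_line` is hypothetical, and the function only builds the new line: it does not write it to the cluster's IPsec key secret.

```python
import secrets


def next_key_line(current: str) -> str:
    """Build a new key line with the next key ID and a fresh random key."""
    key_id, algo, key_hex, size = current.split()
    # Preserve an optional '+' suffix on the ID (assumed to be part of the format).
    suffix = "+" if key_id.endswith("+") else ""
    old_id = int(key_id.rstrip("+"))
    new_id = old_id % 15 + 1  # wrap around within the assumed 1..15 range
    # Same key length as before, freshly generated.
    new_key = secrets.token_hex(len(key_hex) // 2)
    return f"{new_id}{suffix} {algo} {new_key} {size}"
```

Because the new line carries a different key ID, agents that pick it up recreate their states for the new key rather than reusing the existing ones.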

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
@viktor-kurchenko viktor-kurchenko added labels Jun 23, 2025: kind/backports (This PR provides functionality previously merged into master.), backport/1.17 (This PR represents a backport for Cilium 1.17.x of a PR that was merged to main.)
@viktor-kurchenko
Contributor Author

/test

@viktor-kurchenko viktor-kurchenko marked this pull request as ready for review June 23, 2025 14:26
@viktor-kurchenko viktor-kurchenko requested a review from a team as a code owner June 23, 2025 14:26
Contributor

@smagnani96 smagnani96 left a comment

Thanks Viktor, LGTM!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 23, 2025
@joestringer joestringer added this pull request to the merge queue Jun 24, 2025
Merged via the queue into v1.17 with commit 5e5af9b Jun 24, 2025
61 checks passed
@joestringer joestringer deleted the pr/v1.17-backport-2025-06-23-04-22 branch June 24, 2025 01:39
Labels
backport/1.17 This PR represents a backport for Cilium 1.17.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
4 participants