v1.12 Backports 2023-06-12 #26117

gandro · 2023-06-12T12:30:53Z

Once this PR is merged, you can update the PR labels via:

for pr in 25953 26072; do contrib/backporting/set-labels.py $pr done 1.12; done

or with

make add-labels BRANCH=v1.12 ISSUES=25953,26072

[ upstream commit ac54f29 ] We recently changed our XFRM configuration to have one XFRM OUT policy per remote node, regardless of the IPAM mode being used. In doing so, we also moved the XFRM FWD policy to be installed once per remote node. With ENI and Azure IPAM modes, this wouldn't cause any issue because the XFRM FWD policy is the same regardless of the remote node. On other IPAM modes, however, the XFRM FWD policy is for some reason different depending on the remote node that triggered the installation. As a result, for those IPAM modes, one FWD policy is installed per remote node. And the deletion logic triggered on node deletions wasn't updated to take that into account. We thus have a leak of XFRM FWD policies. In the end, our FWD policy just needs to allow everything through without encrypting it. It doesn't need to be specific to any remote node. We can simply completely wildcard the match, to look like: src 0.0.0.0/0 dst 0.0.0.0/0 dir fwd priority 2975 ptype main tmpl src 0.0.0.0 dst 192.168.134.181 proto esp reqid 1 mode tunnel level use So we match all packets regardless of source and destination IPs. We don't match on the packet mark. There's a small implementation hurdle here. Because we used to install FWD policies of the form "src 0.0.0.0/0 dst 10.0.1.0/24", the kernel was able to deduce which IP family we are matching against and would adapt the 0.0.0.0/0 source CIDR to ::/0 as needed. Now that we are matching on 0/0 for both CIDRs, it cannot deduce this anymore. So instead, we must detect the IP family ourself and use the proper CIDRs. In addition to changing the XFRM FWD policy to the above, we can also stop installing it once per remote node. It's enough to install it when we receive the event for the local node, once. Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit d0ab559 ] We use this wildcard IPv6 CIDR in the catch-all default-drop OUT policy as well as in the FWD policy. It was incorrectly set to ::/128 instead of ::/0 and would therefore not match anything instead of matching everything. This commit fixes it. Fixes: e802c29 ("ipsec: Refactor wildcard IP variables") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 25064d1 ] After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were getting occasional spikes of "no inbound state" xfrm errors (XfrmInNoStates). These lead to packet loss and brief outages for applications sending traffic to the node on which the spikes occur. I noticed that the "No node ID found for node." logline would appear at the time of these spikes and from the code this is logged when the node ID cannot be resolved. Looking a bit further the call to `DeleteIPsecEndpoint` will end up deleting the xfrm state for any state that matches the node id as derived from the mark in the state. The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP has a mark of `0xd00` which when shifted >> 16 in `getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets deleted and the kernel drops all the inbound traffic as it no longer matches a state. This commit updates that logic to skip the XFRM state and policy deletion when the node ID is zero. Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Steven Johnson <sjdot@protonmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

[ upstream commit 57eac9d ] With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") we rely on the node ID to find XFRM states and policies that belong to remote nodes, to clean them up when remote nodes are deleted. This commit makes sure that we only do this for XFRM states and policies that actually match on these node IDs. That should only be the same if the mark mask matches on node ID bits. Thus if should look like 0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00 (matches on node and encryption bit). Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2023-06-12T13:22:34Z

/test-backport-1.12

qmonnet · 2023-06-12T14:48:24Z

Cilium L4LB XDP failed with an error that I couldn't associate to any known flake, but I don't believe the failing test involved IPsec in any way so I think it's unrelated. I'll file an issue and re-trigger the test.

[EDIT: Looks like it passed this time]

pchaigno

My PRs look good. Thanks Sebastian!

pchaigno and others added 4 commits June 12, 2023 12:30

gandro requested a review from a team as a code owner June 12, 2023 12:30

gandro added kind/backports This PR provides functionality previously merged into master. backport/1.12 labels Jun 12, 2023

gandro requested a review from pchaigno June 12, 2023 12:30

gandro added the release-blocker/1.12 label Jun 12, 2023

qmonnet approved these changes Jun 12, 2023

View reviewed changes

qmonnet mentioned this pull request Jun 12, 2023

[1.12] CI: Cilium L4LB XDP: Failed to connect to 10.0.0.8 port 80 #26129

Closed

pchaigno approved these changes Jun 12, 2023

View reviewed changes

pchaigno added the area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. label Jun 12, 2023

maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 12, 2023

gandro merged commit b6a04c6 into cilium:v1.12 Jun 12, 2023

This was referenced Jun 12, 2023

Prepare for release v1.12.11 #26140

Closed

Prepare for release v1.12.11 #26203

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v1.12 Backports 2023-06-12 #26117

v1.12 Backports 2023-06-12 #26117

Uh oh!

gandro commented Jun 12, 2023 •

edited by pchaigno

Loading

Uh oh!

gandro commented Jun 12, 2023

Uh oh!

qmonnet commented Jun 12, 2023 •

edited

Loading

Uh oh!

pchaigno left a comment

Uh oh!

Uh oh!

v1.12 Backports 2023-06-12 #26117

v1.12 Backports 2023-06-12 #26117

Uh oh!

Conversation

gandro commented Jun 12, 2023 • edited by pchaigno Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gandro commented Jun 12, 2023

Uh oh!

qmonnet commented Jun 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pchaigno left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gandro commented Jun 12, 2023 •

edited by pchaigno

Loading

qmonnet commented Jun 12, 2023 •

edited

Loading