Skip to content

v1.12 Backports 2023-06-12 #26117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 12, 2023
Merged

Conversation

gandro
Copy link
Member

@gandro gandro commented Jun 12, 2023

Once this PR is merged, you can update the PR labels via:

for pr in 25953 26072; do contrib/backporting/set-labels.py $pr done 1.12; done

or with

make add-labels BRANCH=v1.12 ISSUES=25953,26072

pchaigno and others added 4 commits June 12, 2023 12:30
[ upstream commit ac54f29 ]

We recently changed our XFRM configuration to have one XFRM OUT policy
per remote node, regardless of the IPAM mode being used. In doing so, we
also moved the XFRM FWD policy to be installed once per remote node.

With ENI and Azure IPAM modes, this wouldn't cause any issue because the
XFRM FWD policy is the same regardless of the remote node. On other IPAM
modes, however, the XFRM FWD policy is for some reason different
depending on the remote node that triggered the installation. As a
result, for those IPAM modes, one FWD policy is installed per remote
node. And the deletion logic triggered on node deletions wasn't updated
to take that into account. We thus have a leak of XFRM FWD policies.

In the end, our FWD policy just needs to allow everything through
without encrypting it. It doesn't need to be specific to any remote
node. We can simply completely wildcard the match, to look like:

    src 0.0.0.0/0 dst 0.0.0.0/0
        dir fwd priority 2975 ptype main
        tmpl src 0.0.0.0 dst 192.168.134.181
            proto esp reqid 1 mode tunnel
            level use

So we match all packets regardless of source and destination IPs. We
don't match on the packet mark.

There's a small implementation hurdle here. Because we used to install
FWD policies of the form "src 0.0.0.0/0 dst 10.0.1.0/24", the kernel was
able to deduce which IP family we are matching against and would adapt
the 0.0.0.0/0 source CIDR to ::/0 as needed. Now that we are matching on
0/0 for both CIDRs, it cannot deduce this anymore. So instead, we must
detect the IP family ourself and use the proper CIDRs.

In addition to changing the XFRM FWD policy to the above, we can also
stop installing it once per remote node. It's enough to install it when
we receive the event for the local node, once.

Fixes: 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit d0ab559 ]

We use this wildcard IPv6 CIDR in the catch-all default-drop OUT policy
as well as in the FWD policy. It was incorrectly set to ::/128 instead
of ::/0 and would therefore not match anything instead of matching
everything. This commit fixes it.

Fixes: e802c29 ("ipsec: Refactor wildcard IP variables")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 25064d1 ]

After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM
policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were
getting occasional spikes of "no inbound state" xfrm errors
(XfrmInNoStates). These lead to packet loss and brief outages for
applications sending traffic to the node on which the spikes occur.

I noticed that the "No node ID found for node." logline would appear at
the time of these spikes and from the code this is logged when the node
ID cannot be resolved. Looking a bit further the call to
`DeleteIPsecEndpoint` will end up deleting the xfrm state for any state
that matches the node id as derived from the mark in the state.

The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP
has a mark of `0xd00` which when shifted >> 16 in
`getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets
deleted and the kernel drops all the inbound traffic as it no longer
matches a state.

This commit updates that logic to skip the XFRM state and policy
deletion when the node ID is zero.

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Steven Johnson <sjdot@protonmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 57eac9d ]

With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and
Azure IPAMs") we rely on the node ID to find XFRM states and policies
that belong to remote nodes, to clean them up when remote nodes are
deleted.

This commit makes sure that we only do this for XFRM states and policies
that actually match on these node IDs. That should only be the same if
the mark mask matches on node ID bits. Thus if should look like
0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00
(matches on node and encryption bit).

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro requested a review from a team as a code owner June 12, 2023 12:30
@gandro gandro added kind/backports This PR provides functionality previously merged into master. backport/1.12 labels Jun 12, 2023
@gandro gandro requested a review from pchaigno June 12, 2023 12:30
@gandro
Copy link
Member Author

gandro commented Jun 12, 2023

/test-backport-1.12

@qmonnet
Copy link
Member

qmonnet commented Jun 12, 2023

Cilium L4LB XDP failed with an error that I couldn't associate to any known flake, but I don't believe the failing test involved IPsec in any way so I think it's unrelated. I'll file an issue and re-trigger the test.

[EDIT: Looks like it passed this time]

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PRs look good. Thanks Sebastian!

@pchaigno pchaigno added the area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. label Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 12, 2023
@gandro gandro merged commit b6a04c6 into cilium:v1.12 Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants