-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ipsec: Fix XfrmInNoStates
drops on ESK & AKS upgrades
#25724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
25d0a36
to
f182153
Compare
The XFRM IN policies and states didn't change so we should never need to remove any stale XFRM IN configs. Let's thus simplify the logic to find stale policies and states accordingly. I would expect this incorrect removal to cause a few drops on agent restart, but after multiple attempts to reproduce on small (3 nodes) and larger (20) clusters (EKS & GKE) with a drop-sensitive application (migrate-svc), I'm not able to see such drops. I'm guessing this is because we reinstall the XFRM IN configs right after we removed them so there isn't really much time for a packet to be received and dropped. Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies") Signed-off-by: Paul Chaignon <paul@cilium.io>
This small bit of refactoring will make later changes a bit easier. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Same as the previous commit but on a different set of conditions. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This is a partial revert of commit 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS"). One change that commit 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS") make on EKS and AKS was to switch from using NodeInternalIPs to using CiliumInternalIPs for outer IPsec (ESN) IP addresses. That made the logic more consistent with the logic we use for other IPAM schemes (e.g., GKE). It however causes serious connectivity issues on upgrades and downgrades. This is mostly because typically not all nodes are updated to the new Cilium version at the same time. If we consider two pods on nodes A and B trying to communicate, then node A may be using the old NodeInternalIPs while node B is already on the new CiliumInternalIPs. When node B sends traffic to node A, node A doesn't have the XFRM state IN necessary to decrypt it. The same happens in the other direction. This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent commits will introduce additional changes to enable a proper migration path from NodeInternalIPs to CiliumInternalIPs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels. A subsequent commit will allow us to change that to switch to using CiliumInternalIPs (as done on GKE). For that to be possible without breaking inter-node connectivity for the whole duration of the switch, we need an intermediate mode where both CiliumInternalIPs and NodeInternalIPs are accepted on ingress. The idea is that we will then have a two-steps migration from NodeInternalIP to CiliumInternalIP: 1. All nodes are using NodeInternalIP. 2. Upgrade to the version of Cilium that supports both NodeInternalIP and CiliumInternalIP and encapsulates IPsec traffic with NodeInternalIP. 3. Via an agent flag, tell Cilium to switch to encapsulating IPsec traffic with CiliumInternalIP. 4. All nodes are using CiliumInternalIP. This commit implements the logic for step 2 above. To that end, we will duplicate the XFRM IN states such that we have both: src 0.0.0.0 dst [NodeInternalIP] # existing src 0.0.0.0 dst [CiliumInternalIP] # new thus matching and being able to receive IPsec packets with an outer destination IP of either NodeInternalIP or CiliumInternalIP. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, IPsec used NodeInternalIPs for the encapsulation. This commit introduces a new flag to allow switching from NodeInternalIPs to CiliumInternalIPs; it defaults to the former. This new flag allows for step 3 of the migration plan defined in the previous commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
f182153
to
9b91f40
Compare
/test |
XfrmInNoStates
drops on ESK & AKS upgrades
spi, err := ipsec.UpsertIPsecEndpoint(wildcardCIDR, cidr, localIP, wildcardIP, 0, ipsec.IPSecDirIn, zeroMark) | ||
upsertIPsecLog(err, "in CiliumInternalIPv6", wildcardCIDR, cidr, spi) | ||
|
||
spi, err = ipsec.UpsertIPsecEndpoint(wildcardCIDR, cidr, localNodeInternalIP, wildcardIP, 0, ipsec.IPSecDirIn, zeroMark) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we can skip generating ingress policy for native node IP when UseCiliumInternalIPForIPsec
is true, but It looks unsafe during upgrade, so be it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we need this for the upgrade. We can clean it up later, in v1.15, once we're sure everyone has upgraded from the old IPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config changes (agent & CLI) look good to me!
This pull request fixes IPsec packet drops of type
XfrmInNoStates
that would happen on EKS and AKS for the whole duration of the upgrade (until the last node is upgraded). The root of the issue is that we switched from using CiliumInternalIP to using NodeInternalIP for the IPsec encapsulation and that change isn't properly handled on the upgrade.To avoid the
XfrmInNoStates
drops, this pull request introduces a 2-steps upgrade process. First, after the usual upgrade, both CiliumInternalIPs and NodeInternalIPs encapsulations will be accepted on ingress, but NodeInternalIPs will still be used for the encapsulation on egress. Then, in a second time, users can switch touse-cilium-internal-ip-for-ipsec=true
(defaults false) to start using CiliumInternalIPs.XfrmOutPolBlock
also happening during the IPsec + EKS/AKS upgrade. Those will be addressed in a separate PR.How was this tested?
I performed two types of tests: first with a normal upgrade process and second with a split cluster:
migrate-svc
application running during the upgrade. The two-steps upgrade was carried out and, after each step, I checked that noXfrmInNoStates
drops were reported. Themigrate-svc
application did have drops during the first step because of the aforementionedXfrmOutPolBlock
drops. It did not have any additional drops afterward. Our connectivity tests suite also passed after each step.XfrmInNoStates
norXfrmOutPolBlock
, were reported.Fixes: #24773.
Fixes: #24030.