Skip to content

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented May 26, 2023

This pull request fixes IPsec packet drops of type XfrmInNoStates that would happen on EKS and AKS for the whole duration of the upgrade (until the last node is upgraded). The root of the issue is that we switched from using CiliumInternalIP to using NodeInternalIP for the IPsec encapsulation and that change isn't properly handled on the upgrade.

To avoid the XfrmInNoStates drops, this pull request introduces a 2-steps upgrade process. First, after the usual upgrade, both CiliumInternalIPs and NodeInternalIPs encapsulations will be accepted on ingress, but NodeInternalIPs will still be used for the encapsulation on egress. Then, in a second time, users can switch to use-cilium-internal-ip-for-ipsec=true (defaults false) to start using CiliumInternalIPs.

⚠️ Note this pull request doesn't fix other drops of type XfrmOutPolBlock also happening during the IPsec + EKS/AKS upgrade. Those will be addressed in a separate PR.

How was this tested?

I performed two types of tests: first with a normal upgrade process and second with a split cluster:

  1. The usual upgrade path was tested on 2 clusters of 20 nodes each, with EKS and GKE. The upgrade was performed from Cilium v1.13.0 to this patchset, with the drop-sensitive migrate-svc application running during the upgrade. The two-steps upgrade was carried out and, after each step, I checked that no XfrmInNoStates drops were reported. The migrate-svc application did have drops during the first step because of the aforementioned XfrmOutPolBlock drops. It did not have any additional drops afterward. Our connectivity tests suite also passed after each step.
  2. I deployed a 3-nodes EKS cluster with a mix of Cilium versions to simulate the impact of a prolonged upgrade. Two nodes ran Cilium v1.13.0, while the third ran this patchset backported on the v1.13 branch. I ran the connectivity tests and confirmed that no drops, neither XfrmInNoStates nor XfrmOutPolBlock, were reported.

Fixes: #24773.
Fixes: #24030.

Fix a bug that would cause connectivity drops of type XfrmInNoStates on upgrade when IPsec is enabled with ENI or Azure IPAM mode.

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. upgrade-impact This PR has potential upgrade or downgrade impact. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. needs-backport/1.11 labels May 26, 2023
@pchaigno pchaigno force-pushed the ipsec-eks-upgrade branch 2 times, most recently from 25d0a36 to f182153 Compare May 26, 2023 18:17
pchaigno and others added 7 commits May 27, 2023 21:13
The XFRM IN policies and states didn't change so we should never need
to remove any stale XFRM IN configs. Let's thus simplify the logic to
find stale policies and states accordingly.

I would expect this incorrect removal to cause a few drops on agent
restart, but after multiple attempts to reproduce on small (3 nodes)
and larger (20) clusters (EKS & GKE) with a drop-sensitive application
(migrate-svc), I'm not able to see such drops. I'm guessing this is
because we reinstall the XFRM IN configs right after we removed them so
there isn't really much time for a packet to be received and dropped.

Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies")
Signed-off-by: Paul Chaignon <paul@cilium.io>
This small bit of refactoring will make later changes a bit easier.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Same as the previous commit but on a different set of conditions.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This is a partial revert of commit 3e59b68 ("ipsec: Per-node XFRM
states & policies for EKS & AKS").

One change that commit 3e59b68 ("ipsec: Per-node XFRM states &
policies for EKS & AKS") make on EKS and AKS was to switch from using
NodeInternalIPs to using CiliumInternalIPs for outer IPsec (ESN) IP
addresses. That made the logic more consistent with the logic we use for
other IPAM schemes (e.g., GKE).

It however causes serious connectivity issues on upgrades and
downgrades. This is mostly because typically not all nodes are updated
to the new Cilium version at the same time. If we consider two pods on
nodes A and B trying to communicate, then node A may be using the old
NodeInternalIPs while node B is already on the new CiliumInternalIPs.
When node B sends traffic to node A, node A doesn't have the XFRM state
IN necessary to decrypt it. The same happens in the other direction.

This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent
commits will introduce additional changes to enable a proper migration
path from NodeInternalIPs to CiliumInternalIPs.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels.
A subsequent commit will allow us to change that to switch to using
CiliumInternalIPs (as done on GKE).

For that to be possible without breaking inter-node connectivity for the
whole duration of the switch, we need an intermediate mode where both
CiliumInternalIPs and NodeInternalIPs are accepted on ingress.

The idea is that we will then have a two-steps migration from
NodeInternalIP to CiliumInternalIP:

  1. All nodes are using NodeInternalIP.
  2. Upgrade to the version of Cilium that supports both NodeInternalIP
     and CiliumInternalIP and encapsulates IPsec traffic with
     NodeInternalIP.
  3. Via an agent flag, tell Cilium to switch to encapsulating IPsec
     traffic with CiliumInternalIP.
  4. All nodes are using CiliumInternalIP.

This commit implements the logic for step 2 above. To that end, we will
duplicate the XFRM IN states such that we have both:

    src 0.0.0.0 dst [NodeInternalIP]   # existing
    src 0.0.0.0 dst [CiliumInternalIP] # new

thus matching and being able to receive IPsec packets with an outer
destination IP of either NodeInternalIP or CiliumInternalIP.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, IPsec used NodeInternalIPs for the encapsulation. This
commit introduces a new flag to allow switching from NodeInternalIPs to
CiliumInternalIPs; it defaults to the former.

This new flag allows for step 3 of the migration plan defined in the
previous commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno force-pushed the ipsec-eks-upgrade branch from f182153 to 9b91f40 Compare May 27, 2023 19:16
@pchaigno pchaigno requested a review from jschwinger233 May 27, 2023 19:21
@pchaigno pchaigno marked this pull request as ready for review May 27, 2023 19:21
@pchaigno pchaigno requested review from a team as code owners May 27, 2023 19:21
@pchaigno pchaigno requested a review from mhofstetter May 27, 2023 19:21
@pchaigno
Copy link
Member Author

/test

@pchaigno pchaigno changed the title ipsec: Fix ESK & AKS upgrade ipsec: Fix XfrmInNoStates drops on ESK & AKS upgrades May 28, 2023
spi, err := ipsec.UpsertIPsecEndpoint(wildcardCIDR, cidr, localIP, wildcardIP, 0, ipsec.IPSecDirIn, zeroMark)
upsertIPsecLog(err, "in CiliumInternalIPv6", wildcardCIDR, cidr, spi)

spi, err = ipsec.UpsertIPsecEndpoint(wildcardCIDR, cidr, localNodeInternalIP, wildcardIP, 0, ipsec.IPSecDirIn, zeroMark)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we can skip generating ingress policy for native node IP when UseCiliumInternalIPForIPsec is true, but It looks unsafe during upgrade, so be it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we need this for the upgrade. We can clean it up later, in v1.15, once we're sure everyone has upgraded from the old IPs.

Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config changes (agent & CLI) look good to me!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 30, 2023
@pchaigno pchaigno merged commit 963e45b into cilium:main May 30, 2023
@pchaigno pchaigno deleted the ipsec-eks-upgrade branch May 30, 2023 08:22
@pchaigno pchaigno added release-blocker/1.11 backport/author The backport will be carried out by the author of the PR. labels Jun 1, 2023
@pchaigno pchaigno added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.12 labels Jun 9, 2023
@qmonnet qmonnet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport/author The backport will be carried out by the author of the PR. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

4 participants