Skip to content

Unencrypted traffic among nodes when IPsec is used with L7 egress proxy #31984

@jschwinger233

Description

@jschwinger233

Security advisory: GHSA-j89h-qrvr-xc36

Previous PRs (#29530, #29594, #30095) addressed the vulnerabilities for IPsec and L7 ingress proxy, this issue is opened for IPsec and L7 egress proxy.

Initial Attempts

In the first place we tried to simply add an ip rule for from-egress-proxy traffic, just like what we did at bca5f07, which added an ip rule for from-ingress-proxy traffic. This commit (1c97de7) implemented the idea,

However, it has turned out to be not enough. We realized iptables also needs to be taken care of, so the second commit (1b5e7d6) followed. Please see the commit message for reasons.

After above two patches, ci-ipsec-e2e seemed to be happy, even with leak detection, but ci-ipsec-upgrade turned broken badly. We found the failure happened during downgrade from, say, 1.15-patched to 1.14-tip (minor downgrade). 1.15-patches will install the new from-egress-proxy rule + new iptables rule; after downgrade to 1.14-tip, the new iptables rule is cleaned up, but the new from-egress-proxy rule is left over, because the old 1.14-tip doesn't recognise from-egress-rule at all. The stale ip rule messed up.

To handle rules carefully without breaking downgrade, I propose the two-stage patches as followed.

Proposed solution

Stage one

We will have to merge the following PRs:

These PRs are doing a simple thing: remove from-egress-proxy rule, even they doesn't install the rule. The rule is probably installed by higher version of cilium.

Stage two

After PRs in stage one are all merged into stable branches and released in weeks, we can proceed with the following PRs:

Right now they consists of two commits described in "Initial Attempts" section, plus "manually specify downgrade version", plus leak detection CI checks. I will clean up the PRs once ready, for now the leak detection patches and "manually specify" patch are temporarily added to demonstrate the solution works. We have noted that ci-ipsec-upgrade and ci-ipsec-e2e are mostly in green status.

Outstanding issues

Leak detection check sometimes failes at the very end of ci-ipsec-e2e. The latest speculation is it's caused by conn-disrupt pods deletion.

If conn-disrupt-server is deleted with its ipcache entry, but an skb holding conn-disrupt-client -> conn-disrupt-server already gets emitted, then this skb at from_host@cilium_host won't be marked as need-ipsec-encryption (0xe00) due to ipcache entry missing. bpftrace wil then catch it.

Right now there is no obvious solution (if the theory turns out to be correct). We will move forward and discuss it later.

Metadata

Metadata

Assignees

Labels

area/datapathImpacts bpf/ or low-level forwarding details, including map management and monitor messages.area/encryptionImpacts encryption support such as IPSec, WireGuard, or kTLS.area/proxyImpacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.feature/ipsecRelates to Cilium's IPsec featurekind/bugThis is a bug in the Cilium logic.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions