Skip to content

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Apr 16, 2024

@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Apr 16, 2024
@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 17, 2024
@github-actions github-actions bot closed this May 31, 2024
@jschwinger233 jschwinger233 reopened this Jun 6, 2024
[ upstream commit: f93a40c ]

We have an iptables rule to set 0x200 mark for transparent socket:

```
*mangle
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_mangle" -j CILIUM_PRE_mangle
-A CILIUM_PRE_mangle -m socket --transparent -m mark ! --mark 0xe00/0xf00 -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff
```

This rule is in the mangle PREROUTING which checks packets ingressed
from a netdev.

Let's then focus on the pod to world traffic when IPsec=on + proxy=on +
tunnel=off.

Currently, a pod-to-world packet will go through the path:
1. from_lxc@lxc: skb->mark is set to 0x200 and returned to stack
2. iptables: skb is hijacked by tproxy (due to 0x200), to be accepted by proxy
3. proxy process: the old skb is consumed by proxy, an new skb is sent to upstream (world)
4. stack routing: the new skb is routed to eth0
5. stack iptables: the new skb is traversing OUTPUT chain and POSTROUTING chain
6. to_netdev@eth0: the new skb is going to world

Please note the new skb won't hit PREROUTING chain, where there is a
rule setting skb->mark=0x200.

To fix #31984, we are going to
change the routing for packets from egress proxy; consequently, on the
step 4 above, the new skb will be routed to cilium_host instead:

4. stack routing: the new skb is routed to cilium_host
5. from_host@cilium_host: the new skb is returned to stack
6. to_host@cilium_net: the new skb is returned to stack
7. stack: PREROUTING, routing, FORWARD, POSTROUTING

Look at step 7, we are hitting PREROUTING! Because of
cilium/proxy#742, this to-world skb is also
linked to a transparent socket, matching the "-m socket --transparent"
condition, the packet will fortunately have the 0x200 mark.

If we do nothing, this to-world skb marked with 0x200 will then hit
routiong rule "from all fwmark 0x200/0xf00 lookup 2004" and be routed to
local. It should have gone to the world.

This patch fixes this future issue as a precaution (otherwise we'll
break git-bisect).

This patch provides a straightforward solution: at step 5
from_host@cilium_host, we set a specical mark 0x800
(MARK_MAGIC_PROXY_TO_WORLD), then iptables can exclude this mark using
"-m mark ! --mark 0x800/0xf00".

Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit: 3384d73 ]

After cilium/proxy#742, proxy traffic keeps
original pod IP as source IP for to-world packets, which must be
masqueraded to eth0 IP. There is no issue for now, but the new
routing rule (0xb00 lookup 2005) to be added for #31984
will cause a side effect breaking masquerading. This patch fixes the
that side effect as a precaution, otherwise git-bisect breaks.

The new routing rule (0xb00 lookup 2005) will cause proxy packets going
through POSTROUTING for twice: first time happens when proxy sends
packets which are routed to cilium_host, these are hitting OUTPUT +
**POSTROUTING**; the second time takes place after packets ingressed
from cilium_net, these skbs will traverse PREROUTING + FORWARD +
**POSTROUTING**.

However, due to kernel's implementation details, an skb won't be
processed by nat POSTROUTING for twice: after the first POSTROUTING
check, skb's ct `(struct nf_conn*)(skb->_nfct & ~7)` has a status
IPS_SRC_NAT_DONE to skip the further traversal at all. [1]

To avoid being set the IPS_SRC_NAT_DONE flag, this patch adds an
iptables rule `--mark 0xb00 -j CT --notrack` at OUTPUT to skip the first
round iptables ct, just for proxy traffic which is characterized by
0xb00 mark.

[1] https://elixir.bootlin.com/linux/v6.6.2/source/net/netfilter/nf_nat_core.c#L825
[1] https://elixir.bootlin.com/linux/v6.6.2/source/include/net/netfilter/nf_nat.h#L111

Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit: 1ce4c7f ]

[ backporter's note: v1.14 still uses bpf/init.sh to install routing
rules, we have to re-implement the logic in bash. ]

This commit installs "0xb00/0xf00 lookup 2005" routing rule when IPsec
is enabled with native routing and envoy. This is a necessary step
towards fixing encryption leaks, otherwise egress proxy's return traffic
gets no chance to be set IPsec mark. The new routing rule ensures these
packets are routed to cilium_host, where we have bpf_host to handle
encryption datapath.

This patch uses a different condition from requireFromProxyRoutes() to
determine whether to install the new routing rule, otherwise we
will see breakage on IPsec=off + envoy=on. Specially, the new routing
rule is isolated to IPsec only.

Signed-off-by: gray <gray.liang@isovalent.com>
@jschwinger233 jschwinger233 force-pushed the pr/gray/1.14/egress-proxy-ipsec-fix2 branch from 18262e0 to 7b9f5e8 Compare June 6, 2024 11:58
@cilium cilium deleted a comment from github-actions bot Jun 6, 2024
@cilium cilium deleted a comment from github-actions bot Jun 6, 2024
@jschwinger233
Copy link
Member Author

/test-backport-1.14

@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jun 7, 2024
@jschwinger233
Copy link
Member Author

CI is almost green except ci-ipsec-e2e https://github.com/cilium/cilium/actions/runs/9400458006

image

ci-ipsec-e2e's failure is a known racing issue of leak detection that will be fixed by #32930.

I'll drop the last three leak detection patches and mark ready for review.

@jschwinger233 jschwinger233 force-pushed the pr/gray/1.14/egress-proxy-ipsec-fix2 branch from 7b9f5e8 to 13c8c2f Compare June 7, 2024 06:21
@jschwinger233 jschwinger233 changed the title Pr/gray/1.14/egress proxy ipsec fix2 [1.15-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy Jun 7, 2024
@jschwinger233
Copy link
Member Author

/test-backport-1.14

@jschwinger233 jschwinger233 changed the title [1.15-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy [1.14-backport] ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy Jun 7, 2024
@jschwinger233 jschwinger233 marked this pull request as ready for review June 7, 2024 07:46
@jschwinger233 jschwinger233 requested a review from a team as a code owner June 7, 2024 07:46
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks, bash script looks OK

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 7, 2024
@qmonnet qmonnet merged commit cfdfcf8 into v1.14 Jun 7, 2024
@qmonnet qmonnet deleted the pr/gray/1.14/egress-proxy-ipsec-fix2 branch June 7, 2024 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

2 participants