-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf: wireguard: move to cil_from_wireguard program #37084 #38077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0ebf75a
to
7b1f765
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick first pass:
37a1467
to
4779667
Compare
It seems that the downgraded path can simply be handled with a commit directly in |
4779667
to
e706e74
Compare
e706e74
to
ce1894d
Compare
e2e-upgrade complaining after downgrading to v1.17 (pod conn disrupt) https://github.com/cilium/cilium/actions/runs/13923430629/job/39113604510. From sysdumps, programs seems to be correctly attached ( Going to run again and check if this is consistent. |
ce1894d
to
dd70a6e
Compare
/ci-e2e-upgrade |
4a46ef6
to
43031bd
Compare
0be5d0f
to
16589b2
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! Thanks for the detailed commit messages.
will auto-merge once all comments are resolved 🚀 |
16589b2
to
06c292c
Compare
The aim of the last commit was to get rid of the ad-hoc Re-based to pick up fix for CI |
From network-facing devices we use this config variable to decide whether to lookup the secctx from ipcache. Given that we will transition to a dedicated from-wireguard program in the subsequent commits, it will share this codepath with bpf_host. So rather than duplicating the code, simply move the secctx variable into global config: its values is decided in the agent depending on the legacy routing value, but it is always set to false for bpf_overlay, bpf_lxc, bpf_network, and bpf_host. The decision to set this as a global config and not a node config is that the latter would conceptually set while calling `nodeConfig()` before each invocation to `NewBPFHost() / NewBPFNetwork` etc. However, given that each program needs to set a different value for this variable, the global config is more (logically) suited for these kind of changes. In BPF tests, we used to assign 1 to this variable to have it != 2, where 2 was the previous value to ensure identity lookups from IPCache are enabled. Given that host_secctx_from_ipcache is now always false as default, we can now cleanup all those config assignments, which might be confusing and they are not needed anymore. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
In this commit we change the name of the cilium_calls map for tail calls in bpf_wireguard. Prior to this, the map was named `cilium_calls_wireguard_2`, where '2' stands for identity.ReservedIdentityWorld. Given we do not expect multiple WireGuard interfaces in Cilium in the same node, it is ok to use such fixed name. However, renaming the calls map has two main benefits: 1. using ifindex rather than identity.ReservedIdentityWorld aligns the behavior to what we do for netdev. We could also consider this transition for the map in bpf_overlay, but this won't be part of this patch series. 2. when transitioning to a dedicated from-wireguard program (subsequent commits), we preserve connectivity and avoid MissingTailCall errors while downgrading to previous versions where we used to attach from-netdev to the ingress hook of cilium_wg0. We observed that, during downgrades for instance from v1.18 next commit (from-wireguard) to v1.17 (from-netdev), the `replaceWireguardDatapath()` function clears the cilium_calls map for the WireGuard programs as soon as we invoke the `commit()` function. This means that while we wait for the from-netdev (v1.17) to be attached, the from-wireguard (v1.18) is running with an empty tail call map, since it is being overwritten. We cannot first detach from-wireguard, as north-south connections that require rev-xlation would be disrupted (pod-to-pod would work through the stack rather than us calling bpf_redirect). Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
We currently need to attach `cil_from_netdev` to the ingress hook of the Wireguard `cilium_wg0` device to enable encryption of KPR traffic in native routing. In that case, a reply packet would be otherwise dropped due to lacking of rev-SNAT/DNAT (for further info, please refer to the Issue/PR #19401). However, we don't need the whole logic, which might be problematic as described in #19401 (comment): in case of host firewall enabled, policy enforcement would not only (erroneously) happen, but they would also be applied ONLY to the ingress hook of the wireguard device. Let's switch to a dedicated program with the minimal set of needed features. The new `cil_from_wireguard` must look as much as possible to the previous `cil_from_netdev`, while additionally dropping not needed features. For this reason, here's the list of codepaths that have been either dropped or preserved: * Codepaths dropped: * ENABLE_NODEPORT_ACCELERATION * ctx->vlan_present * ENABLE_HOST_FIREWALL * ENABLE_IPSEC * ENABLE_ARP_PASSTHROUGH || ENABLE_ARP_RESPONDER || ENABLE_L2_ANNOUNCEMENTS * Codepaths preserved: * ENABLE_IPV4_FRAGMENTS * ENABLE_NODEPORT * ENABLE_HOST_ROUTING This means that the new wireguard program, as well as the previously attached from_netdev, is able to locally deliver packets to endpoints. However, differently than `cil_from_overlay`, if an endpoint is not identified, the packet is left for the stack. This patch supports Cilium version upgrades with the removal of the previously used cil_from_netdev. Having both the old from-netdev and the new from-wireguard programs attached causes connectivity problems, therefore here we make sure to always detach the old program. For downgrades, we previously landed in v1.17 (i) the removal logic for the newly introduced calls_map, and (ii) the removal logic for the new from-wireguard program #38187: this means that, upon downgrading Cilium, we are able to: * TCX: (1) re-attach from-netdev then (2) remove from-wireguard * TC: replace existing filter with the previous from-netdev Upgrades/Downgrades are correctly handles also due to changin the cilium tailcall map used in the wireguard code path in v1.18: if we preserved the same name cilium_calls_wireguard_2 than v1.17, when downgrading we could experiment MissingTailCall erorr, given that the new from-wireguard program would be still in place while the calls map is completely overwritten from replaceWireguard. The from-wireguard program is later detached upon re-inserting the old from-netdev, but for that time window it would not be able to handle packets, leaving them for the stack (in most of the cases they'd be still reaching their endpoints). An additional thing to note, is that with the cil_from_wireguard program inserted we preserve the same packets' path as before when we used to attach from_netdev rather than from_wg: * Request pod-to-pod: to_pod->to_ndetdev->to_wg->to_ndetdev * Reply pod-to-pod: from_netdev->from_wg->from_pod Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This patch simplifies the logic where we previously needed to check whether the program was bpf_wireguard or bpf_host used as ingress for the WireGuard ingress hook. With the previous commit we moved to a complete standalone from-wireguard program, therefore we can cleanup codepaths in which we previously used to check THIS_INTERFACE_IFINDEX == WG_IFINDEX. In particular, the observation points TRACE_{FROM,TO}_CRYPTO can now be used with just IS_BPF_WIREGUARD. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commit refactores tc_nodeport_l3_dev.c tests to be able to reuse as much code as possible between pure `tc_nodeport_l3_dev` (generic L3 device, not specifically set to wireguard) and `tc_nodeport_l3_wireguard` (specific to wireguard, where different BPF metrics must be ensured). Let's keep them both in case of L3 devices not necessarily tighted to crypto functions (ex. wireguard). Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commit completes tests in tc_nodeport_l3_dev.h adding the cases for egress hooks, both used for IS_BPF_WIREGUARD and IS_BPF_HOST. While adding these checks, we also ensures that metrics are correctly updated while processing packets from the Wireguard device with respect to any other L3 device. This patch allows us to remove the ad-hoc egress tests previously introduced in the `wireguard_metrics.c` file, now covered from `tc_nodeport_l3_wireguard.c`. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
06c292c
to
fc7ce5d
Compare
/test |
[ upstream commit 91ef65d ] This check was added in #38110, when we still used from-netdev for the wireguard interface. But as #38077 was developed in parallel, it looks like the change was accidentally dropped when introducing the dedicated from-wireguard program. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 91ef65d ] This check was added in #38110, when we still used from-netdev for the wireguard interface. But as #38077 was developed in parallel, it looks like the change was accidentally dropped when introducing the dedicated from-wireguard program. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 91ef65d ] This check was added in #38110, when we still used from-netdev for the wireguard interface. But as #38077 was developed in parallel, it looks like the change was accidentally dropped when introducing the dedicated from-wireguard program. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 91ef65d ] This check was added in #38110, when we still used from-netdev for the wireguard interface. But as #38077 was developed in parallel, it looks like the change was accidentally dropped when introducing the dedicated from-wireguard program. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 91ef65d ] This check was added in #38110, when we still used from-netdev for the wireguard interface. But as #38077 was developed in parallel, it looks like the change was accidentally dropped when introducing the dedicated from-wireguard program. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
Depends on:
cil_from_wireguard
program we're introducing here, to handle Cilium downgrades.cilium_calls_wireguard_<ifindex>
Feature will not be backported.
Notes 1/1 w/ Robin:
cil_from_wireguard
is attached to the tail of programs queue. Unless the previous program returnsTCX_NEXT
(and we NEVER use this return code),cil_from_wireguard
will never see the packet. This means that onlycil_from_netdev
(head of the queue) handles packet processing. Removingcil_from_netdev
will atomically switch the packet processing to the subsequent program,cil_from_wireguard
.cil_from_wireguard
is attached the old-fashioned way, which already atomically attaches the new filter replacing the previous one (cil_from_netdev
in our case).cil_from_wireguard
would be the head-of-the-tcx-queue program handling traffic. We attach the previouscil_from_netdev
to the tail. As soon as we removecil_from_wireguard
, thecil_from_netdev
will start handling packet processing atomically.cil_from_netdev
would be attached the old-fashioned way, instantly replacing thecil_from_wireguard
program atomically.Notes 1/1 w/ Timo:
MissingTailCall
: when downgrading, the v1.17 bpf_wireguard is compiled, and upon executing the loader commandcommit()
, the call map is being overwritten. However, the from_wireguard program introduced in this PR is still running, and it looses the references to the tail call. For this reason, we opt for renaming the calls map for wireguard in v1.18 (this PR), while also landing the removal logic support in v1.17 (linked PR above (4)).Fixes: #33676