
bpf: tunnel-related cleanups in to-container path #28920


Merged

Conversation


@julianwiedmann commented Nov 1, 2023

We've accumulated a bunch of special handling and kube-proxy workarounds related to tunnel traffic. With the recent introduction of a `from_tunnel` parameter for bpf_lxc's ingress tail-call, it's now easier to fine-tune these code paths and reduce their scope.
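A minimal sketch of the mechanism, using simplified stand-ins for Cilium's `ctx_store_meta()`/`ctx_load_meta()` helpers (the slot index and function bodies below are illustrative, not the actual datapath code):

```c
/* Sketch: passing a from_tunnel flag to a tail-called ingress program
 * through the skb control buffer, the way CB_FROM_TUNNEL is used in
 * this PR. The helpers model Cilium's ctx_store_meta()/ctx_load_meta();
 * the slot index is made up for illustration. */
#include <stdbool.h>
#include <stdint.h>

#define CB_FROM_TUNNEL 2                /* hypothetical slot index */

struct ctx { uint32_t cb[5]; };         /* models struct __sk_buff's cb[] */

static void ctx_store_meta(struct ctx *ctx, int slot, uint32_t val)
{
	ctx->cb[slot] = val;
}

static uint32_t ctx_load_meta(const struct ctx *ctx, int slot)
{
	return ctx->cb[slot];
}

/* Caller side: local delivery records how the packet arrived, then
 * tail-calls into the pod's ingress program. */
static void l3_local_delivery(struct ctx *ctx, bool from_tunnel)
{
	ctx_store_meta(ctx, CB_FROM_TUNNEL, from_tunnel);
	/* ... tail-call into bpf_lxc's ingress policy program ... */
}

/* Callee side: the tail-call can now scope kube-proxy workarounds to
 * tunnel traffic only. */
static void ingress_policy(struct ctx *ctx)
{
	bool from_tunnel = ctx_load_meta(ctx, CB_FROM_TUNNEL);

	if (from_tunnel) {
		/* tunnel-specific handling goes here */
	}
}
```

The point of the parameter is that the callee no longer has to guess how the packet arrived; the caller states it explicitly, so workarounds can be limited to tunnel traffic.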

@julianwiedmann added the area/datapath, release-note/misc, and area/kube-proxy labels Nov 1, 2023
@julianwiedmann

/test

@julianwiedmann changed the title from "1.15 bpf local delivery overlay" to "bpf: clean up workarounds for kube-proxy with tunnel traffic" Nov 1, 2023
@julianwiedmann force-pushed the 1.15-bpf-local-delivery-overlay branch from 9cb148c to 799a3af on November 2, 2023 04:54
@julianwiedmann

/test

…l-call

We recently added a `from_tunnel` parameter to the local delivery path
of IPv4 traffic, which gets passed to the pod's ingress tail-call via
CB_FROM_TUNNEL.

Add the same parameter for the IPv6 path, and then use it in bpf_lxc to
fine-tune the kube-proxy workaround.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
To stay consistent with bpf_lxc, prefer the `from_tunnel` parameter over
IS_BPF_OVERLAY to detect whether l3_local_delivery() was inlined into
from-overlay.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
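To illustrate the difference (a sketch, under the assumption that IS_BPF_OVERLAY is only defined when building the overlay program): a compile-time check describes the whole program, while the runtime parameter describes the individual packet, which matters once l3_local_delivery() is inlined into programs other than from-overlay.

```c
#include <stdbool.h>

/* Sketch of the change: detect tunnel ingress per-packet instead of
 * per-program. IS_BPF_OVERLAY is a compile-time define, only set when
 * building the overlay program; from_tunnel is a runtime parameter
 * describing the packet itself. */

/* Before: compile-time, the same answer for every packet the program sees. */
static bool came_via_tunnel_old(void)
{
#ifdef IS_BPF_OVERLAY
	return true;
#else
	return false;
#endif
}

/* After: runtime, decided by the caller of l3_local_delivery(). */
static bool came_via_tunnel_new(bool from_tunnel)
{
	return from_tunnel;
}
```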
This specific workaround claims to handle service replies that came in via
tunnel and require RevDNAT by kube-proxy before delivery to the
local pod.

But it's located in the to-container program (and so the next hop will
be the veth peer, *not* the kernel stack). And l3_local_delivery() in
from-overlay will have already applied the same workaround, before passing
the packet to the stack. Thus it's safe to remove the stale code in
bpf_lxc.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann force-pushed the 1.15-bpf-local-delivery-overlay branch from 799a3af to 22ae0a4 on November 4, 2023 03:44
@julianwiedmann

/test

Wrap the access to CB_FROM_TUNNEL in HAVE_ENCAP, so that it can be easily
optimized out when there's no tunnel traffic in the cluster.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
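A sketch of how such a guard lets the compiler discard the tunnel-only branches entirely when HAVE_ENCAP is not defined (the accessor shape and slot index here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define CB_FROM_TUNNEL 2                /* hypothetical slot, as above */
struct ctx { uint32_t cb[5]; };

/* Guarded accessor: when the datapath is built without tunnel support
 * (no HAVE_ENCAP), this collapses to a constant false, so the compiler
 * can eliminate every branch that depends on it. */
static bool ctx_from_tunnel(const struct ctx *ctx)
{
#ifdef HAVE_ENCAP
	return ctx->cb[CB_FROM_TUNNEL];
#else
	(void)ctx;
	return false;
#endif
}
```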
Instead of inflicting this code path on every user of redirect_ep(), limit
it to actual tunnel traffic. This is possible as we now have the
`from_tunnel` parameter in bpf_lxc's policy tail-call function.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
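A sketch of the scoping (illustrative stand-ins; `bpf_skb_change_type()` and PACKET_HOST are the real kernel helper and packet type that the backport commits below mention): the fix-up now runs only when the policy tail-call knows the packet arrived via tunnel, instead of for every caller of redirect_ep().

```c
#include <stdbool.h>
#include <stdint.h>

struct ctx { uint32_t mark; };          /* stand-in for struct __sk_buff */

#define PACKET_HOST 0                   /* as in <linux/if_packet.h> */

/* Stand-in for the bpf_skb_change_type() helper. */
static void ctx_change_type(struct ctx *ctx, int type)
{
	(void)ctx; (void)type;
	/* a real program would call bpf_skb_change_type(skb, type) here */
}

static int redirect_ep_sketch(struct ctx *ctx, int ifindex, bool from_tunnel)
{
	/* Only tunnel traffic needs the fix-up (e.g. switching the packet
	 * type to PACKET_HOST); other callers skip it entirely. */
	if (from_tunnel)
		ctx_change_type(ctx, PACKET_HOST);

	return ifindex;                 /* stand-in for bpf_redirect(ifindex, 0) */
}
```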
@julianwiedmann force-pushed the 1.15-bpf-local-delivery-overlay branch from 22ae0a4 to 44ca7c7 on November 4, 2023 08:13
@julianwiedmann

/test

@julianwiedmann changed the title from "bpf: clean up workarounds for kube-proxy with tunnel traffic" to "bpf: tunnel-related cleanups in to-container path" Nov 6, 2023
@julianwiedmann marked this pull request as ready for review on November 6, 2023 03:00
@julianwiedmann requested a review from a team as a code owner on November 6, 2023 03:00
@maintainer-s-little-helper bot added the ready-to-merge label Nov 6, 2023
@julianwiedmann merged commit 97af800 into cilium:main Nov 7, 2023
julianwiedmann added a commit to julianwiedmann/cilium that referenced this pull request Nov 21, 2023
cilium#28920 introduced usage of
CB_FROM_TUNNEL in the IPv6 ingress tail-call. But unless all calling
programs get upgraded/downgraded in a *very* specific order (so that while
this version of the tail-call is installed, *all* in-flight skbs have their
CB_FROM_TUNNEL populated), some packets may miss the
tunnel-specific processing that tail_ipv6_policy() is meant to apply. In
particular, their type might not get switched to PACKET_HOST, resulting in
drops with SKB_DROP_REASON_OTHERHOST.

So we need to take a two-step approach: first roll out support for
CB_FROM_TUNNEL to all callers, but wait until a subsequent release (1.16)
before tail_ipv6_policy() starts using it. Thus even on a downgrade to
1.15, all callers will continue to populate CB_FROM_TUNNEL.

Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
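A sketch of the two-step rollout (the RELEASE_1_16 gate below is hypothetical, purely to illustrate deferring the read by one release):

```c
#include <stdbool.h>
#include <stdint.h>

#define CB_FROM_TUNNEL 2                /* hypothetical slot, as above */
struct ctx { uint32_t cb[5]; };

/* Step 1 (v1.15): every producer writes the slot unconditionally, even
 * though nothing reads it yet. */
static void producer(struct ctx *ctx, bool from_tunnel)
{
	ctx->cb[CB_FROM_TUNNEL] = from_tunnel;
}

/* Step 2 (v1.16): only then does the consumer trust the slot, since by
 * now every caller that could have an skb in flight populates it. The
 * RELEASE_1_16 define is a made-up stand-in for however the real code
 * defers the read. */
static void tail_ipv6_policy_sketch(struct ctx *ctx)
{
	bool from_tunnel = false;

#ifdef RELEASE_1_16
	from_tunnel = ctx->cb[CB_FROM_TUNNEL];
#endif
	if (from_tunnel) {
		/* switch the packet type to PACKET_HOST, etc. */
	}
}
```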
brb pushed a commit that referenced this pull request Nov 21, 2023
@julianwiedmann deleted the 1.15-bpf-local-delivery-overlay branch on November 21, 2023 14:53
brb pushed a commit that referenced this pull request Nov 22, 2023
pjablonski123 pushed a commit to pjablonski123/cilium that referenced this pull request Dec 15, 2023