bpf: Introduce wildcard service entries to drop traffic towards unknown protocol/port combinations #40684
Conversation
force-pushed from bd5a7e7 to f5a6f60
force-pushed from fb75646 to adf633a
force-pushed from adf633a to c391c35
force-pushed from 8b5e388 to f408d79
This looks great overall. One point of feedback though.
force-pushed from f408d79 to 6b2266c
Notes from testing, for general info:

Setup

I've used a kind cluster with an app deployment allowing me to run a TCP service on port 8000. I've also had to configure Cilium with:

```
kubeProxyReplacement: true
l2announcements:
  enabled: true
l2NeighDiscovery:
  enabled: true
```

such that I can get host to LB VIP access directly, e.g.:

```
$ kubectl -n default get pod netshoot-server-6b478b859-g5djl -owide
NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE                     NOMINATED NODE   READINESS GATES
netshoot-server-6b478b859-g5djl   1/1     Running   0          14m   10.244.1.145   cilium-testing-worker2   <none>           <none>

$ kubectl -n default get service netshoot-service -owide
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE   SELECTOR
netshoot-service   LoadBalancer   10.96.80.251   172.18.255.200   8000:30990/TCP   15m   app=netshoot-server

$ arp
Address          HWtype  HWaddress           Flags Mask  Iface
_gateway         ether   86:2f:57:c7:2e:64   C           enp0s1
172.18.0.4       ether   ce:b8:de:ea:41:c7   C           br-ec86dd730695
172.18.0.3       ether   da:a4:c5:dd:95:50   C           br-ec86dd730695
172.18.255.200   ether   da:a4:c5:dd:95:50   C           br-ec86dd730695
172.18.0.2       ether   6a:06:2c:eb:29:4d   C           br-ec86dd730695

[ajm@el9-dev default-cluster]$ kubectl -n kube-system get pods -owide | grep -E 'NAME|proxy'
NAME               READY   STATUS    RESTARTS   AGE   IP           NODE                           NOMINATED NODE   READINESS GATES
kube-proxy-hvr9r   1/1     Running   0          17m   172.18.0.4   cilium-testing-control-plane   <none>           <none>
kube-proxy-l4tnx   1/1     Running   0          17m   172.18.0.2   cilium-testing-worker          <none>           <none>
kube-proxy-v5tnd   1/1     Running   0          17m   172.18.0.3   cilium-testing-worker2         <none>           <none>
```

At this stage I can't recreate the packet loop described, but I can see evidence of the correct behaviour change.

Test

Server: iperf3 running in the netshoot-server pod.

I can't yet see the issue in tcpdump, but I can see some evidence in Hubble.

Before PR

OSS main as of commit 2d5a128.

Happy path:

Unhappy path:

After PR

Repeating the same tests.

Happy path:

Unhappy path:
To actually get the packet loop, the interface that receives the packet (when forwarded) must make the routing decision to forward it to an external router, and the external router must make the routing decision to forward the packet back to the origin. This happens in standard public internet configurations when using LB IPAM. I'm not sure how to recreate the loop itself on a single host without significant scaffolding, however.
Do you know where the packet ends? You can use
First, if I understood the issue correctly - and @ajmmm and I have discussed it a bit - as of today the host NS should route the packet back via the default GW, as the packet is punted to the kernel network stack. If routing were the problem, we could always add an explicit route in the host NS:

But I suspect the packet is not looped back for other reasons in @ajmmm's kind setup, or is dropped somewhere before reaching the router (maybe outside the kind cluster).
(lgtm, still needs a rebase)
bpf side lgtm! some small nits when you rebase anyway.
force-pushed from 23c7aed to 5cf22be
/test
force-pushed from 5f46cbb to da50a72
/test
force-pushed from da50a72 to 1d29ad9
/test
force-pushed from 1d29ad9 to fe966e5
/test
/ci-e2e-upgrade
/ci-ginkgo
…ice().

Previously, if the lookup of a service derived from the ingress 5-tuple resulted in no match, we undertook an additional lookup with IPPROTO_ANY. This was necessary to maintain backwards compatibility with earlier versions of Cilium that did not support protocol differentiation when programming the data path. Cilium 1.18 introduced logic to migrate IPPROTO_ANY service/backend BPF map entries to an appropriate protocol automatically, such that the additional lookups are no longer required.

Appropriate tests updated to include explicit lookups for IPPROTO_TCP, IPPROTO_UDP, and mismatched protocol checks.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This change modifies the behaviour of lbXX_lookup_service() such that if an external service lookup does not find an entry, we undertake an additional lookup with DPort=0 and Protocol=0. This facilitates a wildcard lookup that, when programmed with a suitable wildcard entry by the control plane, will trigger the data path to drop unmatched ingress traffic.

This change also modifies the semantics of said routines such that they no longer propagate changes to the key protocol field, unless a wildcard service entry was found in the external scope and returned. In this case, the dport and protocol fields will be zero.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Signed-off-by: Mikael Knutsson <mikael.knutsson@gmail.com>
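For illustration, a minimal sketch of the lookup flow these two commits describe. The map name, key fields, and scope constant approximate the Cilium datapath headers; this is not verbatim code from the PR:

```c
/* Sketch only: simplified types and names, assuming the usual Cilium
 * BPF helpers and an external-scope constant are in scope. */
static __always_inline struct lb4_service *
lb4_lookup_service_sketch(struct lb4_key *key)
{
	__be16 dport = key->dport;
	__u8 proto = key->proto;
	struct lb4_service *svc;

	/* Exact lookup on VIP + dport + proto. No IPPROTO_ANY retry any
	 * more: 1.18 migrated ANY entries to real protocols. */
	svc = map_lookup_elem(&cilium_lb4_services, key);
	if (svc)
		return svc;

	/* External scope only: retry with a wildcard key. If the control
	 * plane programmed a wildcard entry for this VIP, returning it
	 * lets the caller drop the packet instead of punting it back to
	 * the network. */
	if (key->scope == LB_LOOKUP_SCOPE_EXT) {
		key->dport = 0;
		key->proto = 0;
		svc = map_lookup_elem(&cilium_lb4_services, key);
		if (svc)
			return svc; /* key stays zeroed only in this case */
	}

	/* No match: restore the key so no changes propagate to callers. */
	key->dport = dport;
	key->proto = proto;
	return NULL;
}
```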
…paths.

Previously, lbX_lookup_service() would zeroise the key protocol field, such that callers would need to reinstate it if their logic re-used the key. Following the previous commit, the lbX_lookup_service() routines now only zeroise the key protocol (and dport) fields if an external wildcard entry is returned, so the restoration of key values can be removed.

This commit also removes the lbX_key_set_protocol() helpers, as they became broadly unnecessary.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
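The caller-side effect is roughly this (illustrative, not the literal diff; arguments elided):

```c
/* Before: the lookup always zeroised the key, so callers re-using the
 * key had to reinstate the protocol via the now-removed helper: */
svc = lb4_lookup_service(&key /* ... */);
lb4_key_set_protocol(&key, tuple->nexthdr);

/* After: the key is only modified when a wildcard entry is returned,
 * so no restoration is needed: */
svc = lb4_lookup_service(&key /* ... */);
```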
This set of tests verifies that if we receive a packet towards a VIP on either an unknown destination port or an unknown protocol, the data plane will correctly drop it if a wildcard entry is present in the service map.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
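Roughly, the behaviour under test looks like this; the drop reason name is illustrative, and the wildcard hit is detected via the zeroed key documented above:

```c
/* Pseudo-caller: a wildcard hit leaves dport/proto zeroed in the key,
 * which signals "known VIP, unknown port/proto" and triggers a drop. */
svc = lb4_lookup_service_sketch(&key);
if (svc && key.dport == 0 && key.proto == 0)
	return DROP_UNKNOWN_L4; /* drop reason name illustrative */
```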
This change extends the nodePortAddrKey structure to include the L4 protocol number of an address, such that addresses with differing protocols will be better distributed in the nodePortAddrByPort map. This required the introduction of a helper routine to translate the L4Addr protocol field into the real IANA number.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
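The translation itself is the standard name-to-IANA-number mapping. The real helper is Go, in the loadbalancer code this commit touches; the self-contained C rendering below is just to show the idea:

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h> /* IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP */

/* Map an L4Addr-style protocol string to its IANA protocol number so
 * that keys hash distinctly per protocol. */
static uint8_t l4_proto_to_iana(const char *proto)
{
	if (strcmp(proto, "TCP") == 0)
		return IPPROTO_TCP;  /* 6 */
	if (strcmp(proto, "UDP") == 0)
		return IPPROTO_UDP;  /* 17 */
	if (strcmp(proto, "SCTP") == 0)
		return IPPROTO_SCTP; /* 132 */
	return 0;                    /* ANY / unknown */
}
```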
Introduce logic in bpf_reconciler to map frontend service entries to wildcard service entries, and program them in the data path, for LoadBalancer and ClusterIP frontends of external scope. This ensures the data path drops traffic towards unknown destination ports and/or protocols.

Wildcard state is tracked as a map (indexed by IP address) that contains a slice of frontend ServiceIDs. The first entry that requires a wildcard triggers programming of the data path entry; removal of the last entry that depends on the wildcard triggers removal of the wildcard entry.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Signed-off-by: Mikael Knutsson <mikael.knutsson@gmail.com>
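A toy model of that first-adds/last-removes bookkeeping (the real reconciler is Go and tracks slices of ServiceIDs; every name below is invented for illustration):

```c
#include <stdio.h>
#include <string.h>

#define MAX_VIPS 64

/* One reference counter per VIP: program the datapath wildcard entry
 * on the 0 -> 1 transition, remove it on the 1 -> 0 transition. */
struct wildcard_ref {
	char vip[40];
	int  refs;
};

static struct wildcard_ref table[MAX_VIPS];

static struct wildcard_ref *find_or_add(const char *vip)
{
	struct wildcard_ref *free_slot = NULL;

	for (int i = 0; i < MAX_VIPS; i++) {
		if (table[i].refs > 0 && strcmp(table[i].vip, vip) == 0)
			return &table[i];
		if (table[i].refs == 0 && !free_slot)
			free_slot = &table[i];
	}
	if (free_slot)
		snprintf(free_slot->vip, sizeof(free_slot->vip), "%s", vip);
	return free_slot;
}

/* First frontend needing a wildcard for this VIP programs the entry. */
static void frontend_added(const char *vip)
{
	struct wildcard_ref *r = find_or_add(vip);

	if (r && r->refs++ == 0)
		printf("program wildcard service entry for %s\n", vip);
}

/* Last frontend releasing the wildcard removes the entry. */
static void frontend_removed(const char *vip)
{
	struct wildcard_ref *r = find_or_add(vip);

	if (r && r->refs > 0 && --r->refs == 0)
		printf("remove wildcard service entry for %s\n", vip);
}

int main(void)
{
	frontend_added("172.18.255.200");   /* programs the wildcard */
	frontend_added("172.18.255.200");   /* no-op: already present */
	frontend_removed("172.18.255.200"); /* no-op: still referenced */
	frontend_removed("172.18.255.200"); /* removes the wildcard */
	return 0;
}
```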
Update the unit tests for loadbalancer and ciliumenvoyconfig to include wildcard entries.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Remove lb_v4_add_mixed_proto_service_with_flags() as it no longer makes sense.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
force-pushed from fe966e5 to abf0fec
/test
The ci-ginkgo failure is likely a flake; it happens across different tests on main, e.g.:
PR to resolve a forwarding bug in the data path.
This work is derived from work undertaken by Mikael Knutsson (mikael.knutsson@gmail.com), who has been credited in the relevant commit logs.
Summary of issue
Given a LoadBalancer or ClusterIP configured for a specific L4 destination port and protocol, ingress traffic towards that IP on an alternate destination port or protocol will not match the eBPF service map lookup, so the traffic will be punted back to the network. Where the upstream network re-delivers the traffic back to Cilium, a packet loop is formed.
Summary of Changes
Release Notes: