Skip to content

Conversation

borkmann
Copy link
Member

@borkmann borkmann commented May 8, 2024

To learn more on netkit, see also: https://sched.co/1R2s5

Connectivity tests for netkit and netkit-l2 are passing including netkit flavors + BPF host routing / tunneling / legacy routing.

For local netkit-l2 testing, the following merged kernel patches are needed:

For local netkit testing with tunneling, the following merged kernel patches are needed:

These kernel patches are being in process to get to stable trees as well as Ubuntu 24.04 LTS (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068655).

Once kernel patches are all in, I'll extend the e2e connectivity tests with netkit. Tracked in : #32939
LVH PR: cilium/little-vm-helper-images#525

Merged e2e tests: #33005

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 8, 2024
@borkmann borkmann added release-note/major This PR introduces major new functionality to Cilium. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/performance There is a performance impact of this. release-blocker/1.16 This issue will prevent the release of the next version of Cilium. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 8, 2024
@borkmann borkmann marked this pull request as ready for review June 5, 2024 11:50
@borkmann borkmann requested review from a team as code owners June 5, 2024 11:50
@borkmann borkmann requested review from squeed and nathanjsweet June 5, 2024 11:50
@borkmann borkmann requested a review from a team as a code owner June 5, 2024 11:55
@borkmann borkmann requested a review from a user June 5, 2024 11:55
@borkmann borkmann force-pushed the pr/netkit2 branch 3 times, most recently from 45e9d67 to cbba89b Compare June 5, 2024 14:09
Add netkit as well as netkit-l2 mode as another option next to veth.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Copy link
Member Author

borkmann commented Jun 6, 2024

/test

@borkmann borkmann requested a review from NikAleksandrov June 7, 2024 14:18
Copy link

@NikAleksandrov NikAleksandrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Two minor nits about an error msgs, but overall pretty good.

Add netkit support to Cilium.

netkit is a new, minimal BPF-programmable device we presented at
LSF/MM/BPF, LPC and KubeCons some time ago. The core idea is that
BPF programs are executed within the drivers xmit routine and
therefore e.g. in case of containers/Pods moving BPF processing
closer to the source.

One of the goals was that in case of Pod egress traffic, this
allows to move BPF programs from hostns tcx ingress into the
device itself, providing earlier drop or forward mechanisms, for
example, if the BPF program determines that the skb must be sent
out of the node, then a redirect to the physical device can take
place directly without going through per-CPU backlog queue. This
helps to shift processing for such traffic from softirq to process
context, leading to better scheduling decisions/performance (see
measurements in the slides).

Going forward, we plan to use netkit devices in Cilium as the main
device type for connecting Pods. They will be operated in L3 mode
in order to simplify a Pod's neighbor management and the peer will
operate in default drop mode, so that no traffic is leaving between
the time when a Pod is brought up by the CNI plugin and programs
attached by the agent. This PR supports both netkit operation modes
with the L3 mode being the main/recommended one.

Note: netkit in L3 mode has an all-zero mac, so the eth header has
dst == src == zero mac, hence the tests for mac len in the template
code as node mac is not needed/used. Also in L3 mode any ARP
responder is compiled out as the Pod with netkit does not attempt
to resolve mac addresses (it's a NOARP device).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231024214904.29825-1-daniel@iogearbox.net
Link: https://sched.co/1R2s5
@borkmann
Copy link
Member Author

borkmann commented Jun 7, 2024

/test

Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but I think attachSKBProgram gets much clearer if we return early when attaching to netkit devices and tcx and netkit link removal can be merged into one function.


// hasCiliumNetkitLinks returns true if device has a Cilium-managed
// netkit program with the given attach type.
func hasCiliumNetkitLinks(device netlink.Link, attach ebpf.AttachType) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this function is almost exactly the same as hasCiliumTCXLinks. Maybe we can merge those two or factor out the similarities as a follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense, I'll look into this as followup.

Copy link
Contributor

@aspsk aspsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@borkmann borkmann force-pushed the pr/netkit2 branch 2 times, most recently from 3a1bb09 to 309ad35 Compare June 7, 2024 16:44
@borkmann
Copy link
Member Author

borkmann commented Jun 7, 2024

/test

Copy link
Member

@hemanthmalla hemanthmalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@borkmann borkmann force-pushed the pr/netkit2 branch 2 times, most recently from 6a935ae to 675d688 Compare June 7, 2024 18:07
@borkmann
Copy link
Member Author

borkmann commented Jun 7, 2024

/test

borkmann added 7 commits June 7, 2024 18:35
Add a similar loading mechanism as we recently got with tcx, but for
netkit attachments. Given both utilize bpf_mprog underneath the API
looks very similar.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The programs we attach via tcx on the physical devices are using
bpf_redirect_peer() for inbound traffic into netkit device. Similarly,
we use bpf_redirect_neigh() for the way out, pushing from netkit peer
to phys device directly.

For local Pod-Pod on the same node, it is slightly different in that
as opposed to veth devices we do not use bpf_redirect_peer() to go
from ingress->ingress since we're on egress for the case of netkit.
Thus egress->egress needs to use the regular bpf_redirect(). From a
performance PoV there is no difference since both cases need to go
via per-CPU backlog once.

The detection whether we are on ingress or egress side is based on
the ingress_ifindex. In case of netkit, it is set to 0 shortly before
via skb scrubbing. In case of tcx at the ingress side it's set to the
host facing veth (> 0).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Dump the status for the used device mode into cilium status, so that this
is picked up by the sysdump.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Both just remove the link file, therefore create a common helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add bpf.datapathMode to Helm, so that netkit can be selected as
alternative to veth.

Example via Cilium CLI:

  ./cilium-cli install --wait --chart-directory=install/kubernetes/cilium
      --helm-set=hubble.eventBufferCapacity=65535 --helm-set=bpf.monitorAggregation=none
      --helm-set=cluster.name=default --nodes-without-cilium --helm-set-string=kubeProxyReplacement=true
      --set='' --helm-set=image.repository=localhost:5000/cilium/cilium-dev --helm-set=image.useDigest=false
      --helm-set=image.tag=local --helm-set=image.pullPolicy=IfNotPresent
      --helm-set=operator.image.repository=localhost:5000/cilium/operator --helm-set=operator.image.suffix=""
      --helm-set=operator.image.tag=local --helm-set=operator.image.pullPolicy=IfNotPresent
      --helm-set=operator.image.useDigest=false --helm-set-string=bpf.datapathMode=netkit
      --helm-set=ipv6.enabled=true --helm-set=tunnelProtocol=geneve

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a section on netkit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a small summary to help guide users.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Copy link
Member Author

borkmann commented Jun 7, 2024

/test

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super exciting, I can't wait to see this released. I left a small nit in the tuning page about the helm settings, but not a blocker. The Recommendation section is a great addition. Will netkit-l2 and the differences between l3 and l2 be documented as a follow-up?

--set enableIPv6BIGTCP=true \\
--set ipv4.enabled=true \\
--set enableIPv4BIGTCP=true \\
--set kubeProxyReplacement=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing

--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT}

https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#quick-start

Copy link
Member Author

@borkmann borkmann Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will add this as a follow-up given the older/existing sections need this too. Between netkit and netkit-l2 there should not be much diff in terms of performance, netkit in L3 mode basically tells the Pod that it doesn't need to resolve ARP which is great since it's not really necessary anyway so we can compile out our BPF ARP responder and such, keeps things simpler as it should be. I can add a note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan, thank you!

@borkmann borkmann merged commit 13d257f into main Jun 7, 2024
@borkmann borkmann deleted the pr/netkit2 branch June 7, 2024 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/performance There is a performance impact of this. release-blocker/1.16 This issue will prevent the release of the next version of Cilium. release-note/major This PR introduces major new functionality to Cilium.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants