Skip to content

Conversation

jrife
Copy link
Contributor

@jrife jrife commented Jul 27, 2025

Recently, I added the BENCH probe type to bpftrace (bpftrace/bpftrace#4343) for implementing BPF micro benchmarks. I thought that something similar could be handy in Cilium as an extension to the BPF unit test framework for benchmarking various programs, gauging the performance effects of BPF changes, etc. and wondered if others would find it useful. Micro benchmarks are never the full picture when it comes to performance, but still, it could be a nice tool.

So,

Introduce BENCH for implementing BPF micro benchmarks. bpftest (make run_bpf_tests) ignores BENCH programs; they are only executed when running bpfbench (make run_bpf_benchmarks). If a PKTGEN program is
defined, it is run before the BENCH program to generate its input, but SETUP and CHECK programs are ignored.

jordan@t14:~/code/cilium$ make run_bpf_benchmarks
DOCKER_ARGS=--privileged RUN_AS_ROOT=1 contrib/scripts/builder.sh \
	make  -j16 -C bpf/tests/ run_bench \
		"BPF_TEST_FILE="""
make: Entering directory '/go/src/github.com/cilium/cilium/bpf/tests'
go run ./bpfbench \
	-bpf-test-path /go/src/github.com/cilium/cilium/bpf/tests \

tc_egressgw_redirect_bench_example	 1000000	      1328 ns/op
make: Leaving directory '/go/src/github.com/cilium/cilium/bpf/tests'

As part of this change, refactor bpf_test.go a bit to extract the useful bits and make them reusable to bpfbench. I considered just putting everything inside bpf_test.go, maybe piggybacking on top of Go's Benchmark*s somehow, but found this approach unworkable. This lead me down the path of implementing benchmarking as a separate tool and making some of the functionality shared between bpftest and bpfbench. I would have liked to use tparse for output formatting, but at present it doesn't support reporting for benchmarks, so the output is currently quite raw.(Coincidentally, mfridman/tparse#67 was reported by @joestringer a few years ago, but alas, it has not yet been implemented)

@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 27, 2025
@jrife jrife force-pushed the jrife/bpf-bench branch from bfeacb4 to bab7281 Compare July 27, 2025 21:09
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 27, 2025
Prepare for further refactoring by cleaning up the test structure
validation logic.

First, simplify the logic a bit to get rid of the secondary regex match
inside checkProgExists and instead just defer the existence check for
progs.checkProg until after the first pass. Just Ensure uniqueness of
each setup/pktgen/check program along the way with checkProgUnique.
Before, the logic would not catch if a check is not defined for a test
until later when a panic occurs, since progs.checkProg would be nil, but
now existence of a CHECK program is enforced.

Don't rely on the program name at all, only the section names. Before,
weird edge cases could occur with strings.Replace() if the test program
name itself had the words "setup", "pktgen", or "check" in them. This
change uncovered the case with nat4_port_allocation where the test name
in the CHECK for the nat4_port_allocation_tcp and
nat4_port_allocation_udp tests was wrong, so fix this.

Signed-off-by: Jordan Rife <jrife@google.com>
@jrife jrife force-pushed the jrife/bpf-bench branch from bab7281 to 637e24b Compare July 27, 2025 21:12
@jrife jrife changed the title BENCH for BPF microbenchmarks (WIP) bpf: Introduce BENCH test directive for BPF micro benchmarks Jul 27, 2025
@jrife jrife changed the title bpf: Introduce BENCH test directive for BPF micro benchmarks [RFC] bpf: Introduce BENCH test directive for BPF micro benchmarks Jul 27, 2025
@jrife jrife force-pushed the jrife/bpf-bench branch from 637e24b to 626a5be Compare July 27, 2025 21:43
jrife added 2 commits July 27, 2025 15:42
Prepare for the next patch by moving some of the logic from bpf_test.go
into helpers.go and exporting it as a set of reusable functions and
types:

* ForEachSuite allows you to iterate over all BPF unit test collections
  one by one, transparently handling file reads and BPF collection
  preprocessing.
* CollectionToSuite converts a CollectionSpec and Collection for a BPF
  test into a set of Test structs. It performs all the validation on the
  test program structure to ensure the existence and uniqueness of
  setup/pkg/check programs for each test.
* Test contains all the logic for executing a BPF unit test, including
  the execution of its setup, pktgen, and check programs.

Signed-off-by: Jordan Rife <jrife@google.com>
Introduce BENCH for implementing BPF micro benchmarks. bpftest (make
run_bpf_tests) ignores BENCH programs; they are only executed when
running bpfbench (make run_bpf_benchmarks). If a PKTGEN program is
defined, it is run before the BENCH program to generate its input, but
SETUP and CHECK programs are ignored.

```
jordan@t14:~/code/cilium$ make run_bpf_benchmarks
DOCKER_ARGS=--privileged RUN_AS_ROOT=1 contrib/scripts/builder.sh \
	make  -j16 -C bpf/tests/ run_bench \
		"BPF_TEST_FILE="""
make: Entering directory '/go/src/github.com/cilium/cilium/bpf/tests'
go run ./bpfbench \
	-bpf-test-path /go/src/github.com/cilium/cilium/bpf/tests \

tc_egressgw_redirect_bench_example	 1000000	      1328 ns/op
make: Leaving directory '/go/src/github.com/cilium/cilium/bpf/tests'
```

Signed-off-by: Jordan Rife <jrife@google.com>
@jrife jrife force-pushed the jrife/bpf-bench branch from 626a5be to ceb4386 Compare July 27, 2025 22:57
@jrife jrife marked this pull request as ready for review July 27, 2025 23:26
@jrife jrife requested review from a team as code owners July 27, 2025 23:26
@jrife jrife requested a review from borkmann July 27, 2025 23:26
@joestringer joestringer added the release-note/ci This PR makes changes to the CI. label Aug 4, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 4, 2025
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Overall LGTM, seems like a nice improvement to have better infrastructure for this use case. I had some super minor nits below, nice-to-haves for consistency. The PR will need a rebase anyway.

@@ -554,6 +554,11 @@ run_bpf_tests: ## Build and run the BPF unit tests using the cilium-builder cont
"JUNIT_PATH=$(JUNIT_PATH)" \
"V=$(BPF_TEST_VERBOSE)"

run_bpf_benchmarks: ## Build and run the BPF benchmarks tests using the cilium-builder container image.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: bench-bpf to follow the existing bench prefix for benchmark targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll also move the target up there to group all the bench stuff together.

@@ -50,6 +50,7 @@ clean:
rm -f $(wildcard *.d)

BPF_TEST_FLAGS:= $(GO_TEST_FLAGS)
BPF_BENCH_FLAGS:=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse BENCHFLAGS instead or as a basis for this variable? We could move the definition from Makefile to Makefile.defs for better sharing across different makefiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but I'll check.

Comment on lines +334 to +341
data := make([]byte, 4096-256-320)

// ctx is only used for tc programs
// non-empty ctx passed to non-tc programs will cause error: invalid argument
ctx := make([]byte, 0)
if tt.benchProg.Type() == ebpf.SchedCLS {
// sizeof(struct __sk_buff) < 256, let's make it 256
ctx = make([]byte, 256)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few magic constants here that seem like they could be better documented through declaration as vars with commetns that reference the corresponding Linux kernel source to explain why these are the right numbers (or hint to someone who has problems how to adjust them).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(realizing upon subsequent review these magic constants were already here >.<)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second commit seems to be doing a little bit more than a straight refactor, which makes it harder to review. Ideally a commit should be strictly moving code around or changing the logic of the code so we can better evaluate the functional changes and not lose them amongst 800 other lines of code. That said at least so far it seems pretty similar with some minor improvements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looks like in the end there was a log statement that didn't get copied over and a couple of minor function signature changes which had the same logic just expressed in a better way. Overall the commit makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second commit seems to be doing a little bit more than a straight refactor, which makes it harder to review

I'll break this up into some more granular transformations to make it clearer. Probably something like,

commit n: Create the skaffolding to migrate some chunk of code from A to B
commit n+1: Move chunk of code from A to B
... rinse and repeat until the state of the codebase matches what is currently all done in commit 2.

I agree the transformations should be dead simple to understand at a glance; I wanted to gauge general interest first before doing git surgery on this bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense yeah. At least to me this seems helpful, but I'm not actively pursuing datapath benchmarking. If someone else in @cilium/sig-datapath is, I'd encourage @cilium/sig-datapath to try this out and provide feedback.

continue
}

t.Logf("Skipping program '%s' of type '%s': BPF_PROG_RUN not supported", p.Name, p.Type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still log this to provide some breadcrumbs for people running the tests as to why some programs are skipped? Just concerned that people might draw the wrong conclusions from the test runner silently skipping some progs.

@jrife
Copy link
Contributor Author

jrife commented Aug 6, 2025

Stray thought, a logical continuation of this work would seem to be allowing runtime program stats to be gathered on demand and exported as metrics (if this isn't already done somewhere in Cilium). I can see this being quite valuable for us in doing profiling of live systems.

@pchaigno pchaigno added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/performance There is a performance impact of this. labels Aug 11, 2025
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/performance There is a performance impact of this. release-note/ci This PR makes changes to the CI. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants