Optimize `New{Map,Program}From{ID,FD}` and `LoadPinned{Map,Program}` #1791
Merged
Previously, we didn't really have a test that covers all of Map.Info(), only the procfs-based fallback. This patch fixes that and requires fdinfo to be present from now on. This was added to the kernel in 4.9.

Signed-off-by: Timo Beckers <timo@isovalent.com>
A follow-up commit will aim to improve the performance and reduce allocations of the fdinfo reader.

Signed-off-by: Timo Beckers <timo@isovalent.com>
core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
                 │  base.txt   │              opt.txt               │
                 │   sec/op    │   sec/op     vs base               │
ScanFdInfoReader   3.212µ ± 1%   1.983µ ± 1%  -38.25% (p=0.002 n=6)

                 │   base.txt   │               opt.txt               │
                 │     B/op     │     B/op      vs base               │
ScanFdInfoReader   4.531Ki ± 0%   4.050Ki ± 0%  -10.62% (p=0.002 n=6)

                 │  base.txt   │              opt.txt               │
                 │  allocs/op  │  allocs/op   vs base               │
ScanFdInfoReader   24.000 ± 0%    3.000 ± 0%  -87.50% (p=0.002 n=6)

Signed-off-by: Timo Beckers <timo@isovalent.com>
This commit lifts runtime statistics out of ProgramInfo to allow them to be queried without fetching all of ProgramInfo, which would otherwise require multiple calls to OBJ_INFO, multiple allocations, as well as parsing fdinfo.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Over the years, a lot of code was added to new{Map,Program}InfoFromFd as the kernel started exposing more object information. However, for the sake of opening a Map or Program from an fd/id/pin, not much information is needed; certainly not some of the extended info like the program bytecode and other arrays of associated object ids.

This commit introduces a 'minimal' version of the object info retrieval to speed up the process of opening many objects in sequence, like when iterating them with {Map,Program}GetNextID or walking a bpffs directory.

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: AMD Ryzen 7 3700X 8-Core Processor
                    │    old.txt    │               new.txt               │
                    │    sec/op     │    sec/op     vs base               │
NewMapFromFD-16       13934.0n ± 1%    663.4n ± 1%  -95.24% (p=0.002 n=6)
NewProgramFromFD-16    15.403µ ± 1%    1.139µ ± 1%  -92.61% (p=0.002 n=6)
geomean                16.70µ          6.487µ       -61.16%

                    │   old.txt   │              new.txt              │
                    │    B/op     │    B/op     vs base               │
NewMapFromFD-16       5403.0 ± 0%   256.0 ± 0%  -95.26% (p=0.002 n=6)
NewProgramFromFD-16   5367.0 ± 0%   628.0 ± 0%  -88.30% (p=0.002 n=6)
geomean               4.637Ki       1.951Ki     -57.93%
¹ all samples are equal

                    │   old.txt   │              new.txt              │
                    │  allocs/op  │  allocs/op  vs base               │
NewMapFromFD-16       24.000 ± 0%   3.000 ± 0%  -87.50% (p=0.002 n=6)
NewProgramFromFD-16   23.000 ± 0%   4.000 ± 0%  -82.61% (p=0.002 n=6)
geomean               21.14         11.17       -47.17%
¹ all samples are equal
```

Signed-off-by: Timo Beckers <timo@isovalent.com>
lmb approved these changes Jun 3, 2025
ti-mo added a commit to ti-mo/cilium that referenced this pull request Jun 3, 2025
cilium/ebpf#1791 optimized opening bpf objects and calling obj_info, since it was allocating quite heavily before. This is the effect on the GetBPFUsage benchmark:

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/metrics
cpu: AMD Ryzen 7 3700X 8-Core Processor
                 │   old.txt    │              new.txt               │
                 │    sec/op    │   sec/op     vs base               │
GetBPFUsage-16     145.43m ± 1%   77.51m ± 1%  -46.70% (p=0.002 n=6)

                 │   old.txt    │               new.txt               │
                 │     B/op     │     B/op      vs base               │
GetBPFUsage-16     50.38Mi ± 0%   24.24Mi ± 0%  -51.89% (p=0.002 n=6)

                 │   old.txt    │              new.txt               │
                 │  allocs/op   │  allocs/op   vs base               │
GetBPFUsage-16     358.11k ± 0%   82.67k ± 0%  -76.92% (p=0.002 n=6)

This should make the /metrics endpoint even more responsive and reduce the amount of garbage it creates.

Signed-off-by: Timo Beckers <timo@isovalent.com>
github-merge-queue bot pushed a commit to cilium/cilium that referenced this pull request Jun 4, 2025
This PR addresses some performance concerns around opening many BPF objects from fd/id/pin, such as when iterating all objects on the system or walking bpffs. I've introduced a few benchmarks targeting the worst offenders (string allocations in fdinfo scanning and slice allocs in `new{Map,Program}Info*`).

`Program.Stats()` was added, lifting highly variable statistics fields out of `ProgramInfo`, which is otherwise prohibitively expensive to call repeatedly for gathering metrics. `ProgramInfo.RunCount`, `.Runtime` and `.RecursionMisses` were moved to `ProgramStats`.

Total gains made by this optimization pass:

In terms of real-world performance in Cilium, this is a benchmark creating 1000 maps and 2000 programs, then iterating them all, opening them as `*Map` and `*Program`, filtering by name, then calling `.Info()`:

This code is executed on each call to the `/metrics` endpoint and is by far the largest contributor to its latency, so this change shaves off roughly another 40% of the time spent blocking the scrape.