
Conversation

@lmb (Collaborator) commented on Apr 23, 2025

Prompted by #1755, here's another stab at lazy decoding of BTF types. IterateVmlinux is roughly the pwru workload (IIRC), and InspectorGadget does what it says on the tin.

core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
                │  base.txt   │          eager-strings.txt           │
                │   sec/op    │   sec/op     vs base                 │
ParseVmlinux      80.59m ± 4%   26.95m ± 2%  -66.56% (p=0.002 n=6)
IterateVmlinux    193.7m ± 5%   151.8m ± 2%  -21.65% (p=0.002 n=6)
InspectorGadget   87.22m ± 1%   34.56m ± 2%  -60.37% (p=0.002 n=6)
BuildVmlinux      145.7m ± 4%
geomean           118.7m        52.09m       -53.00%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                │   base.txt    │           eager-strings.txt           │
                │     B/op      │     B/op      vs base                 │
ParseVmlinux      24.427Mi ± 0%   9.970Mi ± 0%  -59.18% (p=0.002 n=6)
IterateVmlinux     52.51Mi ± 0%   34.71Mi ± 0%  -33.90% (p=0.002 n=6)
InspectorGadget    26.65Mi ± 0%   12.00Mi ± 0%  -54.97% (p=0.002 n=6)
BuildVmlinux       42.22Mi ± 0%
geomean            34.66Mi        16.07Mi       -50.47%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                │  base.txt   │          eager-strings.txt           │
                │  allocs/op  │  allocs/op   vs base                 │
ParseVmlinux      271.9k ± 0%   146.1k ± 0%  -46.27% (p=0.002 n=6)
IterateVmlinux    398.9k ± 0%   272.9k ± 0%  -31.60% (p=0.002 n=6)
InspectorGadget   280.8k ± 0%   155.0k ± 0%  -44.80% (p=0.002 n=6)
BuildVmlinux      1.508k ± 0%
geomean           82.32k        183.5k       -41.24%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

@lmb (Collaborator, Author) commented on Apr 23, 2025

@brb could you try this in pwru and see how bad things are?

brb added a commit to cilium/pwru that referenced this pull request Apr 28, 2025
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb (Member) commented on Apr 28, 2025

@lmb Are you interested in perf benchmarks, or correctness? If the latter, then all good - cilium/pwru#548. Unfortunately, we do not have any benchmarking suites.

@lmb force-pushed the btf-lazy-final-final-v2 branch from def29e7 to afa4ee7 on April 28, 2025 09:25
@lmb (Collaborator, Author) commented on Apr 28, 2025

I added a crude benchmark which just iterates all of vmlinux to simulate pwru. Even that has gotten faster!

@lmb force-pushed the btf-lazy-final-final-v2 branch 3 times, most recently from 3ba613d to f9c4803 on April 28, 2025 11:25
@lmb marked this pull request as ready for review on April 28, 2025 11:27
@lmb requested review from dylandreimerink and a team as code owners on April 28, 2025 11:27
lmb added 14 commits April 30, 2025 15:56
Add a benchmark which replicates the types used by Inspektor Gadget
for a common configuration. Also add a benchmark which explicitly
iterates all types in vmlinux, which is similar to what pwru does.

See cilium#1755 (comment)

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
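
For reference, a minimal sketch of what such an iterate-everything benchmark can look like, using only btf.LoadKernelSpec and Spec.TypeByID; the benchmarks actually added in this commit may be structured differently. It would live alongside the package's existing benchmarks.

```go
package btf_test

import (
	"testing"

	"github.com/cilium/ebpf/btf"
)

// Sketch only: parse vmlinux BTF and walk every type by ID, which is
// roughly the pwru-style workload described above.
func BenchmarkIterateVmlinuxSketch(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		spec, err := btf.LoadKernelSpec()
		if err != nil {
			b.Fatal(err)
		}
		// Type IDs in vmlinux BTF are dense, so counting up from 1 visits
		// every type. Treating the first error as "past the last ID" is a
		// simplification for this sketch.
		for id := btf.TypeID(1); ; id++ {
			if _, err := spec.TypeByID(id); err != nil {
				break
			}
		}
	}
}
```
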
Stop relying on readAndInflateTypes by creating a full BTF blob in
the test.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
mutableTypes is going to be removed when switching to lazy BTF parsing.
Move the implementation of AnyTypesByName to Spec.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Lazy decoding of BTF means that iterating over a Spec may produce
errors. The current iterator interface doesn't allow returning errors.

Replace Iterate() with a real iterator which calls into Spec.TypeByID
instead of going via an internal fast path. This also allows moving
Datasec fixups in the future.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
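
To illustrate the shape of such an iterator (the method name and signature in the package may differ), here is a hypothetical range-over-func helper built only on Spec.TypeByID that lets decoding errors reach the caller:

```go
package main

import (
	"fmt"
	"iter"

	"github.com/cilium/ebpf/btf"
)

// allTypes is a hypothetical helper, not the package's API: it yields
// (Type, error) pairs so that lazy decoding failures are visible to the
// caller instead of being swallowed by the iterator.
func allTypes(spec *btf.Spec) iter.Seq2[btf.Type, error] {
	return func(yield func(btf.Type, error) bool) {
		for id := btf.TypeID(1); ; id++ {
			typ, err := spec.TypeByID(id)
			if err != nil {
				// Either we walked past the last ID or decoding failed;
				// surface the error once and stop.
				yield(nil, err)
				return
			}
			if !yield(typ, nil) {
				return
			}
		}
	}
}

func main() {
	spec, err := btf.LoadKernelSpec()
	if err != nil {
		panic(err)
	}
	for typ, err := range allTypes(spec) {
		if err != nil {
			break // end of types (or a genuine decoding error)
		}
		fmt.Println(typ.TypeName())
	}
}
```
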
Some Datasecs are not correctly populated when parsing BTF from an ELF
file (vs. raw vmlinux BTF). To address this, we currently do a full
pass over all types to find Datasecs that need fixing up.

Instead, perform the fixup when retrieving a Type from a Spec. Specs
loaded from a raw BTF blob are kept unchanged.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
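
As a rough illustration of the idea, not the actual implementation, an ELF-backed Spec could carry a table of pending fixups and apply one when the corresponding Datasec is handed out. All names below are made up; only btf.Spec, btf.Datasec and btf.TypeID come from the package.

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/btf"
)

// datasecFixup holds the values that only the ELF (not the BTF blob)
// knows about: the section size and the per-variable offsets.
type datasecFixup struct {
	size    uint32
	offsets []uint32 // assumed to match len(Datasec.Vars)
}

// lazySpec is a hypothetical wrapper: fixups is only populated for Specs
// parsed from ELF; raw vmlinux BTF needs no fixups at all.
type lazySpec struct {
	inner  *btf.Spec
	fixups map[btf.TypeID]datasecFixup
}

func (s *lazySpec) TypeByID(id btf.TypeID) (btf.Type, error) {
	typ, err := s.inner.TypeByID(id)
	if err != nil {
		return nil, err
	}
	// Apply the fixup at retrieval time instead of patching every Datasec
	// in an upfront pass over all types.
	if ds, ok := typ.(*btf.Datasec); ok {
		if fix, ok := s.fixups[id]; ok {
			ds.Size = fix.size
			for i := range ds.Vars {
				ds.Vars[i].Offset = fix.offsets[i]
			}
		}
	}
	return typ, nil
}

func main() {
	spec, err := btf.LoadKernelSpec()
	if err != nil {
		panic(err)
	}
	ls := &lazySpec{inner: spec} // no fixups needed for raw kernel BTF
	typ, err := ls.TypeByID(1)
	fmt.Println(typ, err)
}
```
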
Move the implementation of readAndInflateTypes into a new file. No
other changes intended.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Switch BTF decoding from operating on a reader to operating on a byte
slice. This simplifies the code and removes the need for a bunch of
intermediate buffers. It's also necessary to be able to lazily decode
BTF in the future.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
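
For intuition: every BTF type record starts with the fixed 12-byte struct btf_type header, which a byte slice lets us decode with three bounds-checked loads. A before/after sketch (field names illustrative, not the package's internals):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// rawHeader mirrors the fixed 12-byte struct btf_type that starts every
// BTF type record: a name offset, an info word and a size/type word.
type rawHeader struct {
	NameOff  uint32
	Info     uint32
	SizeType uint32 // size or referenced type ID, depending on the kind
}

// Before: decoding via an io.Reader needs intermediate buffering and
// reflection-based unmarshaling.
func readHeaderReader(r *bytes.Reader, bo binary.ByteOrder) (rawHeader, error) {
	var hdr rawHeader
	err := binary.Read(r, bo, &hdr)
	return hdr, err
}

// After: a byte slice lets us decode in place, with no temporary buffers.
func readHeaderSlice(buf []byte, off int, bo binary.ByteOrder) (rawHeader, error) {
	if off < 0 || off+12 > len(buf) {
		return rawHeader{}, fmt.Errorf("type header at %d out of bounds", off)
	}
	return rawHeader{
		NameOff:  bo.Uint32(buf[off:]),
		Info:     bo.Uint32(buf[off+4:]),
		SizeType: bo.Uint32(buf[off+8:]),
	}, nil
}

func main() {
	// A fabricated little-endian header: name_off=1, kind=INT, size=4.
	blob := []byte{1, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0}
	fmt.Println(readHeaderReader(bytes.NewReader(blob), binary.LittleEndian))
	fmt.Println(readHeaderSlice(blob, 0, binary.LittleEndian))
}
```
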
Break up readAndInflateTypes into a dedicated decoder type.
Constructing the decoder does a single pass over the BTF which
collects the offsets of all types in the blob.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
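
Conceptually, that single pass just records where each type record starts, skipping over the kind-specific trailing data. A self-contained sketch, with record sizes taken from the kernel's BTF wire format (the decoder in this PR differs in detail):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indexTypes walks the type section once and records the byte offset of
// every type record, so a later lookup by ID can seek straight to it.
// Trailer sizes follow struct btf_type's kind-specific trailing records.
func indexTypes(raw []byte, bo binary.ByteOrder) ([]int, error) {
	var offsets []int
	for off := 0; off < len(raw); {
		if off+12 > len(raw) {
			return nil, fmt.Errorf("truncated type header at offset %d", off)
		}
		info := bo.Uint32(raw[off+4:])
		vlen := int(info & 0xffff)
		kind := (info >> 24) & 0x1f

		var trailer int
		switch kind {
		case 1, 14, 17: // INT, VAR, DECL_TAG: one extra u32
			trailer = 4
		case 3: // ARRAY: struct btf_array
			trailer = 12
		case 4, 5, 15, 19: // STRUCT, UNION, DATASEC, ENUM64: 12 bytes per entry
			trailer = vlen * 12
		case 6, 13: // ENUM, FUNC_PROTO: 8 bytes per entry
			trailer = vlen * 8
		default: // PTR, FWD, TYPEDEF, qualifiers, FUNC, FLOAT, TYPE_TAG: header only
			trailer = 0
		}

		offsets = append(offsets, off)
		off += 12 + trailer
	}
	return offsets, nil
}

func main() {
	// Two fabricated little-endian records: an INT followed by a PTR.
	raw := []byte{
		0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 0, // INT, 4-byte trailer
		0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, // PTR to type ID 1
	}
	fmt.Println(indexTypes(raw, binary.LittleEndian)) // [0 16] <nil>
}
```
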
Refactor the decoder so that it is possible to unmarshal one type
at a time. The main challenges are:

* Unmarshaling a type requires unmarshaling all descendants. This
  is solved by recursively inflating types until reaching a leaf.
  Care needs to be taken to avoid infinite recursion when dealing
  with self-referential types like linked list headers. The trick
  here is that we store a partially inflated Type in d.types so
  that later function calls return the cached value.
  The partially inflated Type is removed from d.types on error to
  avoid polluting the decoder with "broken" types.
* Unmarshaling a type may require decoding declTags pointing at
  that type. This is handled by maintaining a lookaside index which
  maps type IDs to declTag typeIDs.
* Unmarshaling a legacy bitfield also requires a lookaside data
  structure.

Even though the decoder is now capable of lazy decoding, we still
construct a full []Type when parsing BTF to keep the diff manageable.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
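
The cycle-breaking trick is easiest to see on a toy decoder. The sketch below is not this PR's code: it uses the package's exported Struct, Member and Pointer types plus a made-up raw representation to show how caching a partially inflated value before recursing lets a self-referential type (think struct list_head) resolve, and how the cache entry is dropped again on error.

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/btf"
)

type rawKind int

const (
	rawStruct rawKind = iota
	rawPointer
)

// rawType is a toy stand-in for an undecoded BTF record.
type rawType struct {
	kind    rawKind
	name    string
	members []btf.TypeID // struct member type IDs
	target  btf.TypeID   // pointer target
}

type toyDecoder struct {
	raw   map[btf.TypeID]rawType
	types map[btf.TypeID]btf.Type // may hold partially inflated values
}

func (d *toyDecoder) inflate(id btf.TypeID) (btf.Type, error) {
	if typ, ok := d.types[id]; ok {
		// Fully inflated, or partially inflated further up the current
		// call stack; returning the cached value breaks the recursion.
		return typ, nil
	}
	r, ok := d.raw[id]
	if !ok {
		return nil, fmt.Errorf("unknown type ID %d", id)
	}
	switch r.kind {
	case rawPointer:
		ptr := &btf.Pointer{}
		d.types[id] = ptr // cache before recursing
		target, err := d.inflate(r.target)
		if err != nil {
			delete(d.types, id) // don't leave a broken type behind
			return nil, err
		}
		ptr.Target = target
		return ptr, nil
	case rawStruct:
		s := &btf.Struct{Name: r.name}
		d.types[id] = s // cache before recursing into members
		for _, mid := range r.members {
			mt, err := d.inflate(mid)
			if err != nil {
				delete(d.types, id)
				return nil, err
			}
			s.Members = append(s.Members, btf.Member{Type: mt})
		}
		return s, nil
	default:
		return nil, fmt.Errorf("unhandled kind %d", r.kind)
	}
}

func main() {
	// struct list_head { struct list_head *next; } as a two-record toy blob:
	// ID 1 is the struct, ID 2 is a pointer back to ID 1.
	d := &toyDecoder{
		raw: map[btf.TypeID]rawType{
			1: {kind: rawStruct, name: "list_head", members: []btf.TypeID{2}},
			2: {kind: rawPointer, target: 1},
		},
		types: make(map[btf.TypeID]btf.Type),
	}
	typ, err := d.inflate(1)
	if err != nil {
		panic(err)
	}
	fmt.Println(typ.TypeName()) // "list_head": its member resolves to the cached struct
}
```
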
Use the lazy decoder to avoid decoding all types upfront. This requires
adding an index from essentialName to TypeID to be able to implement
AnyTypeByName.

The decoder has to be safe for concurrent use because we want to share
the raw BTF blob (and related state) when copying a Spec.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
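
A rough sketch of the two pieces described here, with made-up names: an essential-name index (the type name with any CO-RE "___flavour" suffix stripped) mapping to candidate IDs, and a mutex that makes the lazily filled cache safe to share between copied Specs. The real decoder differs in detail.

```go
package sketch

import (
	"fmt"
	"strings"
	"sync"

	"github.com/cilium/ebpf/btf"
)

// essentialName strips the CO-RE "___flavour" suffix so that all flavours
// of a type share a single index entry.
func essentialName(name string) string {
	if i := strings.Index(name, "___"); i > 0 {
		return name[:i]
	}
	return name
}

// lazyDecoder is hypothetical: raw and namedTypes are read-only after
// construction and can be shared by copied Specs, while mu guards the
// lazily populated type cache.
type lazyDecoder struct {
	raw        []byte
	namedTypes map[string][]btf.TypeID // essential name -> candidate type IDs

	inflate func(id btf.TypeID) (btf.Type, error) // decodes one type from raw

	mu    sync.Mutex
	types map[btf.TypeID]btf.Type // memoized results, guarded by mu
}

func (d *lazyDecoder) typeByID(id btf.TypeID) (btf.Type, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if typ, ok := d.types[id]; ok {
		return typ, nil
	}
	typ, err := d.inflate(id)
	if err != nil {
		return nil, err
	}
	d.types[id] = typ
	return typ, nil
}

// anyTypeByName only inflates the candidates for one name, nothing else.
func (d *lazyDecoder) anyTypeByName(name string) (btf.Type, error) {
	ids := d.namedTypes[essentialName(name)]
	if len(ids) == 0 {
		return nil, fmt.Errorf("no type named %q", name)
	}
	return d.typeByID(ids[0])
}
```
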
It turns out that doing two passes over the encoded BTF is basically free.
Use this approach to pre-calculate the correct size for the data structures
required by the decoder.

This reduces allocations and copying by a substantial amount.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
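
The pattern is the usual count-then-allocate one; a deliberately tiny illustration on toy data (not the decoder's actual structures):

```go
package main

import "fmt"

type record struct {
	name string
	size int
}

func main() {
	// Pretend these are the type records seen while walking the blob.
	records := []record{{"int", 16}, {"", 12}, {"task_struct", 400}}

	// Pass 1: count only; walking the data twice is cheap.
	named := 0
	for _, r := range records {
		if r.name != "" {
			named++
		}
	}

	// Pass 2: allocate once with exact sizes, then fill. Neither the slice
	// nor the map ever has to grow and copy.
	offsets := make([]int, 0, len(records))
	byName := make(map[string]int, named)
	off := 0
	for i, r := range records {
		offsets = append(offsets, off)
		if r.name != "" {
			byName[r.name] = i
		}
		off += r.size
	}
	fmt.Println(offsets, byName)
}
```
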
We still construct full slices of various raw BTF types, only to
discard them afterwards. Instead, decode a single raw type at a time
and accumulate the inflated equivalents.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Appending to a nil slice allocates a slice with an 8-item capacity
as of Go 1.24. This is wasteful since most entries will only ever
need a single ID. Do some syntax gymnastics to avoid this waste.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
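
The "gymnastics" look roughly like this; the index name is hypothetical, but the allocation behaviour is the point: a one-element slice literal gets capacity 1, while appending to the nil value in the map would over-allocate.

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/btf"
)

func main() {
	// Hypothetical lookaside index from a declTag's target type to the IDs
	// of the declTags pointing at it. Most targets only ever get one tag.
	declTags := make(map[btf.TypeID][]btf.TypeID)

	add := func(target, tag btf.TypeID) {
		if ids, ok := declTags[target]; ok {
			declTags[target] = append(ids, tag)
			return
		}
		// A one-element literal allocates exactly cap 1 for the common
		// case, instead of append's default growth on a nil slice.
		declTags[target] = []btf.TypeID{tag}
	}

	add(10, 42)
	add(11, 7)
	add(10, 43)
	fmt.Println(declTags, cap(declTags[11])) // cap is 1 for the common case
}
```
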
@lmb force-pushed the btf-lazy-final-final-v2 branch from f9c4803 to ef22608 on April 30, 2025 14:56
@dylandreimerink (Member) left a comment

🚀

@lmb merged commit c03bd45 into cilium:main on May 1, 2025
17 checks passed
@lmb deleted the btf-lazy-final-final-v2 branch on May 1, 2025 14:04