Skip to content

Conversation

lmb
Copy link
Contributor

@lmb lmb commented Nov 28, 2023

loader: fix benchmarks

The loader benchmarks currently fail with a nil pointer panic since the
dirInfo global is not initialised. Replace the global with a helper function
to fix the benchmarks.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

loader: do not invoke llc separately

Historically it wasn't possible to pass -mcpu and -mattr flags to clang so
compilation proceeded in two steps: first invoking clang to emit llvm
bytecode and feeding that into llc. This has the downside of having to
manage two processes and incurs overhead to marshal and unmarshal llvm
bytecode.

Crucially, the compile path does not always invoke llc. Outputting 
preprocessed source code for debugging purposes has a special short circuit.

Replace this logic with a single invocation of clang for all supported 
output types. This makes the compilation logic easier to understand and 
opens it up for further refactoring. It also makes compilation faster:

    goos: linux
   goarch: amd64
   pkg: github.com/cilium/cilium/pkg/datapath/loader
   cpu: 12th Gen Intel(R) Core(TM) i7-1260P
               │   old.txt   │              new.txt              │
               │   sec/op    │   sec/op     vs base              │
   CompileOnly-16   89.37m ± 1%   85.16m ± 4%  -4.71% (p=0.002 n=6)

                │   old.txt    │               new.txt               │
               │     B/op     │     B/op      vs base               │
   CompileOnly-16   42.68Ki ± 2%   19.79Ki ± 3%  -53.63% (p=0.002 n=6)

                │  old.txt   │              new.txt              │
               │ allocs/op  │ allocs/op   vs base               │
   CompileOnly-16   273.0 ± 1%   158.5 ± 1%  -41.94% (p=0.002 n=6)

Take this with a grain of salt of course, since most of the overhead of
compilation is not in Go but in the clang process which we just wait for.
The wall-clock improvement should be robust though.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

loader: log maximum RSS used by clang for compilation

Add the RSS used by clang for compilation to log output. This will give us a
rough idea how much memory a particular compilation took. The messages look
like:

    msg="Compilation had peak RSS of 103808 bytes" compiler-pid=95606
   output=/tmp/TestCompile3849793333/001/bpf_lxc.dbg.o
subsys=datapath-loader

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

loader: remove obsolete debug outputs

We currently produce LLVM assembly and a .dbg.o ELF when compiling a program
in debug mode. The LLVM assembly doesn't seem to be used at all and can be
approximated from the compiled ELF via llvm-objdump -D. The .dbg.o is
actually identical to the regular
.o and can be skipped without losing anything.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

@lmb lmb added area/loader Impacts the loading of BPF programs into the kernel. release-note/misc This PR makes changes that have no direct user impact. labels Nov 28, 2023
@lmb lmb requested a review from rgo3 November 28, 2023 17:29
@lmb lmb requested a review from a team as a code owner November 28, 2023 17:29
@lmb
Copy link
Contributor Author

lmb commented Nov 28, 2023

Cleanup is #29459

@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

/test

Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do you know since which version of clang this is possible?

@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

I'm not sure actually, clang seems to have had -mattr and -mcpu since 2005. Maybe something to do with passing -march?

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 29, 2023
The loader benchmarks currently fail with a nil pointer panic since
the dirInfo global is not initialised. Replace the global with a
helper function to fix the benchmarks.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb lmb added the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Nov 29, 2023
@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

Marking as dont-merge while I investigate a CI flake.

@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

/test

lmb added 2 commits November 29, 2023 13:27
Historically it wasn't possible to pass -mcpu and -mattr flags to clang
so compilation proceeded in two steps: first invoking clang to emit
llvm bytecode and feeding that into llc. This has the downside of having
to manage two processes and incurs overhead to marshal and unmarshal
llvm bytecode.

Crucially, the compile path does not always invoke llc. Outputting
preprocessed source code for debugging purposes has a special short
circuit.

Replace this logic with a single invocation of clang for all supported
output types. This makes the compilation logic easier to understand and
opens it up for further refactoring. It also makes compilation faster:

    goos: linux
    goarch: amd64
    pkg: github.com/cilium/cilium/pkg/datapath/loader
    cpu: 12th Gen Intel(R) Core(TM) i7-1260P
                │   old.txt   │              new.txt              │
                │   sec/op    │   sec/op     vs base              │
    CompileOnly-16   89.37m ± 1%   85.16m ± 4%  -4.71% (p=0.002 n=6)

                │   old.txt    │               new.txt               │
                │     B/op     │     B/op      vs base               │
    CompileOnly-16   42.68Ki ± 2%   19.79Ki ± 3%  -53.63% (p=0.002 n=6)

                │  old.txt   │              new.txt              │
                │ allocs/op  │ allocs/op   vs base               │
    CompileOnly-16   273.0 ± 1%   158.5 ± 1%  -41.94% (p=0.002 n=6)

Take this with a grain of salt of course, since most of the overhead
of compilation is not in Go but in the clang process which we just
wait for. The wall-clock improvement should be robust though.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Add the RSS used by clang for compilation to log output. This will
give us a rough idea how much memory a particular compilation took.
The messages look like:

    msg="Compilation had peak RSS of 103808 bytes" compiler-pid=95606
    output=/tmp/TestCompile3849793333/001/bpf_lxc.dbg.o subsys=datapath-loader

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

/test

We currently produce LLVM assembly and a .dbg.o ELF when compiling
a program in debug mode. The LLVM assembly doesn't seem to be used
at all and can be approximated from the compiled ELF via
llvm-objdump -D. The .dbg.o is actually identical to the regular
.o and can be skipped without losing anything.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

/test

@lmb lmb requested a review from rgo3 November 29, 2023 14:51
@lmb
Copy link
Contributor Author

lmb commented Nov 29, 2023

@rgo3 I had to add quite a bit of code (new tests, fixed benchmarks and dropping of some benchmark files) so it'd be great if you could take another look.

@lmb lmb removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Nov 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 29, 2023
@lmb lmb added this pull request to the merge queue Nov 29, 2023
@lmb lmb removed this pull request from the merge queue due to a manual request Nov 29, 2023
Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good 🚀

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 29, 2023
@lmb lmb added this pull request to the merge queue Nov 29, 2023
Merged via the queue into cilium:main with commit e6a55a0 Nov 29, 2023
@lmb lmb deleted the oneshot-compile branch November 29, 2023 17:57
@ti-mo ti-mo added the backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. label Dec 17, 2024
@julianwiedmann julianwiedmann added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/loader Impacts the loading of BPF programs into the kernel. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants