-
Notifications
You must be signed in to change notification settings - Fork 3.4k
loader: do not invoke llc separately #29458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Cleanup is #29459 |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Do you know since which version of clang this is possible?
I'm not sure actually, clang seems to have had |
The loader benchmarks currently fail with a nil pointer panic since the dirInfo global is not initialised. Replace the global with a helper function to fix the benchmarks. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Marking as |
/test |
Historically it wasn't possible to pass -mcpu and -mattr flags to clang so compilation proceeded in two steps: first invoking clang to emit llvm bytecode and feeding that into llc. This has the downside of having to manage two processes and incurs overhead to marshal and unmarshal llvm bytecode. Crucially, the compile path does not always invoke llc. Outputting preprocessed source code for debugging purposes has a special short circuit. Replace this logic with a single invocation of clang for all supported output types. This makes the compilation logic easier to understand and opens it up for further refactoring. It also makes compilation faster: goos: linux goarch: amd64 pkg: github.com/cilium/cilium/pkg/datapath/loader cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ CompileOnly-16 89.37m ± 1% 85.16m ± 4% -4.71% (p=0.002 n=6) │ old.txt │ new.txt │ │ B/op │ B/op vs base │ CompileOnly-16 42.68Ki ± 2% 19.79Ki ± 3% -53.63% (p=0.002 n=6) │ old.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ CompileOnly-16 273.0 ± 1% 158.5 ± 1% -41.94% (p=0.002 n=6) Take this with a grain of salt of course, since most of the overhead of compilation is not in Go but in the clang process which we just wait for. The wall-clock improvement should be robust though. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Add the RSS used by clang for compilation to log output. This will give us a rough idea how much memory a particular compilation took. The messages look like: msg="Compilation had peak RSS of 103808 bytes" compiler-pid=95606 output=/tmp/TestCompile3849793333/001/bpf_lxc.dbg.o subsys=datapath-loader Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
/test |
We currently produce LLVM assembly and a .dbg.o ELF when compiling a program in debug mode. The LLVM assembly doesn't seem to be used at all and can be approximated from the compiled ELF via llvm-objdump -D. The .dbg.o is actually identical to the regular .o and can be skipped without losing anything. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
/test |
@rgo3 I had to add quite a bit of code (new tests, fixed benchmarks and dropping of some benchmark files) so it'd be great if you could take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good 🚀
loader: fix benchmarks
loader: do not invoke llc separately
loader: log maximum RSS used by clang for compilation
loader: remove obsolete debug outputs