Hi!
Recently I ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects; the results are available here. Since this project is performance-oriented, I think it would be interesting to test PGO for optimizing `tquic`. I have already run some benchmarks.
Test environment
- Fedora 38
- Linux kernel 6.5.5
- AMD Ryzen 9 5900x
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.73
- `tquic` version: the latest for now from the `develop` branch, on commit `05c56e7425ec1149a9c95ca7bbcb6acbab861fd6`
- Disabled Turbo boost
Benchmark setup
For benchmarking, I use the project's own benchmarks. The Release benchmarks are run with `cargo bench`; the PGO-optimized build is done with cargo-pgo, using `cargo pgo bench && cargo pgo optimize bench`. The PGO profiles are collected from the benchmark workload itself.
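For anyone who wants to reproduce this, the steps can be sketched as a small script. This is only a rough outline under a few assumptions: it is run from the root of a `tquic` checkout, `cargo` and `rustup` are installed, and cargo-pgo is set up as its README suggests (with the `llvm-tools-preview` component). The `run` helper, the `pgo_workflow` function, and the `DRY_RUN` switch are hypothetical names introduced only for this sketch; they are not part of cargo-pgo or tquic.

```shell
#!/bin/sh
# Sketch of the PGO benchmark workflow (assumption: run from a tquic checkout).
# DRY_RUN is a hypothetical switch for this sketch. It defaults to 1, so the
# script only prints the commands; set DRY_RUN=0 to actually execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pgo_workflow() {
    # One-time setup: cargo-pgo plus the LLVM tools it uses to merge profiles.
    run rustup component add llvm-tools-preview
    run cargo install cargo-pgo

    # Baseline: regular Release benchmarks.
    run cargo bench

    # Build instrumented benchmarks and run them to collect PGO profiles.
    run cargo pgo bench

    # Rebuild with the collected profiles and rerun the benchmarks.
    run cargo pgo optimize bench
}

pgo_workflow
```

Running it as-is only lists the five commands; running it with `DRY_RUN=0` executes them, which can take a while because the benchmarks are built and run three times.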
Results
I got the following results:
- Release + PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/875e3d1948ac3db575bcbb442a6f6ebb
- (just for reference) PGO-Instrumented compared to Release: https://gist.github.com/zamazan4ik/f4af4f2e510d433253bf007ecb584507
According to these tests, PGO improves `tquic` performance in some scenarios.
Further steps
I can suggest the following things to do:
- Evaluate PGO's applicability to `tquic` in more scenarios.
- If PGO helps to achieve better performance, add a note about it to tquic's documentation (probably somewhere in the README file). That way, users and maintainers will be aware of another optimization opportunity for `tquic`.
- Maybe get some insights from the PGO profiles and manually optimize the code accordingly (for example, with more aggressive inlining).