Hi!
I test Profile-Guided Optimization (PGO) on different kinds of software; the current results are available here (with a lot of other PGO-related information). Since PGO helps achieve better performance with many compilers (Rustc, GCC, Clang, etc.), I think optimizing Sage with PGO is worth trying. I ran some benchmarks and want to share my results.
Test environment
- Fedora 39
- Linux kernel 6.6.9
- AMD Ryzen 9 5900X
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler: Rustc 1.75
- Sage version: the latest for now from the main branch on commit 72b536f61ebb6332c57cf57fab9fe53b1e878c1d
- Disabled Turbo boost (for more stable results across runs)
Benchmarks
As a benchmark, I use the built-in benchmarks with the cargo bench command. For the PGO optimization phase, I use cargo-pgo with cargo pgo optimize bench. For the PGO training phase, I use the same benchmarks with cargo pgo bench.
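The workflow above can be sketched as a small shell script. This is only an illustration: the `run` helper, the `pgo_workflow` function name, and the `DRY_RUN` variable are hypothetical, and the setup steps (installing cargo-pgo and the `llvm-tools-preview` component) are assumed rather than taken from the report. It is meant to be run from the root of a Sage checkout.

```shell
#!/bin/sh
# Sketch of the PGO benchmarking workflow (helper names are hypothetical).

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pgo_workflow() {
    # One-time setup (assumed): cargo-pgo needs the LLVM profiling tools.
    run rustup component add llvm-tools-preview || return 1
    run cargo install cargo-pgo || return 1
    # Baseline: plain Release benchmarks.
    run cargo bench || return 1
    # Training phase: instrumented benchmarks collect the profiles.
    run cargo pgo bench || return 1
    # Optimization phase: rebuild using the profiles and benchmark again.
    run cargo pgo optimize bench
}
```

Calling `pgo_workflow` runs the steps in order; setting `DRY_RUN=1` first only prints the commands, which is handy for checking the sequence before a long benchmark run.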
Results
I got the following results:
- Release: https://gist.github.com/zamazan4ik/09666344f7cb0ee92a69d4a14a8b50e6
- PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/2adb489319886015c98e393cac5e2e57
- (just for reference) PGO-instrumented compared to Release: https://gist.github.com/zamazan4ik/ae10d0fa65fb4be599876735b7ef15a6
According to these benchmarks, the PGO-optimized build of Sage is faster than the plain Release build.
I should note that enabling Link-Time Optimization (LTO) is generally a good idea too, since this optimization works well together with PGO. I also ran some benchmarks comparing LTO to Release: https://gist.github.com/zamazan4ik/6be63330d2c97b510fdfc6b7aa7988c5 . However, in some cases there are performance regressions that need to be investigated. LTO was enabled with codegen-units = 1 and lto = "fat" for the corresponding profiles in the Cargo.toml file.
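For a quick experiment without editing Cargo.toml, the same two settings can be passed through Cargo's environment-variable profile overrides. This is a different mechanism from the one used for the results above, which set the values in Cargo.toml. The sketch assumes the bench profile is the relevant one (it is the profile `cargo bench` uses); the `run` helper, the `lto_bench` function name, and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Sketch: fat LTO with a single codegen unit, applied via environment overrides.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lto_bench() {
    # Equivalent to lto = "fat" and codegen-units = 1 in [profile.bench].
    run env CARGO_PROFILE_BENCH_LTO=fat CARGO_PROFILE_BENCH_CODEGEN_UNITS=1 cargo bench
}
```

Environment overrides leave the manifest untouched, so it is easy to compare runs with and without LTO from the same checkout.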
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks on Sage. If they show improvements, add a note to the documentation about the possible performance gains from building Sage with PGO.
- Provide an easier way (e.g. a build option) to build Sage with PGO. This can be helpful for end users and maintainers, since they will be able to optimize Sage for their own workloads.
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO optimization is integrated into other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
Please treat this issue as just a benchmark report: it's not about an actual error, crash, or anything like that. I don't know how much you care about performance in Sage, so I don't know how important these improvements are for the project. I hope these benchmarks can serve at least as an additional data point about PGO efficiency for compilers.