Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

Hi!

I have been testing Profile-Guided Optimization (PGO) on different kinds of software; the current results are available [here](https://github.com/zamazan4ik/awesome-pgo) (along with a lot of other PGO-related information). Since PGO helps achieve better performance with many compilers (like Rustc, GCC, Clang, etc.), I think optimizing Sage with PGO is worth trying. I ran some benchmarks and want to share my results.

## Test environment

* Fedora 39
* Linux kernel 6.6.9
* AMD Ryzen 9 5900X
* 48 GiB RAM
* SSD Samsung 980 Pro 2 TiB
* Compiler - Rustc 1.75
* Sage version: the latest `main` branch at the time of writing, commit `72b536f61ebb6332c57cf57fab9fe53b1e878c1d`
* Disabled Turbo boost (for more stable results across runs)

## Benchmarks

As a benchmark, I use the built-in benchmarks, run with the `cargo bench` command. For PGO, I use [cargo-pgo](https://github.com/Kobzol/cargo-pgo). For the PGO training phase, I run the same benchmarks on an instrumented build with `cargo pgo bench`. For the PGO optimization phase, I build and run them with `cargo pgo optimize bench`.
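The baseline, training, and optimization runs described above can be sketched as a small shell script. This is only a sketch, assuming `cargo-pgo` is installed via `cargo install cargo-pgo`, the `llvm-tools-preview` rustup component is available, and the script is run from the Sage repository root. The `run` helper, the `pgo_workflow` function, and the `DRY_RUN` variable are hypothetical names introduced here for illustration; they are not part of cargo-pgo. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the PGO workflow from this report.
# DRY_RUN, run, and pgo_workflow are illustrative helpers, not cargo-pgo features.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # One-time setup: cargo-pgo relies on llvm-profdata to merge profiles.
    run rustup component add llvm-tools-preview &&
    run cargo install cargo-pgo &&
    # Baseline: plain benchmark run for comparison.
    run cargo bench &&
    # Training phase: run the benchmarks on an instrumented build to collect profiles.
    run cargo pgo bench &&
    # Optimization phase: rebuild with the collected profiles and benchmark again.
    run cargo pgo optimize bench
}

pgo_workflow
```

Running the script with `DRY_RUN=0` executes the steps instead of printing them. Training on the same benchmarks that are later measured is the simplest setup, but the profile then reflects the benchmark workload rather than a production one.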

## Results

I got the following results:

* Release: https://gist.github.com/zamazan4ik/09666344f7cb0ee92a69d4a14a8b50e6
* PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/2adb489319886015c98e393cac5e2e57
* (just for reference) PGO-instrumented compared to Release: https://gist.github.com/zamazan4ik/ae10d0fa65fb4be599876735b7ef15a6

According to these benchmarks, the PGO-optimized build of Sage is faster than the Release build; the per-benchmark numbers are in the gists above.

I should note that enabling Link-Time Optimization (LTO) is generally a good idea too, since this optimization works well together with PGO. I also ran some benchmarks comparing LTO to Release: https://gist.github.com/zamazan4ik/6be63330d2c97b510fdfc6b7aa7988c5 . However, in some cases there are performance regressions that need to be investigated. LTO was enabled with `codegen-units = 1` and `lto = "fat"` for the corresponding profiles in the `Cargo.toml` file.
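For a quick LTO experiment without editing `Cargo.toml`, the same two settings can be passed through Cargo's `CARGO_PROFILE_<name>_<key>` environment variables. Note that this is a swapped-in technique, not what was done for the numbers above (those used `Cargo.toml`). The sketch assumes the `bench` profile is the relevant one for `cargo bench`; the `run` and `lto_bench` helpers and the `DRY_RUN` variable are hypothetical names, and by default the script only prints the command.

```shell
#!/bin/sh
# Sketch: apply lto = "fat" and codegen-units = 1 to the bench profile
# for a single run, using Cargo's profile environment variables.
# DRY_RUN, run, and lto_bench are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

lto_bench() {
    # Assumption: the bench profile is the one to override for `cargo bench`.
    run env CARGO_PROFILE_BENCH_LTO=fat CARGO_PROFILE_BENCH_CODEGEN_UNITS=1 cargo bench
}

lto_bench
```

Keeping the override on the command line makes it easy to compare LTO and non-LTO runs from the same checkout, at the cost of the settings not being recorded in the repository.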

## Further steps

I can suggest the following action points:

* Perform more PGO benchmarks on Sage. If they show improvements, add a note to the documentation about the possible performance gains from building Sage with PGO.
* Provide an easier way (e.g. a build option) to build Sage with PGO. This can be helpful for end users and maintainers, since they will be able to optimize Sage for their own workloads.

Testing Post-Link Optimization techniques (like [LLVM BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md)) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.

Here are some examples of how PGO optimization is integrated into other projects:

* Rustc: a CI [script](https://github.com/rust-lang/rust/blob/master/src/ci/stage-build.py) for the multi-stage build
* GCC:
  - Official [docs](https://gcc.gnu.org/install/build.html), section "Building with profile feedback" (even AutoFDO build is supported)
  - A [part](https://github.com/gcc-mirror/gcc/blob/4832767db7897be6fb5cbc44f079482c90cb95a6/configure#L7818) in a "wonderful" `configure` script 
* Clang: [Docs](https://llvm.org/docs/HowToBuildWithPGO.html) 
* Python: 
  - CPython: [README](https://github.com/python/cpython#profile-guided-optimization)
  - Pyston: [README](https://github.com/pyston/pyston#building)
* Go: [Bash script](https://github.com/golang/go/blob/master/src/cmd/compile/profile.sh)
* V8: [GN build flag](https://github.com/v8/v8/blob/main/BUILD.gn#L184)
* ChakraCore: [Scripts](https://github.com/chakra-core/ChakraCore/tree/master/Build/scripts/pgo)
* Chromium: [Script](https://chromium.googlesource.com/chromium/src/build/config/+/refs/heads/main/compiler/pgo/BUILD.gn)
* Firefox: [Docs](https://firefox-source-docs.mozilla.org/build/buildsystem/pgo.html)
  - Thunderbird has PGO support too
* PHP: [Makefile command](https://github.com/php/php-src/blob/master/build/Makefile.global#L138) and old Centminmod [scripts](https://github.com/centminmod/php_pgo_training_scripts)
* MySQL: [CMake script](https://github.com/mysql/mysql-server/blob/8.0/cmake/fprofile.cmake)
* YugabyteDB: [GitHub commit](https://github.com/yugabyte/yugabyte-db/commit/34cb791ed9d3d5f8ae9a9b9e9181a46485e1981d)
* FoundationDB: [Script](https://github.com/apple/foundationdb/blob/1a6114a66f3de508c0cf0a45f72f3687ba05750c/contrib/generate_profile.sh)
* Zstd: [Makefile](https://github.com/facebook/zstd/blob/dev/programs/Makefile#L232)
* [Foot](https://codeberg.org/dnkl/foot): [Scripts](https://codeberg.org/dnkl/foot/src/branch/master/pgo)
* Windows Terminal: [GitHub PR](https://github.com/microsoft/terminal/pull/10071)
* Pydantic-core: [GitHub PR](https://github.com/pydantic/pydantic-core/pull/741)
* file.d: [GitHub PR](https://github.com/ozontech/file.d/pull/469)
* OceanBase: [CMake flag](https://github.com/oceanbase/oceanbase/blob/master/cmake/Env.cmake#L55)

Please treat this issue as a benchmark report only: it is not an actual error, crash, or anything like that. I don't know how much you care about performance in Sage, so I don't know how important these improvements are for the project. I hope these benchmarks can at least serve as an additional data point about PGO efficiency for compilers.
