Is it worth revisiting LTO compilation?
I did some experimentation with LTO compilation and the results look promising :-)
Binary size results (non-stripped binaries):
- bench_bitcoin shrank from 74 678 800 to 39 695 288 bytes (-47 %)
- bitcoin-cli shrank from 4 837 744 to 2 918 544 bytes (-40 %)
- bitcoin-tx shrank from 15 206 720 to 7 717 608 bytes (-49 %)
- bitcoind shrank from 102 004 960 to 70 706 000 bytes (-31 %)
- test_bitcoin shrank from 161 739 656 to 100 838 072 bytes (-38 %)
- test_bitcoin_fuzzy shrank from 15 929 968 to 6 036 176 bytes (-62 %)
Binary size results (stripped binaries):
- bench_bitcoin shrank from 5 632 272 to 3 722 720 bytes (-34 %)
- bitcoin-cli shrank from 383 216 to 260 288 bytes (-32 %)
- bitcoin-tx shrank from 1 399 112 to 936 080 bytes (-33 %)
- bitcoind shrank from 6 639 336 to 6 044 520 bytes (-9 %)
- test_bitcoin shrank from 12 067 056 to 10 853 616 bytes (-10 %)
- test_bitcoin_fuzzy shrank from 1 468 976 to 428 160 bytes (-71 %)
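For anyone who wants to re-check the percentages, here is a minimal sketch that recomputes them from the stripped sizes listed above. The helper name `percent_change` is only illustrative and is not part of the original experiment.

```python
# Stripped binary sizes in bytes, copied from the list above:
# name -> (size without LTO, size with LTO)
sizes = {
    "bench_bitcoin": (5632272, 3722720),
    "bitcoin-cli": (383216, 260288),
    "bitcoin-tx": (1399112, 936080),
    "bitcoind": (6639336, 6044520),
    "test_bitcoin": (12067056, 10853616),
    "test_bitcoin_fuzzy": (1468976, 428160),
}


def percent_change(before, after):
    """Relative size change in percent (negative means the binary shrank)."""
    return 100.0 * (after - before) / before


changes = {name: percent_change(*pair) for name, pair in sizes.items()}
for name, pct in changes.items():
    print("{} changed {:+.0f} %".format(name, pct))
```

The same function works for the non-stripped sizes if the tuples are swapped out.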
Benchmark results (relative changes smaller than 5 % omitted to reduce noise):
- Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO
- Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO
- Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO
- Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO
- Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO
- Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO
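The reported change is the ratio of the median total times (with LTO over without LTO) minus one, and anything below 5 % in absolute value is dropped, as parse_lto.py in the log does. A minimal sketch of that calculation follows. The timings are made up for illustration, and the function names are hypothetical.

```python
import statistics

THRESHOLD = 0.05  # same cutoff that parse_lto.py applies


def relative_change(times_without, times_with):
    """Median runtime with LTO relative to the median without LTO, minus one."""
    median_without = statistics.median(times_without)
    median_with = statistics.median(times_with)
    return median_with / median_without - 1


def is_reported(change, threshold=THRESHOLD):
    """A change is listed only if it is at least `threshold` in magnitude."""
    return abs(change) >= threshold


# Made-up timings in seconds (not measurements from this issue).
fast = relative_change([10.0, 10.2, 9.8], [8.0, 8.1, 7.9])   # clear speed-up
noise = relative_change([10.0, 10.2, 9.8], [9.9, 10.1, 9.7])  # within noise

print("{:.1f} % reported: {}".format(100 * fast, is_reported(fast)))
print("{:.1f} % reported: {}".format(100 * noise, is_reported(noise)))
```

Using the median rather than the mean keeps a single slow run (for example, from background load) from skewing the comparison.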
Below is the log from my experimentation.
Let me know if anything can be improved. Feedback appreciated.
# Build Bitcoin without LTO (baseline)
$ git clone https://github.com/bitcoin/bitcoin bitcoin-without-lto
$ cd bitcoin-without-lto
$ export CC="clang"
$ export CXX="clang++"
$ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
$ ./autogen.sh
$ ./configure
$ make
$ cd ..
# Build Bitcoin with LTO
$ git clone https://github.com/bitcoin/bitcoin bitcoin-with-lto
$ cd bitcoin-with-lto
$ PREFIX=${PWD}/binutils-bin/
$ mkdir binutils-bin
$ apt install texinfo bison
$ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
$ mkdir binutils-build
$ cd binutils-build
$ export CC="clang"
$ export CXX="clang++"
$ unset RANLIB
$ ../binutils/configure --enable-gold --enable-plugins --disable-werror --prefix=${PREFIX}
$ make all-gold
$ make install
$ cd ..
$ ${PREFIX}/bin/ld.gold -plugin 2>&1 | grep -q "plugin: missing argument" && echo "ld.gold has plugin support" || echo "ERROR: ld.gold lacks plugin support"
$ cp /usr/lib/llvm-6.0/lib/LLVMgold.so ${PREFIX}/lib/
$ export PATH="${PREFIX}/bin:${PATH}"
$ export CC="clang -flto"
$ export CXX="clang++ -flto"
$ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
$ ./autogen.sh
$ ./configure
$ make
$ cd ..
# Check binary sizes
$ ls -Sl bitcoin-*-lto/src/bitcoind \
bitcoin-*-lto/src/bitcoin-tx \
bitcoin-*-lto/src/bench/bench_bitcoin \
bitcoin-*-lto/src/bitcoin-cli \
bitcoin-*-lto/src/test/test_bitcoin \
bitcoin-*-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 161739656 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 102004960 Sep 20 11:57 bitcoin-without-lto/src/bitcoind
-rwxr-xr-x 1 root root 100838072 Sep 20 12:12 bitcoin-with-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 74678800 Sep 20 11:57 bitcoin-without-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root 70706000 Sep 20 12:11 bitcoin-with-lto/src/bitcoind
-rwxr-xr-x 1 root root 39695288 Sep 20 12:10 bitcoin-with-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root 15929968 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 15206720 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root 7717608 Sep 20 12:09 bitcoin-with-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root 6036176 Sep 20 12:09 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 4837744 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-cli
-rwxr-xr-x 1 root root 2918544 Sep 20 12:08 bitcoin-with-lto/src/bitcoin-cli
$ strip bitcoin-*-lto/src/bitcoind \
bitcoin-*-lto/src/bitcoin-tx \
bitcoin-*-lto/src/bench/bench_bitcoin \
bitcoin-*-lto/src/bitcoin-cli \
bitcoin-*-lto/src/test/test_bitcoin \
bitcoin-*-lto/src/test/test_bitcoin_fuzzy
$ ls -Sl bitcoin-*-lto/src/bitcoind \
bitcoin-*-lto/src/bitcoin-tx \
bitcoin-*-lto/src/bench/bench_bitcoin \
bitcoin-*-lto/src/bitcoin-cli \
bitcoin-*-lto/src/test/test_bitcoin \
bitcoin-*-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 12067056 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 10853616 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 6639336 Sep 20 15:54 bitcoin-without-lto/src/bitcoind
-rwxr-xr-x 1 root root 6044520 Sep 20 15:54 bitcoin-with-lto/src/bitcoind
-rwxr-xr-x 1 root root 5632272 Sep 20 15:54 bitcoin-without-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root 3722720 Sep 20 15:54 bitcoin-with-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root 1468976 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 1399112 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root 936080 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root 428160 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 383216 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-cli
-rwxr-xr-x 1 root root 260288 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-cli
# Gather performance measurements until ^C is pressed
$ while true; do for SWITCH in with without; do echo "# $SWITCH"; \
bitcoin-${SWITCH}-lto/src/bench/bench_bitcoin; done; done 2>&1 | \
tee bench_bitcoin-lto-vs-non-lto
# Summarize results
$ ./parse_lto.py < bench_bitcoin-lto-vs-non-lto
* Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO. Median total time was 4.4 seconds without LTO and 4.1 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO. Median total time was 5.8 seconds without LTO and 5.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO. Median total time was 8.3 seconds without LTO and 7.3 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO. Median total time was 4.6 seconds without LTO and 4.0 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO. Median total time was 8.2 seconds without LTO and 3.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO. Median total time was 4.4 seconds without LTO and 3.7 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
# Environment
$ clang++ --version | head -2
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
$ dpkg -S $(which clang++)
clang: /usr/bin/clang++
$ dpkg -S /usr/lib/llvm-6.0/bin/llvm-ranlib
llvm-6.0: /usr/lib/llvm-6.0/bin/llvm-ranlib
$ dpkg -S /usr/lib/llvm-6.0/lib/LLVMgold.so
llvm-6.0-dev: /usr/lib/llvm-6.0/lib/LLVMgold.so
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
This is the content of parse_lto.py:
#!/usr/bin/env python3
import collections
import statistics
import sys

# Total times per benchmark, one list entry per bench_bitcoin run.
results_lto = collections.defaultdict(list)
results_nonlto = collections.defaultdict(list)
lto_status = None  # set by the "# with" / "# without" marker lines

for line in sys.stdin:
    line = line.rstrip("\n")
    # Skip the column header that bench_bitcoin prints.
    if line.startswith("# Benchmark"):
        continue
    # Marker line written by the shell loop: "# with" or "# without".
    if line.startswith("#"):
        lto_status = line[2:]
        continue
    assert(lto_status in ["with", "without"])
    # The fourth comma-separated field is the total time.
    benchmark, _, _, total_time, _ = line.split(", ", 4)
    total_time = float(total_time)
    if lto_status == "with":
        results_lto[benchmark].append(total_time)
        continue
    if lto_status == "without":
        results_nonlto[benchmark].append(total_time)
        continue
    assert(False)

assert(len(results_lto) == len(results_nonlto))

# The loop was interrupted with ^C, so trim both sides to the same number of runs.
for benchmark in sorted(results_lto):
    least_observations = min(len(results_lto[benchmark]), len(results_nonlto[benchmark]))
    results_lto[benchmark] = results_lto[benchmark][:least_observations]
    results_nonlto[benchmark] = results_nonlto[benchmark][:least_observations]

for benchmark in sorted(results_lto):
    assert(len(results_lto[benchmark]) == len(results_nonlto[benchmark]))
    median_lto = statistics.median(results_lto[benchmark])
    median_nonlto = statistics.median(results_nonlto[benchmark])
    assert(median_nonlto != 0)
    change = median_lto / median_nonlto - 1
    # Omit changes smaller than 5 % to reduce noise.
    if abs(change) < 0.05:
        continue
    print("* Runtime of benchmark {} changed {:.1f} % when enabling LTO. Median total time was {:.1f} seconds without LTO and {:.1f} seconds with LTO. Based on {} independent runs of bench_bitcoin.".format(
        benchmark, 100 * change, median_nonlto, median_lto, len(results_lto[benchmark])
    ))
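To make the expected input format concrete, here is a small sketch of the grouping step as a standalone function, run on a synthetic log. The column header and the sample values are assumptions for illustration (only the position of the total time, the fourth field, is taken from the script above), and `group_totals` is a hypothetical name.

```python
import collections


def group_totals(lines):
    """Group total times by LTO status ("with"/"without") and benchmark name."""
    totals = {
        "with": collections.defaultdict(list),
        "without": collections.defaultdict(list),
    }
    status = None
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the column header printed before each run.
        if not line or line.startswith("# Benchmark"):
            continue
        # Marker line written by the shell loop: "# with" or "# without".
        if line.startswith("# "):
            status = line[2:]
            continue
        fields = line.split(", ")
        # Field 0 is the benchmark name, field 3 the total time.
        totals[status][fields[0]].append(float(fields[3]))
    return totals


# Synthetic log; header layout and numbers are assumed, not real output.
sample = """\
# with
# Benchmark, evals, iterations, total, min, max, median
RollingBloom, 5, 1500000, 3.7, 1.0, 1.0, 1.0
# without
# Benchmark, evals, iterations, total, min, max, median
RollingBloom, 5, 1500000, 4.4, 1.0, 1.0, 1.0
""".splitlines()

totals = group_totals(sample)
print(dict(totals["with"]), dict(totals["without"]))
```

A data line that appears before any marker, or after an unexpected marker, raises a `KeyError` here, which plays the same role as the assertions in the script.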