Revisit link-time optimization (LTO)? Some results from clang LTO compilation #14277

@practicalswift

Is it worth revisiting LTO compilation?

I experimented with building with clang LTO, and the results look promising :-)

Binary size results (non-stripped binaries):

  • bench_bitcoin shrank from 74 678 800 to 39 695 288 bytes (-47 %)
  • bitcoin-cli shrank from 4 837 744 to 2 918 544 bytes (-40 %)
  • bitcoin-tx shrank from 15 206 720 to 7 717 608 bytes (-49 %)
  • bitcoind shrank from 102 004 960 to 70 706 000 bytes (-31 %)
  • test_bitcoin shrank from 161 739 656 to 100 838 072 bytes (-38 %)
  • test_bitcoin_fuzzy shrank from 15 929 968 to 6 036 176 bytes (-62 %)

Binary size results (stripped binaries):

  • bench_bitcoin shrank from 5 632 272 to 3 722 720 bytes (-34 %)
  • bitcoin-cli shrank from 383 216 to 260 288 bytes (-32 %)
  • bitcoin-tx shrank from 1 399 112 to 936 080 bytes (-33 %)
  • bitcoind shrank from 6 639 336 to 6 044 520 bytes (-9 %)
  • test_bitcoin shrank from 12 067 056 to 10 853 616 bytes (-10 %)
  • test_bitcoin_fuzzy shrank from 1 468 976 to 428 160 bytes (-71 %)
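The percentages in the two lists follow directly from the byte counts. As a minimal sketch (the helper name `relative_change` is made up for illustration; the byte counts are copied from the non-stripped list above), the arithmetic can be reproduced like this:

```python
def relative_change(before: int, after: int) -> float:
    """Return the size change in percent (negative means the binary shrank)."""
    return 100.0 * (after / before - 1)


# (bytes without LTO, bytes with LTO), non-stripped binaries.
sizes = {
    "bench_bitcoin": (74678800, 39695288),
    "bitcoind": (102004960, 70706000),
    "test_bitcoin_fuzzy": (15929968, 6036176),
}

for name, (before, after) in sizes.items():
    print("{} changed {:.0f} %".format(name, relative_change(before, after)))
```

The same helper applies to the stripped sizes; only the inputs change.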

Benchmark results (relative changes below 5 % omitted to reduce noise; medians over 14 runs of bench_bitcoin):

  • Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO
  • Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO
  • Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO
  • Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO
  • Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO
  • Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO
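The log prints the median times rounded to 0.1 seconds, so recomputing a percentage from the printed medians will not reproduce the reported figure exactly. A rough sanity check, assuming only that each printed median is within 0.05 s of the true one (the function name `change_bounds` and the `half_step` parameter are made up for this sketch), is to compute the range of changes the rounded values allow:

```python
def change_bounds(median_without, median_with, half_step=0.05):
    """Percent-change range compatible with medians rounded to 0.1 s."""
    lowest = (median_with - half_step) / (median_without + half_step) - 1
    highest = (median_with + half_step) / (median_without - half_step) - 1
    return 100 * lowest, 100 * highest


# (median without LTO, median with LTO, reported change in %), from the log.
reported = {
    "FastRandom_1bit": (4.4, 4.1, -7.9),
    "PrevectorDeserializeNontrivial": (8.2, 3.4, -58.1),
    "RollingBloom": (4.4, 3.7, -15.0),
}

for name, (without, with_lto, change) in reported.items():
    low, high = change_bounds(without, with_lto)
    print("{}: reported {} %, rounded medians allow {:.1f} % to {:.1f} %".format(
        name, change, low, high))
```

A reported change outside its range would point to a transcription error; the three entries above all fall inside.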

Below is the full log of the experiment.

Let me know if anything can be improved; feedback is appreciated.

# Build Bitcoin without LTO (baseline)
$ git clone https://github.com/bitcoin/bitcoin bitcoin-without-lto
$ cd bitcoin-without-lto
$ export CC="clang"
$ export CXX="clang++"
$ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
$ ./autogen.sh
$ ./configure
$ make
$ cd ..

# Build Bitcoin with LTO
$ git clone https://github.com/bitcoin/bitcoin bitcoin-with-lto
$ cd bitcoin-with-lto
$ PREFIX=${PWD}/binutils-bin/
$ mkdir binutils-bin
$ apt install texinfo bison
$ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
$ mkdir binutils-build
$ cd binutils-build
$ export CC="clang"
$ export CXX="clang++"
$ unset RANLIB
$ ../binutils/configure --enable-gold --enable-plugins --disable-werror --prefix=${PREFIX}
$ make all-gold
$ make install
$ cd ..
$ ${PREFIX}/bin/ld.gold -plugin 2>&1 | grep -q "plugin: missing argument" && echo "ld.gold has plugin support" || echo "ERROR: ld.gold lacks plugin support"
$ cp /usr/lib/llvm-6.0/lib/LLVMgold.so ${PREFIX}/lib/
$ export PATH="${PREFIX}/bin:${PATH}"
$ export CC="clang -flto"
$ export CXX="clang++ -flto"
$ export RANLIB="/usr/lib/llvm-6.0/bin/llvm-ranlib"
$ ./autogen.sh
$ ./configure
$ make
$ cd ..

# Check binary sizes
$ ls -Sl bitcoin-*-lto/src/bitcoind \
       bitcoin-*-lto/src/bitcoin-tx \
       bitcoin-*-lto/src/bench/bench_bitcoin \
       bitcoin-*-lto/src/bitcoin-cli \
       bitcoin-*-lto/src/test/test_bitcoin \
       bitcoin-*-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 161739656 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 102004960 Sep 20 11:57 bitcoin-without-lto/src/bitcoind
-rwxr-xr-x 1 root root 100838072 Sep 20 12:12 bitcoin-with-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root  74678800 Sep 20 11:57 bitcoin-without-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root  70706000 Sep 20 12:11 bitcoin-with-lto/src/bitcoind
-rwxr-xr-x 1 root root  39695288 Sep 20 12:10 bitcoin-with-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root  15929968 Sep 20 11:57 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root  15206720 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root   7717608 Sep 20 12:09 bitcoin-with-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root   6036176 Sep 20 12:09 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root   4837744 Sep 20 11:57 bitcoin-without-lto/src/bitcoin-cli
-rwxr-xr-x 1 root root   2918544 Sep 20 12:08 bitcoin-with-lto/src/bitcoin-cli
$ strip bitcoin-*-lto/src/bitcoind \
       bitcoin-*-lto/src/bitcoin-tx \
       bitcoin-*-lto/src/bench/bench_bitcoin \
       bitcoin-*-lto/src/bitcoin-cli \
       bitcoin-*-lto/src/test/test_bitcoin \
       bitcoin-*-lto/src/test/test_bitcoin_fuzzy
$ ls -Sl bitcoin-*-lto/src/bitcoind \
       bitcoin-*-lto/src/bitcoin-tx \
       bitcoin-*-lto/src/bench/bench_bitcoin \
       bitcoin-*-lto/src/bitcoin-cli \
       bitcoin-*-lto/src/test/test_bitcoin \
       bitcoin-*-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root 12067056 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root 10853616 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin
-rwxr-xr-x 1 root root  6639336 Sep 20 15:54 bitcoin-without-lto/src/bitcoind
-rwxr-xr-x 1 root root  6044520 Sep 20 15:54 bitcoin-with-lto/src/bitcoind
-rwxr-xr-x 1 root root  5632272 Sep 20 15:54 bitcoin-without-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root  3722720 Sep 20 15:54 bitcoin-with-lto/src/bench/bench_bitcoin
-rwxr-xr-x 1 root root  1468976 Sep 20 15:54 bitcoin-without-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root  1399112 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root   936080 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-tx
-rwxr-xr-x 1 root root   428160 Sep 20 15:54 bitcoin-with-lto/src/test/test_bitcoin_fuzzy
-rwxr-xr-x 1 root root   383216 Sep 20 15:54 bitcoin-without-lto/src/bitcoin-cli
-rwxr-xr-x 1 root root   260288 Sep 20 15:54 bitcoin-with-lto/src/bitcoin-cli

# Gather performance measurements until ^C is pressed
$ while true; do for SWITCH in with without; do echo "# $SWITCH"; \
    bitcoin-${SWITCH}-lto/src/bench/bench_bitcoin; done; done 2>&1 | \
    tee bench_bitcoin-lto-vs-non-lto

# Summarize results
$ ./parse_lto.py < bench_bitcoin-lto-vs-non-lto
* Runtime of benchmark FastRandom_1bit changed -7.9 % when enabling LTO. Median total time was 4.4 seconds without LTO and 4.1 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark FastRandom_32bit changed -6.7 % when enabling LTO. Median total time was 5.8 seconds without LTO and 5.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark MatchGCSFilter changed -11.5 % when enabling LTO. Median total time was 8.3 seconds without LTO and 7.3 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark MempoolEviction changed -13.3 % when enabling LTO. Median total time was 4.6 seconds without LTO and 4.0 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark PrevectorDeserializeNontrivial changed -58.1 % when enabling LTO. Median total time was 8.2 seconds without LTO and 3.4 seconds with LTO. Based on 14 independent runs of bench_bitcoin.
* Runtime of benchmark RollingBloom changed -15.0 % when enabling LTO. Median total time was 4.4 seconds without LTO and 3.7 seconds with LTO. Based on 14 independent runs of bench_bitcoin.

# Environment
$ clang++ --version | head -2
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
$ dpkg -S $(which clang++)
clang: /usr/bin/clang++
$ dpkg -S /usr/lib/llvm-6.0/bin/llvm-ranlib
llvm-6.0: /usr/lib/llvm-6.0/bin/llvm-ranlib
$ dpkg -S /usr/lib/llvm-6.0/lib/LLVMgold.so
llvm-6.0-dev: /usr/lib/llvm-6.0/lib/LLVMgold.so
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

This is the content of parse_lto.py:

#!/usr/bin/env python3

import collections
import statistics
import sys

results_lto = collections.defaultdict(list)
results_nonlto = collections.defaultdict(list)
# Set by the "# with" / "# without" marker lines echoed by the measurement loop.
lto_status = None
for line in sys.stdin:
    line = line.rstrip("\n")
    # Skip the column header that bench_bitcoin prints at the start of each run.
    if line.startswith("# Benchmark"):
        continue
    if line.startswith("#"):
        lto_status = line[2:]
        continue
    # Fails early if a result line shows up before any marker line.
    assert lto_status in ["with", "without"]
    # Only the benchmark name and the total time (fourth field) are used.
    benchmark, _, _, total_time, _ = line.split(", ", 4)
    total_time = float(total_time)
    if lto_status == "with":
        results_lto[benchmark].append(total_time)
    else:
        results_nonlto[benchmark].append(total_time)

assert len(results_lto) == len(results_nonlto)
# Truncate so that both builds contribute the same number of runs per benchmark
# (the measurement loop runs until it is interrupted with ^C).
for benchmark in sorted(results_lto):
    least_observations = min(len(results_lto[benchmark]), len(results_nonlto[benchmark]))
    results_lto[benchmark] = results_lto[benchmark][:least_observations]
    results_nonlto[benchmark] = results_nonlto[benchmark][:least_observations]
for benchmark in sorted(results_lto):
    assert len(results_lto[benchmark]) == len(results_nonlto[benchmark])
    median_lto = statistics.median(results_lto[benchmark])
    median_nonlto = statistics.median(results_nonlto[benchmark])
    assert median_nonlto != 0
    # Relative change of the median total time; negative means LTO is faster.
    change = median_lto / median_nonlto - 1
    # Omit changes smaller than 5 % to reduce noise.
    if abs(change) < 0.05:
        continue
    print("* Runtime of benchmark {} changed {:.1f} % when enabling LTO. Median total time was {:.1f} seconds without LTO and {:.1f} seconds with LTO. Based on {} independent runs of bench_bitcoin.".format(
        benchmark, 100 * change, median_nonlto, median_lto, len(results_lto[benchmark])
    ))
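To make the expected input shape concrete, here is a condensed, self-contained sketch of the same parsing idea run on a made-up sample. The function name `median_change`, the benchmark name `ExampleBench`, and all numbers are invented for illustration. The column layout of the sample is an assumption inferred from the `split(", ", 4)` call in the script, not copied from real bench_bitcoin output.

```python
import collections
import statistics


def median_change(lines):
    """Return {benchmark: relative change of median total time, LTO vs. non-LTO}."""
    totals = {
        "with": collections.defaultdict(list),
        "without": collections.defaultdict(list),
    }
    status = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("# Benchmark"):
            continue  # blank line or column header
        if line.startswith("# "):
            status = line[2:]  # "# with" / "# without" marker line
            continue
        benchmark, _, _, total_time, _ = line.split(", ", 4)
        totals[status][benchmark].append(float(total_time))
    changes = {}
    for name in totals["with"]:
        median_with = statistics.median(totals["with"][name])
        median_without = statistics.median(totals["without"][name])
        changes[name] = median_with / median_without - 1
    return changes


# Made-up sample input (assumed layout: name, two ignored fields, total, rest).
sample = """\
# with
# Benchmark, evals, iterations, total, min, max, median
ExampleBench, 5, 100, 3.0, 0.006, 0.007, 0.006
# without
# Benchmark, evals, iterations, total, min, max, median
ExampleBench, 5, 100, 4.0, 0.008, 0.009, 0.008
""".splitlines()

print(median_change(sample))  # {'ExampleBench': -0.25}
```

Unlike parse_lto.py, this sketch does not truncate the two result lists to equal length or filter out small changes; it only shows how marker lines and result lines are told apart.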
