-
Notifications
You must be signed in to change notification settings - Fork 37.7k
optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29473. ReviewsSee the guideline for information on the review process. ConflictsNo conflicts as of last run. |
afbc980
to
321456b
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
397ae64
to
e168e62
Compare
cab8a6e
to
b930cd5
Compare
ba23707
to
c9b351d
Compare
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
c9b351d
to
83caaf7
Compare
83caaf7
to
73845ec
Compare
I've created 10k blocks in regtest and measured the performance of killall bitcoind 2>/dev/null || true
./src/bitcoind -regtest -daemon
./src/bitcoin-cli -regtest createwallet "test_wallet"
./src/bitcoin-cli -regtest generatetoaddress 10000 $(./src/bitcoin-cli -regtest getnewaddress) >/dev/null
hyperfine \
--warmup 3 --runs 10 \
--parameter-list BRANCH master,l0rinc/optimize-base58-conversion \
--prepare '
killall bitcoind 2>/dev/null || true
git checkout "{BRANCH}" && git reset --hard
make -j10 >/dev/null
./src/bitcoind -regtest -daemon
sleep 1
./src/bitcoin-cli -regtest loadwallet "test_wallet"
' \
'./src/bitcoin-cli -regtest listunspent' which resulted in: Benchmark 1: ./src/bitcoin-cli -regtest listunspent (BRANCH = master)
Time (mean ± σ): 378.7 ms ± 2.8 ms [User: 71.7 ms, System: 13.9 ms]
Range (min … max): 376.4 ms … 386.0 ms 10 runs
Benchmark 2: ./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion)
Time (mean ± σ): 244.0 ms ± 1.3 ms [User: 71.4 ms, System: 13.7 ms]
Range (min … max): 242.2 ms … 246.1 ms 10 runs
Summary
./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion) ran
1.55 ± 0.01 times faster than ./src/bitcoin-cli -regtest listunspent (BRANCH = master) i.e. 55% faster |
864e479
to
2dd3818
Compare
I presume, if you wanted to speed up |
I was following the benchmarks to decide what to work on, but since you've mentioned this is an important usecase, I measured it as well. |
I don't think I said that "this is an important usecase". I simply said that the reason for the improvement in #7656 (comment) was However, this is just background information and a guess. I don't know if it is true, how easy it is to implement, whether it is worth it to review, or worth it at all. |
@luke-jr, would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead? |
2dd3818
to
75a86de
Compare
The algorithm used in libbase58 is different, so not sure this even applies. Kinda doubt it would be worth the time to port/review either, unless a library consumer cares about performance or the other criteria explained already here. If you're just looking for things to do, extending libbase58 to Bech32 might be worth doing. See bitcoin/libbase58#6 |
75a86de
to
7feb85b
Compare
Note that the constants for size allocations have slightly changed after recalculating them
Instead of collecting the values to and empty b58, we're cloning the input without leading zeros and dividing it in-place, while doing the quadratic division from the most significant, non-zero digit, forward. make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 59.34 | 16,851,250.88 | 0.2% | 1.10 | `Base58Encode` | 59.20 | 16,892,479.61 | 0.2% | 1.10 | `Base58Encode` | 58.97 | 16,958,226.76 | 0.2% | 1.10 | `Base58Encode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 40.80 | 24,508,402.06 | 0.1% | 1.10 | `Base58Encode` | 40.83 | 24,489,762.49 | 0.2% | 1.10 | `Base58Encode` | 39.71 | 25,182,310.62 | 0.2% | 1.10 | `Base58Encode` ``` and make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 79.93 | 12,511,576.75 | 0.4% | 1.10 | `Base58CheckEncode` | 79.49 | 12,580,136.21 | 0.1% | 1.10 | `Base58CheckEncode` | 79.65 | 12,554,785.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 55.27 | 18,094,561.35 | 0.1% | 1.10 | `Base58CheckEncode` | 55.41 | 18,046,778.32 | 0.1% | 1.10 | `Base58CheckEncode` | 55.32 | 18,075,763.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com>
To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7, fitting them into 64-bit long integers. This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead. The benchmarks indicate a 4x speedup for `Base58CheckEncode` and `Base58Encode`: make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 14.32 | 69,831,031.44 | 0.1% | 1.10 | `Base58Encode` | 14.32 | 69,811,995.18 | 0.1% | 1.10 | `Base58Encode` | 14.35 | 69,662,527.88 | 0.5% | 1.10 | `Base58Encode` ``` make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 19.15 | 52,231,968.11 | 0.1% | 1.06 | `Base58CheckEncode` | 19.13 | 52,269,345.54 | 0.4% | 1.05 | `Base58CheckEncode` | 19.14 | 52,244,117.61 | 0.3% | 1.06 | `Base58CheckEncode` ```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 17.50 | 57,141,622.65 | 0.1% | 1.10 | `Base58Decode` | 17.42 | 57,392,223.96 | 0.0% | 1.10 | `Base58Decode` | 17.43 | 57,358,655.44 | 0.2% | 1.10 | `Base58Decode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 16.37 | 61,093,842.90 | 0.2% | 1.10 | `Base58Decode` | 16.37 | 61,100,514.64 | 0.1% | 1.10 | `Base58Decode` | 16.38 | 61,046,275.93 | 0.1% | 1.10 | `Base58Decode` ```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 8.79 | 113,767,685.06 | 0.1% | 1.10 | `Base58Decode` | 8.78 | 113,831,528.33 | 0.0% | 1.10 | `Base58Decode` | 8.80 | 113,607,470.35 | 0.2% | 1.10 | `Base58Decode` ```
7feb85b
to
c1291fd
Compare
As this is basically a complete rewrite of the base58 encoding and decoding code, I don't think the review effort required to review this is worth it relative to the unimportance of these functions. |
Thanks for checking it out @achow101. |
To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7 and 9, fitting them into 64-bit long integers.
This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead.
The benchmarks indicate a ~4.4x speedup for
Base58Encode
:before:
Base58Encode
after:
Base58Encode
The same was applied to decoding, resulting in a ~2x speedup:
before:
Base58Decode
after:
Base58Decode
This also speed up listunspent by >50%.
See #29473 (comment) for additional benchmarks and #30035 for additional tests.
I've also tested with with the following temp test (comparing the exact previous impl with the new one for random inputs: https://gist.github.com/paplorinc/96d8e355f6fef10ac79f62d89a6d9f19#file-base58_tests-cpp-L211-L249)