optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473

l0rinc · 2024-02-24T12:52:07Z

To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7 and 9, fitting them into 64-bit long integers.
This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead.
The benchmarks indicate a ~4.4x speedup for Base58Encode:

make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000

before:

ns/byte	byte/s	err%	total	benchmark
61.00	16,393,083.32	0.2%	1.07	`Base58Encode`

after:

ns/byte	byte/s	err%	total	benchmark
13.79	72,502,616.20	0.1%	1.10	`Base58Encode`

The same was applied to decoding, resulting in a ~2x speedup:

make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000

before:

ns/byte	byte/s	err%	total	benchmark
17.41	57,429,723.99	0.1%	1.10	`Base58Decode`

after:

ns/byte	byte/s	err%	total	benchmark
8.80	113,590,754.53	0.1%	1.11	`Base58Decode`

This also speed up listunspent by >50%.

See #29473 (comment) for additional benchmarks and #30035 for additional tests.
I've also tested with with the following temp test (comparing the exact previous impl with the new one for random inputs: https://gist.github.com/paplorinc/96d8e355f6fef10ac79f62d89a6d9f19#file-base58_tests-cpp-L211-L249)

DrahtBot · 2024-02-24T12:52:09Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29473.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

No conflicts as of last run.

src/base58.cpp

DrahtBot · 2024-02-24T18:12:40Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/21941361780}

src/base58.cpp

DrahtBot · 2024-02-25T15:24:39Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/21955441135}

l0rinc · 2024-07-01T20:37:50Z

Ok, the reason for #7656 (comment) was an improvement in listunspent. Seems fine, if this is still the case. But this will need to be checked first.

I've created 10k blocks in regtest and measured the performance of listunspent before and after:

killall bitcoind 2>/dev/null || true
./src/bitcoind -regtest -daemon
./src/bitcoin-cli -regtest createwallet "test_wallet"
./src/bitcoin-cli -regtest generatetoaddress 10000 $(./src/bitcoin-cli -regtest getnewaddress) >/dev/null

hyperfine \
  --warmup 3 --runs 10 \
  --parameter-list BRANCH master,l0rinc/optimize-base58-conversion \
  --prepare '
    killall bitcoind 2>/dev/null || true
    git checkout "{BRANCH}" && git reset --hard
    make -j10 >/dev/null
    ./src/bitcoind -regtest -daemon
    sleep 1
    ./src/bitcoin-cli -regtest loadwallet "test_wallet"
  ' \
  './src/bitcoin-cli -regtest listunspent'

which resulted in:

Benchmark 1: ./src/bitcoin-cli -regtest listunspent (BRANCH = master)
  Time (mean ± σ):     378.7 ms ±   2.8 ms    [User: 71.7 ms, System: 13.9 ms]
  Range (min … max):   376.4 ms … 386.0 ms    10 runs
 
Benchmark 2: ./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion)
  Time (mean ± σ):     244.0 ms ±   1.3 ms    [User: 71.4 ms, System: 13.7 ms]
  Range (min … max):   242.2 ms … 246.1 ms    10 runs
 
Summary
  ./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion) ran
    1.55 ± 0.01 times faster than ./src/bitcoin-cli -regtest listunspent (BRANCH = master)

i.e. 55% faster listunspent for the given usecase.

maflcko · 2024-07-02T07:37:58Z

I presume, if you wanted to speed up listunspent in this case of a bip-32 base58 encoded descriptor string, a higher speed-up could be achieved by caching the descriptor string results, because the expectation is that there are generally only a few (or one) per wallet. But I am not sure how easy that is, or whether it is worth it.

l0rinc · 2024-07-02T07:42:50Z

I was following the benchmarks to decide what to work on, but since you've mentioned this is an important usecase, I measured it as well.
Is there anything else you'd like me to measure or change in this pull request?

maflcko · 2024-07-02T08:04:36Z

I don't think I said that "this is an important usecase". I simply said that the reason for the improvement in #7656 (comment) was listunspent. That pull request was created before bech32(m) addresses and descriptor wallets existed. Considering that bech32 is the default address type and that descriptor wallets are the default as well, the only expected remaining use of base58 should be in the bip-32 descriptor string. My guess is that, if it was important to optimize that for speed, a higher speedup could be achieved by caching the generated descriptor string.

However, this is just background information and a guess. I don't know if it is true, how easy it is to implement, whether it is worth it to review, or worth it at all.

l0rinc · 2024-07-12T11:48:35Z

@luke-jr, would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead?

luke-jr · 2024-07-12T14:35:27Z

would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead?

The algorithm used in libbase58 is different, so not sure this even applies. Kinda doubt it would be worth the time to port/review either, unless a library consumer cares about performance or the other criteria explained already here.

If you're just looking for things to do, extending libbase58 to Bech32 might be worth doing. See bitcoin/libbase58#6

Note that the constants for size allocations have slightly changed after recalculating them

Instead of collecting the values to and empty b58, we're cloning the input without leading zeros and dividing it in-place, while doing the quadratic division from the most significant, non-zero digit, forward. make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 59.34 | 16,851,250.88 | 0.2% | 1.10 | `Base58Encode` | 59.20 | 16,892,479.61 | 0.2% | 1.10 | `Base58Encode` | 58.97 | 16,958,226.76 | 0.2% | 1.10 | `Base58Encode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 40.80 | 24,508,402.06 | 0.1% | 1.10 | `Base58Encode` | 40.83 | 24,489,762.49 | 0.2% | 1.10 | `Base58Encode` | 39.71 | 25,182,310.62 | 0.2% | 1.10 | `Base58Encode` ``` and make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 79.93 | 12,511,576.75 | 0.4% | 1.10 | `Base58CheckEncode` | 79.49 | 12,580,136.21 | 0.1% | 1.10 | `Base58CheckEncode` | 79.65 | 12,554,785.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 55.27 | 18,094,561.35 | 0.1% | 1.10 | `Base58CheckEncode` | 55.41 | 18,046,778.32 | 0.1% | 1.10 | `Base58CheckEncode` | 55.32 | 18,075,763.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com>

To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7, fitting them into 64-bit long integers. This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead. The benchmarks indicate a 4x speedup for `Base58CheckEncode` and `Base58Encode`: make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 14.32 | 69,831,031.44 | 0.1% | 1.10 | `Base58Encode` | 14.32 | 69,811,995.18 | 0.1% | 1.10 | `Base58Encode` | 14.35 | 69,662,527.88 | 0.5% | 1.10 | `Base58Encode` ``` make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 19.15 | 52,231,968.11 | 0.1% | 1.06 | `Base58CheckEncode` | 19.13 | 52,269,345.54 | 0.4% | 1.05 | `Base58CheckEncode` | 19.14 | 52,244,117.61 | 0.3% | 1.06 | `Base58CheckEncode` ```

> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 17.50 | 57,141,622.65 | 0.1% | 1.10 | `Base58Decode` | 17.42 | 57,392,223.96 | 0.0% | 1.10 | `Base58Decode` | 17.43 | 57,358,655.44 | 0.2% | 1.10 | `Base58Decode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 16.37 | 61,093,842.90 | 0.2% | 1.10 | `Base58Decode` | 16.37 | 61,100,514.64 | 0.1% | 1.10 | `Base58Decode` | 16.38 | 61,046,275.93 | 0.1% | 1.10 | `Base58Decode` ```

> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 8.79 | 113,767,685.06 | 0.1% | 1.10 | `Base58Decode` | 8.78 | 113,831,528.33 | 0.0% | 1.10 | `Base58Decode` | 8.80 | 113,607,470.35 | 0.2% | 1.10 | `Base58Decode` ```

achow101 · 2024-11-04T23:05:16Z

As this is basically a complete rewrite of the base58 encoding and decoding code, I don't think the review effort required to review this is worth it relative to the unimportance of these functions.

l0rinc · 2024-11-05T10:16:46Z

Thanks for checking it out @achow101.
I could make the diff minimal, but given that it's not that important, I will leave it as is - until someone starts complaining that listunspent is slow :)

l0rinc · 2024-11-25T19:33:52Z

I just found this was already attempted in a very similar way by @martinus a few years ago in #21176 - so now we have two versions, if this will turn out to be important again.
Closing, thanks for the reviews!

l0rinc changed the title ~~WIP optimization: Unify size estimations in Base58 encoding/decoding~~ WIP optimization: Enhance Base58 conversion through preliminary Base56 conversion Feb 24, 2024

l0rinc changed the title ~~WIP optimization: Enhance Base58 conversion through preliminary Base56 conversion~~ WIP optimization: Enhance Base58 conversion through intermediary Base56 conversion Feb 24, 2024

l0rinc commented Feb 24, 2024

View reviewed changes

src/base58.cpp Outdated Show resolved Hide resolved

l0rinc changed the title ~~WIP optimization: Enhance Base58 conversion through intermediary Base56 conversion~~ optimization: Enhance Base58 conversion through intermediary Base56 conversion by 300% Feb 24, 2024

l0rinc changed the title ~~optimization: Enhance Base58 conversion through intermediary Base56 conversion by 300%~~ optimization: Speed up Base58 conversion through intermediary Base56 conversion by 300% Feb 24, 2024

l0rinc marked this pull request as ready for review February 24, 2024 18:02

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from afbc980 to 321456b Compare February 24, 2024 18:12

DrahtBot added the CI failed label Feb 24, 2024

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 3 times, most recently from 397ae64 to e168e62 Compare February 24, 2024 21:36

optout21 reviewed Feb 25, 2024

View reviewed changes

src/base58.cpp Outdated Show resolved Hide resolved

l0rinc marked this pull request as draft February 25, 2024 06:37

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 2 times, most recently from cab8a6e to b930cd5 Compare February 25, 2024 07:06

DrahtBot removed the CI failed label Feb 25, 2024

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 2 times, most recently from ba23707 to c9b351d Compare February 25, 2024 15:24

DrahtBot added the CI failed label Feb 25, 2024

DrahtBot removed the CI failed label Feb 25, 2024

l0rinc requested review from optout21 and Empact February 25, 2024 16:41

l0rinc marked this pull request as ready for review February 25, 2024 16:42

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from c9b351d to 83caaf7 Compare February 26, 2024 21:25

l0rinc changed the title ~~optimization: Speed up Base58 conversion through intermediary Base56 conversion by 300%~~ optimization: Speed up Base58 conversion through intermediary Base56 conversion by 400% Feb 26, 2024

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 83caaf7 to 73845ec Compare February 26, 2024 21:41

Empact mentioned this pull request Feb 26, 2024

build: Enable -Wshadow #29485

Closed

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 5 times, most recently from 864e479 to 2dd3818 Compare July 2, 2024 06:58

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 2dd3818 to 75a86de Compare July 12, 2024 13:19

DrahtBot added the CI failed label Jul 13, 2024

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 75a86de to 7feb85b Compare July 14, 2024 12:38

l0rinc and others added 6 commits July 18, 2024 13:48

Unify size estimations in Base58 encoding/decoding

b42dd6c

Note that the constants for size allocations have slightly changed after recalculating them

Remove redundant zero-skipping logic in Base58 encode/decode

0c4bc5e

l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 7feb85b to c1291fd Compare July 18, 2024 11:48

DrahtBot removed the CI failed label Jul 18, 2024

achow101 requested review from achow101 and removed request for optout21 October 15, 2024 15:09

l0rinc mentioned this pull request Oct 31, 2024

Extend Benchmarking to Additional RPC Endpoints and Throughput Scenarios bitcoin-dev-tools/benchooor#1

Open

l0rinc closed this Nov 25, 2024

l0rinc mentioned this pull request Dec 31, 2024

RPC: Add reserve member function to UniValue and use it in blockToJSON function #31179

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473

optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473

Uh oh!

l0rinc commented Feb 24, 2024 •

edited

Loading

Uh oh!

DrahtBot commented Feb 24, 2024 •

edited

Loading

Uh oh!

Uh oh!

DrahtBot commented Feb 24, 2024

Uh oh!

Uh oh!

DrahtBot commented Feb 25, 2024

Uh oh!

l0rinc commented Jul 1, 2024 •

edited

Loading

Uh oh!

maflcko commented Jul 2, 2024

Uh oh!

l0rinc commented Jul 2, 2024

Uh oh!

maflcko commented Jul 2, 2024 •

edited

Loading

Uh oh!

l0rinc commented Jul 12, 2024

Uh oh!

luke-jr commented Jul 12, 2024

Uh oh!

achow101 commented Nov 4, 2024

Uh oh!

l0rinc commented Nov 5, 2024

Uh oh!

l0rinc commented Nov 25, 2024

Uh oh!

Uh oh!

optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473

optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473

Uh oh!

Conversation

l0rinc commented Feb 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Feb 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

Uh oh!

DrahtBot commented Feb 24, 2024

Uh oh!

Uh oh!

DrahtBot commented Feb 25, 2024

Uh oh!

l0rinc commented Jul 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Jul 2, 2024

Uh oh!

l0rinc commented Jul 2, 2024

Uh oh!

maflcko commented Jul 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

l0rinc commented Jul 12, 2024

Uh oh!

luke-jr commented Jul 12, 2024

Uh oh!

achow101 commented Nov 4, 2024

Uh oh!

l0rinc commented Nov 5, 2024

Uh oh!

l0rinc commented Nov 25, 2024

Uh oh!

Uh oh!

l0rinc commented Feb 24, 2024 •

edited

Loading

DrahtBot commented Feb 24, 2024 •

edited

Loading

l0rinc commented Jul 1, 2024 •

edited

Loading

maflcko commented Jul 2, 2024 •

edited

Loading