Skip to content

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Feb 24, 2024

To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7 and 9, fitting them into 64-bit long integers.
This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead.
The benchmarks indicate a ~4.4x speedup for Base58Encode:

make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000

before:

ns/byte byte/s err% total benchmark
61.00 16,393,083.32 0.2% 1.07 Base58Encode

after:

ns/byte byte/s err% total benchmark
13.79 72,502,616.20 0.1% 1.10 Base58Encode

The same was applied to decoding, resulting in a ~2x speedup:

make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000

before:

ns/byte byte/s err% total benchmark
17.41 57,429,723.99 0.1% 1.10 Base58Decode

after:

ns/byte byte/s err% total benchmark
8.80 113,590,754.53 0.1% 1.11 Base58Decode

This also speed up listunspent by >50%.


See #29473 (comment) for additional benchmarks and #30035 for additional tests.
I've also tested with with the following temp test (comparing the exact previous impl with the new one for random inputs: https://gist.github.com/paplorinc/96d8e355f6fef10ac79f62d89a6d9f19#file-base58_tests-cpp-L211-L249)

@DrahtBot
Copy link
Contributor

DrahtBot commented Feb 24, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29473.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

No conflicts as of last run.

@l0rinc l0rinc changed the title WIP optimization: Unify size estimations in Base58 encoding/decoding WIP optimization: Enhance Base58 conversion through preliminary Base56 conversion Feb 24, 2024
@l0rinc l0rinc changed the title WIP optimization: Enhance Base58 conversion through preliminary Base56 conversion WIP optimization: Enhance Base58 conversion through intermediary Base56 conversion Feb 24, 2024
@l0rinc l0rinc changed the title WIP optimization: Enhance Base58 conversion through intermediary Base56 conversion optimization: Enhance Base58 conversion through intermediary Base56 conversion by 300% Feb 24, 2024
@l0rinc l0rinc changed the title optimization: Enhance Base58 conversion through intermediary Base56 conversion by 300% optimization: Speed up Base58 conversion through intermediary Base56 conversion by 300% Feb 24, 2024
@l0rinc l0rinc marked this pull request as ready for review February 24, 2024 18:02
@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from afbc980 to 321456b Compare February 24, 2024 18:12
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/21941361780

@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 3 times, most recently from 397ae64 to e168e62 Compare February 24, 2024 21:36
@l0rinc l0rinc marked this pull request as draft February 25, 2024 06:37
@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 2 times, most recently from cab8a6e to b930cd5 Compare February 25, 2024 07:06
@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 2 times, most recently from ba23707 to c9b351d Compare February 25, 2024 15:24
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/21955441135

@l0rinc l0rinc requested review from optout21 and Empact February 25, 2024 16:41
@l0rinc l0rinc marked this pull request as ready for review February 25, 2024 16:42
@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from c9b351d to 83caaf7 Compare February 26, 2024 21:25
@l0rinc l0rinc changed the title optimization: Speed up Base58 conversion through intermediary Base56 conversion by 300% optimization: Speed up Base58 conversion through intermediary Base56 conversion by 400% Feb 26, 2024
@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 83caaf7 to 73845ec Compare February 26, 2024 21:41
@Empact Empact mentioned this pull request Feb 26, 2024
@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 1, 2024

Ok, the reason for #7656 (comment) was an improvement in listunspent. Seems fine, if this is still the case. But this will need to be checked first.

I've created 10k blocks in regtest and measured the performance of listunspent before and after:

killall bitcoind 2>/dev/null || true
./src/bitcoind -regtest -daemon
./src/bitcoin-cli -regtest createwallet "test_wallet"
./src/bitcoin-cli -regtest generatetoaddress 10000 $(./src/bitcoin-cli -regtest getnewaddress) >/dev/null

hyperfine \
  --warmup 3 --runs 10 \
  --parameter-list BRANCH master,l0rinc/optimize-base58-conversion \
  --prepare '
    killall bitcoind 2>/dev/null || true
    git checkout "{BRANCH}" && git reset --hard
    make -j10 >/dev/null
    ./src/bitcoind -regtest -daemon
    sleep 1
    ./src/bitcoin-cli -regtest loadwallet "test_wallet"
  ' \
  './src/bitcoin-cli -regtest listunspent'

which resulted in:

Benchmark 1: ./src/bitcoin-cli -regtest listunspent (BRANCH = master)
  Time (mean ± σ):     378.7 ms ±   2.8 ms    [User: 71.7 ms, System: 13.9 ms]
  Range (min … max):   376.4 ms … 386.0 ms    10 runs
 
Benchmark 2: ./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion)
  Time (mean ± σ):     244.0 ms ±   1.3 ms    [User: 71.4 ms, System: 13.7 ms]
  Range (min … max):   242.2 ms … 246.1 ms    10 runs
 
Summary
  ./src/bitcoin-cli -regtest listunspent (BRANCH = l0rinc/optimize-base58-conversion) ran
    1.55 ± 0.01 times faster than ./src/bitcoin-cli -regtest listunspent (BRANCH = master)

i.e. 55% faster listunspent for the given usecase.

@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch 5 times, most recently from 864e479 to 2dd3818 Compare July 2, 2024 06:58
@maflcko
Copy link
Member

maflcko commented Jul 2, 2024

I presume, if you wanted to speed up listunspent in this case of a bip-32 base58 encoded descriptor string, a higher speed-up could be achieved by caching the descriptor string results, because the expectation is that there are generally only a few (or one) per wallet. But I am not sure how easy that is, or whether it is worth it.

@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 2, 2024

I was following the benchmarks to decide what to work on, but since you've mentioned this is an important usecase, I measured it as well.
Is there anything else you'd like me to measure or change in this pull request?

@maflcko
Copy link
Member

maflcko commented Jul 2, 2024

I don't think I said that "this is an important usecase". I simply said that the reason for the improvement in #7656 (comment) was listunspent. That pull request was created before bech32(m) addresses and descriptor wallets existed. Considering that bech32 is the default address type and that descriptor wallets are the default as well, the only expected remaining use of base58 should be in the bip-32 descriptor string. My guess is that, if it was important to optimize that for speed, a higher speedup could be achieved by caching the generated descriptor string.

However, this is just background information and a guess. I don't know if it is true, how easy it is to implement, whether it is worth it to review, or worth it at all.

@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 12, 2024

@luke-jr, would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead?

@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 2dd3818 to 75a86de Compare July 12, 2024 13:19
@luke-jr
Copy link
Member

luke-jr commented Jul 12, 2024

would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead?

The algorithm used in libbase58 is different, so not sure this even applies. Kinda doubt it would be worth the time to port/review either, unless a library consumer cares about performance or the other criteria explained already here.

If you're just looking for things to do, extending libbase58 to Bech32 might be worth doing. See bitcoin/libbase58#6

@l0rinc l0rinc force-pushed the paplorinc/optimize-base58-conversion branch from 75a86de to 7feb85b Compare July 14, 2024 12:38
l0rinc and others added 6 commits July 18, 2024 13:48
Note that the constants for size allocations have slightly changed after recalculating them
Instead of collecting the values to and empty b58, we're cloning the input without leading zeros and dividing it in-place, while doing the quadratic division from the most significant, non-zero digit, forward.

make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000
Before:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               59.34 |       16,851,250.88 |    0.2% |      1.10 | `Base58Encode`
|               59.20 |       16,892,479.61 |    0.2% |      1.10 | `Base58Encode`
|               58.97 |       16,958,226.76 |    0.2% |      1.10 | `Base58Encode`
```
After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               40.80 |       24,508,402.06 |    0.1% |      1.10 | `Base58Encode`
|               40.83 |       24,489,762.49 |    0.2% |      1.10 | `Base58Encode`
|               39.71 |       25,182,310.62 |    0.2% |      1.10 | `Base58Encode`
```

and

make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000
Before:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               79.93 |       12,511,576.75 |    0.4% |      1.10 | `Base58CheckEncode`
|               79.49 |       12,580,136.21 |    0.1% |      1.10 | `Base58CheckEncode`
|               79.65 |       12,554,785.16 |    0.1% |      1.10 | `Base58CheckEncode`
```
After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               55.27 |       18,094,561.35 |    0.1% |      1.10 | `Base58CheckEncode`
|               55.41 |       18,046,778.32 |    0.1% |      1.10 | `Base58CheckEncode`
|               55.32 |       18,075,763.16 |    0.1% |      1.10 | `Base58CheckEncode`
```

Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com>
To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7, fitting them into 64-bit long integers.
This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead.
The benchmarks indicate a 4x speedup for `Base58CheckEncode` and `Base58Encode`:

make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000
After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               14.32 |       69,831,031.44 |    0.1% |      1.10 | `Base58Encode`
|               14.32 |       69,811,995.18 |    0.1% |      1.10 | `Base58Encode`
|               14.35 |       69,662,527.88 |    0.5% |      1.10 | `Base58Encode`
```

make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000
After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               19.15 |       52,231,968.11 |    0.1% |      1.06 | `Base58CheckEncode`
|               19.13 |       52,269,345.54 |    0.4% |      1.05 | `Base58CheckEncode`
|               19.14 |       52,244,117.61 |    0.3% |      1.06 | `Base58CheckEncode`
```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000
Before:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               17.50 |       57,141,622.65 |    0.1% |      1.10 | `Base58Decode`
|               17.42 |       57,392,223.96 |    0.0% |      1.10 | `Base58Decode`
|               17.43 |       57,358,655.44 |    0.2% |      1.10 | `Base58Decode`
```
After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               16.37 |       61,093,842.90 |    0.2% |      1.10 | `Base58Decode`
|               16.37 |       61,100,514.64 |    0.1% |      1.10 | `Base58Decode`
|               16.38 |       61,046,275.93 |    0.1% |      1.10 | `Base58Decode`
```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000

After:
```
|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                8.79 |      113,767,685.06 |    0.1% |      1.10 | `Base58Decode`
|                8.78 |      113,831,528.33 |    0.0% |      1.10 | `Base58Decode`
|                8.80 |      113,607,470.35 |    0.2% |      1.10 | `Base58Decode`
```
@achow101
Copy link
Member

achow101 commented Nov 4, 2024

As this is basically a complete rewrite of the base58 encoding and decoding code, I don't think the review effort required to review this is worth it relative to the unimportance of these functions.

@l0rinc
Copy link
Contributor Author

l0rinc commented Nov 5, 2024

Thanks for checking it out @achow101.
I could make the diff minimal, but given that it's not that important, I will leave it as is - until someone starts complaining that listunspent is slow :)

@l0rinc
Copy link
Contributor Author

l0rinc commented Nov 25, 2024

I just found this was already attempted in a very similar way by @martinus a few years ago in #21176 - so now we have two versions, if this will turn out to be important again.
Closing, thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.