Skip to content

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Mar 9, 2024

Started optimizing the base conversions in TryParseHex, Base58 and IsSpace - this is the next step.

Part of this change was already merged in #30047, which made decoding ~26% faster.

Here I've reduced the memory reallocations and copying operations in bech32 encode, making it ~15% faster.

make && ./src/bench/bench_bitcoin --filter='Bech32Encode' --min-time=1000

Before:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               19.97 |       50,074,562.72 |    0.1% |      1.06 | `Bech32Encode`

After:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               17.33 |       57,687,668.20 |    0.1% |      1.10 | `Bech32Encode`

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 9, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK josibake, sipa, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from 6c6c228 to 4f76265 Compare March 9, 2024 13:16
@l0rinc l0rinc marked this pull request as ready for review March 9, 2024 14:28
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from 4f76265 to db83766 Compare April 21, 2024 10:46
@l0rinc l0rinc marked this pull request as draft April 21, 2024 11:53
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch 10 times, most recently from f8fa74a to b3b84c3 Compare April 22, 2024 10:31
@l0rinc l0rinc marked this pull request as ready for review April 22, 2024 12:57
@josibake
Copy link
Member

Concept ACK

I cherry-picked your last commit into #30047 to get rid of the hardcoded values inside ExpandHRP. Looking at the rest of your commits here, the rest of the changes look great and the benchmark numbers are a nice improvement. Will review more thoroughly this week!

@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch 2 times, most recently from 7dd039a to b856835 Compare May 12, 2024 12:05
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from b856835 to 00dc933 Compare May 13, 2024 09:40
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from 00dc933 to 1048dd6 Compare May 20, 2024 18:25
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch 2 times, most recently from 1d96f43 to aa396df Compare May 29, 2024 09:27
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from aa396df to f45539c Compare June 5, 2024 10:03
@l0rinc l0rinc changed the title refactor: Reduce memory copying operations in bech32 encoding/decoding refactor: Reduce memory copying operations in bech32 encoding Jun 5, 2024
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from f45539c to ec0261d Compare June 5, 2024 10:46
@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 5, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/25832897694

@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from ec0261d to d9586d6 Compare June 5, 2024 10:46
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch 4 times, most recently from 814c05e to e7c55d4 Compare June 5, 2024 10:59
@l0rinc
Copy link
Contributor Author

l0rinc commented Jun 5, 2024

@josibake, now that the cherry-pick was merged, I've rebased and re-measured - the important part of this change was already included in the other PR, the remaining optimizations here are smaller, but also a lot simpler.

Here I've reduced the memory reallocations and copying operations in bech32 encode, making it ~15% faster.

make && ./src/bench/bench_bitcoin --filter='Bech32Encode' --min-time=1000

Before:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               19.97 |       50,074,562.72 |    0.1% |      1.06 | `Bech32Encode`
After:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               17.33 |       57,687,668.20 |    0.1% |      1.10 | `Bech32Encode`

Co-authored-by: josibake <josibake@protonmail.com>
@l0rinc l0rinc force-pushed the paplorinc/bech32_optimizations branch from e7c55d4 to 07f6417 Compare June 5, 2024 11:19
@josibake
Copy link
Member

josibake commented Jun 5, 2024

ACK 07f6417

Verified the benchmark locally. Aside from the speed improvements, this also improves the readability of the code.

@DrahtBot DrahtBot removed the CI failed label Jun 5, 2024
@sipa
Copy link
Member

sipa commented Jun 5, 2024

utACK 07f6417

Making the code more readable is a good reason to make this change, but this code was specifically written to prioritize simplicity over performance. If we'd care about performance, it's probably better to use an approach similar to that of the C reference implementation (as it works entirely on the stack, while the current code does several allocations: 1 in ExpandHRP, between 1 and 3 in CreateChecksum, and 1 inevitable one in Encode).

@achow101
Copy link
Member

ACK 07f6417

@achow101 achow101 merged commit fcc3b65 into bitcoin:master Jun 13, 2024
@l0rinc l0rinc deleted the paplorinc/bech32_optimizations branch June 13, 2024 16:48
@l0rinc
Copy link
Contributor Author

l0rinc commented Jun 13, 2024

Thanks for the reviews @josibake, @sipa, @achow101.
There's a similar small optimization for transaction input/output allocations as well, I'd appreciate a review.

achow101 added a commit that referenced this pull request Nov 26, 2024
…ent-vector errors

11f3bc2 refactor: Reserve vectors in fuzz tests (Lőrinc)
152fefe refactor: Preallocate PrevectorFillVector(In)Direct without vector resize (Lőrinc)
a774c7a refactor: Fix remaining clang-tidy performance-inefficient-vector errors (Lőrinc)

Pull request description:

  PR inspired by #29608 (comment) (and #29458, #29606, #29607, #30093).

  The `clang-tidy` check can be run via:
  ```bash
  cmake -B build -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_BENCH=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_FOR_FUZZING=ON && cmake --build build -j$(nproc)

  run-clang-tidy -quiet -p build -j $(nproc) -checks='-*,performance-inefficient-vector-operation' | grep -v 'clang-tidy'
  ```
  which revealed 3 tests and 1 prod warning (+ fuzz and benching, found by hebasto).
  Even though the tests aren't performance critical, getting rid of these warnings (for which the checks were already enabled via https://github.com/bitcoin/bitcoin/blob/master/src/.clang-tidy#L18, see below), the fix was quite simple.

  <details>
  <summary>clang-tidy -list-checks</summary>

  ```bash
  cd src && clang-tidy -list-checks | grep 'vector'
      performance-inefficient-vector-operation
  ```

  </details>

  <details>
  <summary>Output before the change</summary>

  ```
  src/test/rpc_tests.cpp:434:9: error: 'emplace_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    433 |     for (int64_t i = 0; i < 100; i++) {
    434 |         feerates.emplace_back(1 ,1);
        |         ^

  src/test/checkqueue_tests.cpp:366:13: error: 'emplace_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    365 |         for (size_t i = 0; i < 3; ++i) {
    366 |             tg.emplace_back(
        |             ^

  src/test/cuckoocache_tests.cpp:231:9: error: 'emplace_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    228 |     for (uint32_t x = 0; x < 3; ++x)
    229 |         /** Each thread is emplaced with x copy-by-value
    230 |         */
    231 |         threads.emplace_back([&, x] {
        |         ^

  src/rpc/output_script.cpp:127:17: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    126 |             for (unsigned int i = 0; i < keys.size(); ++i) {
    127 |                 pubkeys.push_back(HexToPubKey(keys[i].get_str()));
        |                 ^
  ```

  And the fuzz and benchmarks, noticed by hebasto: #31305 (comment)

  </details>

ACKs for top commit:
  maflcko:
    review ACK 11f3bc2 🎦
  achow101:
    ACK 11f3bc2
  theuni:
    ACK 11f3bc2
  hebasto:
    ACK 11f3bc2, tested with clang 19.1.5 + clang-tidy.

Tree-SHA512: 41691c19f35c63b922a95407617a54f9bff1af3f95f99d15642064f321df038aeb1ae5f061f854ed913f69036807cc28fa6222b2ff4c24ef43b909027fa0f9b3
@bitcoin bitcoin locked and limited conversation to collaborators Jun 13, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants