Skip to content

Conversation

theuni
Copy link
Member

@theuni theuni commented Dec 8, 2023

This should give a speedup across the board for MSVC builds.

While working on modernizing our byteswapping code for c++20, we noticed that MSVC uses our hand-written byteswap functions, as opposed to using libc/compiler versions like almost all other platforms.

aureleoules did some great benchmarks in #28674 which show that these hand-written byteswaps often compile down to a slow mess.
hebasto confirmed that we're indeed hitting these paths for MSVC.

Quick tests with godbolt show that MSVC's provided _byteswap_* indeed speed things up.

This should give a significant speedup across the board.
@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 8, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK hebasto

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28674 ([POC] C++20 std::endian by fanquake)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@theuni theuni marked this pull request as draft December 8, 2023 18:57
@theuni
Copy link
Member Author

theuni commented Dec 8, 2023

Converted this to draft because I haven't actually tested the compile for MSVC (relying on c-i :( ), and because we'll need solid benchmarks before making any decisions.

@hebasto
Copy link
Member

hebasto commented Dec 8, 2023

Concept ACK on removing another MSVC-specific performance pessimization from the code.

@hebasto
Copy link
Member

hebasto commented Dec 9, 2023

Converted this to draft because I haven't actually tested the compile for MSVC (relying on c-i :( )

It compiles OK.

and because we'll need solid benchmarks before making any decisions.

What subset of benchmarks we might consider representative enough?

For now, I'm observing very tiny (a couple of %) improvements in some of them.

Also we should remember that MSVC build uses no optimisation (see e94ae81).

@fanquake
Copy link
Member

cc @sipsorcery.

Also we should remember that MSVC build uses no optimisation

Could rebase on master now that #29045 has gone in.

@sipsorcery
Copy link
Contributor

Builds fine for me on Windows 11 and msvc v19.38.33130 x64 (Visual Studio 2022).

I ran bench_bitcoin.exe but not sure what differences I should see.

@maflcko
Copy link
Member

maflcko commented Dec 12, 2023

See #28674 (comment) for benches that could make a difference.

But this requires a rebase on master first, see

Could rebase on master now that #29045 has gone in.

@theuni
Copy link
Member Author

theuni commented Jan 17, 2024

Obsoleted by #29263.

@fanquake fanquake closed this Jan 17, 2024
fanquake added a commit that referenced this pull request Mar 1, 2024
86b7f28 serialization: use internal endian conversion functions (Cory Fields)
432b18c serialization: detect byteswap builtins without autoconf tests (Cory Fields)
297367b crypto: replace CountBits with std::bit_width (Cory Fields)
52f9bba crypto: replace non-standard CLZ builtins with c++20's bit_width (Cory Fields)

Pull request description:

  This replaces #28674, #29036, and #29057. Now ready for testing and review.

  Replaces platform-specific endian and byteswap functions. This is especially useful for kernel, as it means that our deep serialization code no longer requires bitcoin-config.h.

  I apologize for the size of the last commit, but it's hard to avoid making those changes at once.

  All platforms now use our internal functions rather than libc or platform-specific ones, with the exception of MSVC.

  Sadly, benchmarking showed that not all compilers are capable of detecting and optimizing byteswap functions, so compiler builtins are instead used where possible. However, they're now detected via macros rather than autoconf checks.

  This[ matches how libc++ implements std::byteswap for c++23](https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/byteswap.h#L26).

  I suggest we move/rename `compat/endian.h`, but I left that out of this PR to avoid bikeshedding.

  #29057 pointed out some irregularities in benchmarks. After messing with various compilers and configs for a few weeks with these changes, I'm of the opinion that we can't win on every platform every time, so we should take the code that makes sense going forward. That said, if any real-world slowdowns are caused here, we should obviously investigate.

ACKs for top commit:
  maflcko:
    ACK 86b7f28 📘
  fanquake:
    ACK 86b7f28 - we can finish pruning out the __builtin_clz* checks/usage once the minisketch code has been updated. This is more good cleanup pre-CMake & for the kernal.

Tree-SHA512: 715a32ec190c70505ffbce70bfe81fc7b6aa33e376b60292e801f60cf17025aabfcab4e8c53ebb2e28ffc5cf4c20b74fe3dd8548371ad772085c13aec8b7970e
@bitcoin bitcoin locked and limited conversation to collaborators Jan 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants