-
Notifications
You must be signed in to change notification settings - Fork 58
Add c++20 version of CountBits #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Historical background: The maximum value is there because by construction the maximum is known in the usage in this code. Not just from the type, but it's called over and over again after some operations are performed that can't increase the result so the prior result is the next step's maximum. This lets the unoptimized code run a fair amount faster than it does without the maximum (in fact, out of several possible constructions for the native code, the version that was fastest was the simple approach specifically because it can take a maximum). The dedicated instructions don't have any way to tell them a maximum (and are fast enough / implemented in such a way) that a maximum wouldn't be helpful. I'm not following development, but I'd expect using the C++20 function when the native instructions aren't available will be slower than the code here ('cause they can make use of the known maximum and the C++ function can't). |
@gmaxwell Good points. I think the question is whether any architectures (that we'd want to target) exist for which no accelerated/builtin exist that's at least as fast as the naive "check every bit with known maximum" fallback. I expect that at least the C++20 designers thought the answer is no, as they're generally keen on enabling efficient use. Sadly, it does seem there is one: RISC-V doesn't have a CLZ instruction (there is one proposed in the Of course, the current code isn't aware of this. On RV64 GCC, |
Thanks for the context! That's very helpful. FWIW, I traced what happens with this in libc++ (just to get an idea of what a sane stdlib impl might look like), and it's essentially a wrapper around the built-ins we're currently using: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/countl.h#L56 As @sipa mentioned, and as far as I can tell, there really isn't any realistic way of falling into the hand-written loop because pretty much every common/supported compiler should be hitting one of the other cases. So the code-change here should be essentially a no-op assuming the stdlib wrappers compile to nothing. It seems to me we should either:
Or as @sipa kinda alluded, we could do the first and also specialize arches known to be missing efficient count instructions like RISC-V. |
@theuni could rebase here for (mostly) running CI? |
@theuni That makes sense to me. Effectively the C++20 function is equivalent in practice to the builtin (on gcc/clang). So if we're concerned about systems where ACK e0e3dc5 |
ACK 82b6488 |
lgtm |
Also cherry-picked this + the associated cleanup into bitcoin/bitcoin#29823 for a look. |
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488acb Add c++20 version of CountBits 0498084d31 ci: Fix "s390x (big-endian)" task 71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc170 ci: Fix `x86_64-w64-mingw32` task fe1040f227 Drop -Wno-shift-count-overflow compile flag 154bcd43bd Avoid >> above type width in BitWriter 67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de7250416 ci: Update macOS image for CI 83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d690 ci: Install wine32 package for Windows tests 2d2c695d78 build: Drop unused `CC` variable 1810fcbd11 ci: Use correct variable to designate C++ compiler 022b959049 Merge bitcoin-core/minisketch#77: Add missing include 08443c4892 Add missing include git-subtree-dir: src/minisketch git-subtree-split: 33b7c200b97498139aa1462a0dd0d6eba49d98b7
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1 653d8b2e26 Avoid overflowing shift by special casing inverse of 1 33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488acb Add c++20 version of CountBits 0498084d31 ci: Fix "s390x (big-endian)" task 71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc170 ci: Fix `x86_64-w64-mingw32` task fe1040f227 Drop -Wno-shift-count-overflow compile flag 154bcd43bd Avoid >> above type width in BitWriter 67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de7250416 ci: Update macOS image for CI 83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d690 ci: Install wine32 package for Windows tests 2d2c695d78 build: Drop unused `CC` variable 1810fcbd11 ci: Use correct variable to designate C++ compiler 022b959049 Merge bitcoin-core/minisketch#77: Add missing include 08443c4892 Add missing include git-subtree-dir: src/minisketch git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
…6b3fee233a67492 4722b7c build: remove minisketch clz check (fanquake) 1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake) Pull request description: bitcoin-core/minisketch#81 will fix #29799. Minor build cleanups after bitcoin-core/minisketch#80. ACKs for top commit: dergoegge: utACK 4722b7c hebasto: ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK. Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
It looks like this will at least be available for use in some way, starting with GCC 14: https://gcc.gnu.org/gcc-14/changes.html, which ships with a large number of RISC-V changes, including:
|
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1 653d8b2e26 Avoid overflowing shift by special casing inverse of 1 33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488acb Add c++20 version of CountBits 0498084d31 ci: Fix "s390x (big-endian)" task 71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc170 ci: Fix `x86_64-w64-mingw32` task fe1040f227 Drop -Wno-shift-count-overflow compile flag 154bcd43bd Avoid >> above type width in BitWriter 67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de7250416 ci: Update macOS image for CI 83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d690 ci: Install wine32 package for Windows tests 2d2c695d78 build: Drop unused `CC` variable 1810fcbd11 ci: Use correct variable to designate C++ compiler 022b959049 Merge bitcoin-core/minisketch#77: Add missing include 08443c4892 Add missing include git-subtree-dir: src/minisketch git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
eb37a9b8e Merge bitcoin-core/minisketch#87: Avoid copy in self-assign fe6557642 Merge bitcoin-core/minisketch#88: build: Add `-Wundef` 8ea298bfa Avoid copy in self-assign 978a3d886 build: Add `-Wundef` 338704417 Merge bitcoin-core/minisketch#86: doc: fix typo in sketch_impl.h 15c2d13b6 doc: fix typo in sketch_impl.h 7be08b8a4 Merge bitcoin-core/minisketch#85: Fixes for integer precision loss 00fb4a4d8 Avoid or make integer precision conversion explicit 9d62a4d27 Avoid the need to cast/convert to size_t for vector operations 19e06cc7a Prevent overflows from large capacity/max_elements 3472e2f5e Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1 653d8b2e2 Avoid overflowing shift by special casing inverse of 1 33b7c200b Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a3 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488ac Add c++20 version of CountBits 0498084d3 ci: Fix "s390x (big-endian)" task 71709dca9 Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa9 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc17 ci: Fix `x86_64-w64-mingw32` task fe1040f22 Drop -Wno-shift-count-overflow compile flag 154bcd43b Avoid >> above type width in BitWriter 67b87acdb Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de725041 ci: Update macOS image for CI 83d812ea9 Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d69 ci: Install wine32 package for Windows tests 2d2c695d7 build: Drop unused `CC` variable 1810fcbd1 ci: Use correct variable to designate C++ compiler 022b95904 Merge bitcoin-core/minisketch#77: Add missing include 08443c489 Add missing include a571ba20f Merge bitcoin-core/minisketch#68: Add missed `#include <string>` b9a7f7e2b Merge bitcoin-core/minisketch#69: refactor: Drop unused `total` local variables 8a5af94ed Merge bitcoin-core/minisketch#70: build: Remove `-Qunused-arguments` workaround for clang + ccache c36f1f03a Merge bitcoin-core/minisketch#72: Fix MSVC implementation of `CountBits()` function 0078bedda Ignore `HAVE_CLZ` macro when building with MSVC 1c772918c Fix MSVC implementation of `CountBits()` function 98f87c55f build: Remove `-Qunused-arguments` workaround for clang + ccache 11a1e25c8 refactor: Drop unused `total` local variables ed6c8fcfd Add missed `#include <string>` 47f0a2d26 Merge bitcoin-core/minisketch#66: msvc: remove direct Bitcoin Core `compat.h` include 64f17584c msvc: remove Core compat.h include git-subtree-dir: src/minisketch git-subtree-split: eb37a9b8e79f9e49d73b96a49bf97a96d9eb676c
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1 653d8b2e26 Avoid overflowing shift by special casing inverse of 1 33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488acb Add c++20 version of CountBits 0498084d31 ci: Fix "s390x (big-endian)" task 71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc170 ci: Fix `x86_64-w64-mingw32` task fe1040f227 Drop -Wno-shift-count-overflow compile flag 154bcd43bd Avoid >> above type width in BitWriter 67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de7250416 ci: Update macOS image for CI 83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d690 ci: Install wine32 package for Windows tests 2d2c695d78 build: Drop unused `CC` variable 1810fcbd11 ci: Use correct variable to designate C++ compiler 022b959049 Merge bitcoin-core/minisketch#77: Add missing include 08443c4892 Add missing include git-subtree-dir: src/minisketch git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
…75ace39ce9243af6b3fee233a67492 4722b7c build: remove minisketch clz check (fanquake) 1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake) Pull request description: bitcoin-core/minisketch#81 will fix #29799. Minor build cleanups after bitcoin-core/minisketch#80. ACKs for top commit: dergoegge: utACK 4722b7c hebasto: ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK. Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
…75ace39ce9243af6b3fee233a67492 4722b7c build: remove minisketch clz check (fanquake) 1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake) Pull request description: bitcoin-core/minisketch#81 will fix #29799. Minor build cleanups after bitcoin-core/minisketch#80. ACKs for top commit: dergoegge: utACK 4722b7c hebasto: ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK. Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1 653d8b2e26 Avoid overflowing shift by special casing inverse of 1 33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits 4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task 82b6488acb Add c++20 version of CountBits 0498084d31 ci: Fix "s390x (big-endian)" task 71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task 9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter ed420bc170 ci: Fix `x86_64-w64-mingw32` task fe1040f227 Drop -Wno-shift-count-overflow compile flag 154bcd43bd Avoid >> above type width in BitWriter 67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI 7de7250416 ci: Update macOS image for CI 83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler e051a7d690 ci: Install wine32 package for Windows tests 2d2c695d78 build: Drop unused `CC` variable 1810fcbd11 ci: Use correct variable to designate C++ compiler 022b959049 Merge bitcoin-core/minisketch#77: Add missing include 08443c4892 Add missing include git-subtree-dir: src/minisketch git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
This corresponds with an upcoming pr to Core to do the same.
It's unclear why the optimized versions ignore the max param. I'm guessing it's because the output of the builtins is already capped? I wasn't sure, so I copied the behavior of the other builtins.