Skip to content

Conversation

theuni
Copy link
Contributor

@theuni theuni commented Dec 11, 2023

This corresponds with an upcoming pr to Core to do the same.

It's unclear why the optimized versions ignore the max param. I'm guessing it's because the output of the builtins is already capped? I wasn't sure, so I copied the behavior of the other builtins.

@gmaxwell
Copy link
Contributor

gmaxwell commented Dec 12, 2023

Historical background: The maximum value is there because by construction the maximum is known in the usage in this code. Not just from the type, but it's called over and over again after some operations are performed that can't increase the result so the prior result is the next step's maximum.

This lets the unoptimized code run a fair amount faster than it does without the maximum (in fact, out of several possible constructions for the native code, the version that was fastest was the simple approach specifically because it can take a maximum).

The dedicated instructions don't have any way to tell them a maximum (and are fast enough / implemented in such a way) that a maximum wouldn't be helpful.

I'm not following development, but I'd expect using the C++20 function when the native instructions aren't available will be slower than the code here ('cause they can make use of the known maximum and the C++ function can't).

@sipa
Copy link
Collaborator

sipa commented Dec 12, 2023

@gmaxwell Good points. I think the question is whether any architectures (that we'd want to target) exist for which no accelerated/builtin exist that's at least as fast as the naive "check every bit with known maximum" fallback.

I expect that at least the C++20 designers thought the answer is no, as they're generally keen on enabling efficient use. Sadly, it does seem there is one: RISC-V doesn't have a CLZ instruction (there is one proposed in the bitmanip extension, but that seems not yet incorporated in the standard)...

Of course, the current code isn't aware of this. On RV64 GCC, __builtin_clz exists (which is likely slower than the fallback, as from what I read, it's implemented using a loop over bytes + lookup table for bits within the byte), so the slower version will already be used...

@theuni
Copy link
Contributor Author

theuni commented Dec 12, 2023

Thanks for the context! That's very helpful.

FWIW, I traced what happens with this in libc++ (just to get an idea of what a sane stdlib impl might look like), and it's essentially a wrapper around the built-ins we're currently using: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/countl.h#L56

As @sipa mentioned, and as far as I can tell, there really isn't any realistic way of falling into the hand-written loop because pretty much every common/supported compiler should be hitting one of the other cases. So the code-change here should be essentially a no-op assuming the stdlib wrappers compile to nothing.

It seems to me we should either:

  • Use the stdlib as a generic means of hitting the built-ins (as proposed here)
  • Always use the hand-rolled loop and hope the compiler recognizes the pattern when possible for the target

Or as @sipa kinda alluded, we could do the first and also specialize arches known to be missing efficient count instructions like RISC-V.

@fanquake
Copy link
Member

@theuni could rebase here for (mostly) running CI?

@sipa
Copy link
Collaborator

sipa commented Apr 11, 2024

@theuni That makes sense to me. Effectively the C++20 function is equivalent in practice to the builtin (on gcc/clang). So if we're concerned about systems where __builtin_ctz/std::countr_zero is slower than just dumb bit iteration, we should fix that specifically for those platforms (possibly RISC-V) - probably by hardcoding detection for those platforms.

ACK e0e3dc5

@sipa
Copy link
Collaborator

sipa commented Apr 11, 2024

ACK 82b6488

@maflcko
Copy link

maflcko commented Apr 11, 2024

lgtm

@fanquake
Copy link
Member

Also cherry-picked this + the associated cleanup into bitcoin/bitcoin#29823 for a look.

@sipa sipa merged commit 33b7c20 into bitcoin-core:master Apr 11, 2024
fanquake added a commit to fanquake/bitcoin that referenced this pull request Apr 11, 2024
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488acb Add c++20 version of CountBits
0498084d31 ci: Fix "s390x (big-endian)" task
71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc170 ci: Fix `x86_64-w64-mingw32` task
fe1040f227 Drop -Wno-shift-count-overflow compile flag
154bcd43bd Avoid >> above type width in BitWriter
67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de7250416 ci: Update macOS image for CI
83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d690 ci: Install wine32 package for Windows tests
2d2c695d78 build: Drop unused `CC` variable
1810fcbd11 ci: Use correct variable to designate C++ compiler
022b959049 Merge bitcoin-core/minisketch#77: Add missing include
08443c4892 Add missing include

git-subtree-dir: src/minisketch
git-subtree-split: 33b7c200b97498139aa1462a0dd0d6eba49d98b7
fanquake added a commit to fanquake/bitcoin that referenced this pull request Apr 12, 2024
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1
653d8b2e26 Avoid overflowing shift by special casing inverse of 1
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488acb Add c++20 version of CountBits
0498084d31 ci: Fix "s390x (big-endian)" task
71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc170 ci: Fix `x86_64-w64-mingw32` task
fe1040f227 Drop -Wno-shift-count-overflow compile flag
154bcd43bd Avoid >> above type width in BitWriter
67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de7250416 ci: Update macOS image for CI
83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d690 ci: Install wine32 package for Windows tests
2d2c695d78 build: Drop unused `CC` variable
1810fcbd11 ci: Use correct variable to designate C++ compiler
022b959049 Merge bitcoin-core/minisketch#77: Add missing include
08443c4892 Add missing include

git-subtree-dir: src/minisketch
git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
fanquake added a commit to bitcoin/bitcoin that referenced this pull request Apr 15, 2024
…6b3fee233a67492

4722b7c build: remove minisketch clz check (fanquake)
1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake)

Pull request description:

  bitcoin-core/minisketch#81 will fix #29799.
  Minor build cleanups after bitcoin-core/minisketch#80.

ACKs for top commit:
  dergoegge:
    utACK 4722b7c
  hebasto:
    ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK.

Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
@fanquake
Copy link
Member

Sadly, it does seem there is one: RISC-V doesn't have a CLZ instruction (there is one proposed in the bitmanip extension, but that seems not yet incorporated in the standard)...

It looks like this will at least be available for use in some way, starting with GCC 14: https://gcc.gnu.org/gcc-14/changes.html, which ships with a large number of RISC-V changes, including:

Support for the experimental vector crypto intrinsics as specified in RISC-V vector intrinsic specification, thanks to Feng Wang et al. from ESWIN Computing

janus pushed a commit to BitgesellOfficial/bitgesell that referenced this pull request Jun 6, 2024
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1
653d8b2e26 Avoid overflowing shift by special casing inverse of 1
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488acb Add c++20 version of CountBits
0498084d31 ci: Fix "s390x (big-endian)" task
71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc170 ci: Fix `x86_64-w64-mingw32` task
fe1040f227 Drop -Wno-shift-count-overflow compile flag
154bcd43bd Avoid >> above type width in BitWriter
67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de7250416 ci: Update macOS image for CI
83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d690 ci: Install wine32 package for Windows tests
2d2c695d78 build: Drop unused `CC` variable
1810fcbd11 ci: Use correct variable to designate C++ compiler
022b959049 Merge bitcoin-core/minisketch#77: Add missing include
08443c4892 Add missing include

git-subtree-dir: src/minisketch
git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
vmta added a commit to umkoin/umkoin that referenced this pull request Oct 6, 2024
eb37a9b8e Merge bitcoin-core/minisketch#87: Avoid copy in self-assign
fe6557642 Merge bitcoin-core/minisketch#88: build: Add `-Wundef`
8ea298bfa Avoid copy in self-assign
978a3d886 build: Add `-Wundef`
338704417 Merge bitcoin-core/minisketch#86: doc: fix typo in sketch_impl.h
15c2d13b6 doc: fix typo in sketch_impl.h
7be08b8a4 Merge bitcoin-core/minisketch#85: Fixes for integer precision loss
00fb4a4d8 Avoid or make integer precision conversion explicit
9d62a4d27 Avoid the need to cast/convert to size_t for vector operations
19e06cc7a Prevent overflows from large capacity/max_elements
3472e2f5e Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1
653d8b2e2 Avoid overflowing shift by special casing inverse of 1
33b7c200b Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a3 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488ac Add c++20 version of CountBits
0498084d3 ci: Fix "s390x (big-endian)" task
71709dca9 Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa9 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc17 ci: Fix `x86_64-w64-mingw32` task
fe1040f22 Drop -Wno-shift-count-overflow compile flag
154bcd43b Avoid >> above type width in BitWriter
67b87acdb Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de725041 ci: Update macOS image for CI
83d812ea9 Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d69 ci: Install wine32 package for Windows tests
2d2c695d7 build: Drop unused `CC` variable
1810fcbd1 ci: Use correct variable to designate C++ compiler
022b95904 Merge bitcoin-core/minisketch#77: Add missing include
08443c489 Add missing include
a571ba20f Merge bitcoin-core/minisketch#68: Add missed `#include <string>`
b9a7f7e2b Merge bitcoin-core/minisketch#69: refactor: Drop unused `total` local variables
8a5af94ed Merge bitcoin-core/minisketch#70: build: Remove `-Qunused-arguments` workaround for clang + ccache
c36f1f03a Merge bitcoin-core/minisketch#72: Fix MSVC implementation of `CountBits()` function
0078bedda Ignore `HAVE_CLZ` macro when building with MSVC
1c772918c Fix MSVC implementation of `CountBits()` function
98f87c55f build: Remove `-Qunused-arguments` workaround for clang + ccache
11a1e25c8 refactor: Drop unused `total` local variables
ed6c8fcfd Add missed `#include <string>`
47f0a2d26 Merge bitcoin-core/minisketch#66: msvc: remove direct Bitcoin Core `compat.h` include
64f17584c msvc: remove Core compat.h include

git-subtree-dir: src/minisketch
git-subtree-split: eb37a9b8e79f9e49d73b96a49bf97a96d9eb676c
kwvg added a commit to kwvg/dash that referenced this pull request Oct 29, 2024
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1
653d8b2e26 Avoid overflowing shift by special casing inverse of 1
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488acb Add c++20 version of CountBits
0498084d31 ci: Fix "s390x (big-endian)" task
71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc170 ci: Fix `x86_64-w64-mingw32` task
fe1040f227 Drop -Wno-shift-count-overflow compile flag
154bcd43bd Avoid >> above type width in BitWriter
67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de7250416 ci: Update macOS image for CI
83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d690 ci: Install wine32 package for Windows tests
2d2c695d78 build: Drop unused `CC` variable
1810fcbd11 ci: Use correct variable to designate C++ compiler
022b959049 Merge bitcoin-core/minisketch#77: Add missing include
08443c4892 Add missing include

git-subtree-dir: src/minisketch
git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
tomt1664 pushed a commit to tomt1664/elements that referenced this pull request Apr 23, 2025
…75ace39ce9243af6b3fee233a67492

4722b7c build: remove minisketch clz check (fanquake)
1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake)

Pull request description:

  bitcoin-core/minisketch#81 will fix #29799.
  Minor build cleanups after bitcoin-core/minisketch#80.

ACKs for top commit:
  dergoegge:
    utACK 4722b7c
  hebasto:
    ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK.

Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
tomt1664 pushed a commit to tomt1664/elements that referenced this pull request Apr 24, 2025
…75ace39ce9243af6b3fee233a67492

4722b7c build: remove minisketch clz check (fanquake)
1eea10a Squashed 'src/minisketch/' changes from a571ba20f9..3472e2f5ec (fanquake)

Pull request description:

  bitcoin-core/minisketch#81 will fix #29799.
  Minor build cleanups after bitcoin-core/minisketch#80.

ACKs for top commit:
  dergoegge:
    utACK 4722b7c
  hebasto:
    ACK 4722b7c, I have verified the subtree update and reviewed the build system changes. Both look OK.

Tree-SHA512: eabd82e5a13cc4f32155319df97368f2e8c93320a4265b6c372efcb1ea4e756f6693df7c02498c8ea989ccd376a20277fa110c66d0754cb9bca5e54d18e0a965
tomt1664 pushed a commit to tomt1664/elements that referenced this pull request Apr 24, 2025
3472e2f5ec Merge bitcoin-core/minisketch#81: Avoid overflowing shift by special casing inverse of 1
653d8b2e26 Avoid overflowing shift by special casing inverse of 1
33b7c200b9 Merge bitcoin-core/minisketch#80: Add c++20 version of CountBits
4a48f31a37 Merge bitcoin-core/minisketch#83: ci: Fix "s390x (big-endian)" task
82b6488acb Add c++20 version of CountBits
0498084d31 ci: Fix "s390x (big-endian)" task
71709dca9e Merge bitcoin-core/minisketch#82: ci: Fix `x86_64-w64-mingw32` task
9e6127fa98 Merge bitcoin-core/minisketch#74: Avoid >> above type width in BitWriter
ed420bc170 ci: Fix `x86_64-w64-mingw32` task
fe1040f227 Drop -Wno-shift-count-overflow compile flag
154bcd43bd Avoid >> above type width in BitWriter
67b87acdb6 Merge bitcoin-core/minisketch#78: ci: Update macOS image for CI
7de7250416 ci: Update macOS image for CI
83d812ea9f Merge bitcoin-core/minisketch#73: ci: Use correct variable to designate C++ compiler
e051a7d690 ci: Install wine32 package for Windows tests
2d2c695d78 build: Drop unused `CC` variable
1810fcbd11 ci: Use correct variable to designate C++ compiler
022b959049 Merge bitcoin-core/minisketch#77: Add missing include
08443c4892 Add missing include

git-subtree-dir: src/minisketch
git-subtree-split: 3472e2f5ec75ace39ce9243af6b3fee233a67492
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants