Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented May 23, 2024

Extracted from #30126.

This introduces the BitSet data structure, inspired by std::bitset, but with a few features that cannot be implemented on top without efficiency loss:

  • Finding the first set bit (First)
  • Finding the last set bit (Last)
  • Iterating over all set bits (begin and end).

And a few other operators/member functions that help readability for #30126:

  • operator- for set subtraction
  • Overlaps() for testing whether intersection is non-empty
  • IsSupersetOf() for testing (non-strict) supersetness
  • IsSubsetOf() for testing (non-strict) subsetness
  • Fill() to construct a set with all numbers from 0 to n-1, inclusive
  • Singleton() to construct a set with one specific element.

Everything is tested through a simulation-based fuzz test that compares the behavior with normal std::bitset equivalent operations.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 23, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK instagibbs, cbergqvist, theStack, achow101
Concept ACK hebasto

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30126 (Low-level cluster linearization code by sipa)
  • #29625 (Several randomness improvements by sipa)
  • #28676 ([WIP] Cluster mempool implementation by sdaftuar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa sipa force-pushed the 202405_bitset branch 2 times, most recently from 0505f4f to aeef8a4 Compare May 23, 2024 18:23
@sipa sipa force-pushed the 202405_bitset branch 2 times, most recently from 692e533 to 3bc131a Compare May 25, 2024 22:07
Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at 3bc131a

Concept: Bummer that std::countr_zero and std::countl_zero aren't implemented for std::bitset. And for iterating it seems like one would be using std::vector<bool> and hoping for the best in terms of implementation. So makes sense to introduce BitSet.

Summary still mentions operator/ instead of operator-.

(Fuzz-code is too new to me to comment).

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@sipa sipa force-pushed the 202405_bitset branch 2 times, most recently from f03f3fe to 18ec3fc Compare May 29, 2024 15:44
@sipa
Copy link
Member Author

sipa commented May 29, 2024

Added additional Assume()s to prevent operator++ on iterators past the end:

--- a/src/util/bitset.h
+++ b/src/util/bitset.h
@@ -101,6 +101,7 @@ class IntBitSet
         /** Progress to the next 1 bit (only if != IteratorEnd). */
         constexpr Iterator& operator++() noexcept
         {
+            Assume(m_val != 0);
             m_val &= m_val - I{1U};
             if (m_val != 0) m_pos = std::countr_zero(m_val);
             return *this;
@@ -272,6 +273,7 @@ class MultiIntBitSet
         /** Progress to the next 1 bit (only if != IteratorEnd). */
         constexpr Iterator& operator++() noexcept
         {
+            Assume(m_idx < N);
             m_val &= m_val - I{1U};
             if (m_val == 0) {
                 while (true) {

@hebasto
Copy link
Member

hebasto commented Jun 4, 2024

Concept ACK.

@sipa sipa force-pushed the 202405_bitset branch from 18ec3fc to c99063e Compare June 4, 2024 15:56
@instagibbs
Copy link
Member

concept ACK

requested changes were included, will continue review

git range-diff master 18ec3fc c99063e

@DrahtBot DrahtBot mentioned this pull request Jun 6, 2024
8 tasks
Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c99063e

(Ran git range-diff 3bc131a~2..3bc131a c99063e~2..c99063e).

Copy link
Member

@instagibbs instagibbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c99063e seems to need swap fuzz coverage

rest of comments are nits

constexpr void Reset(unsigned pos) noexcept
{
Assume(pos < MAX_SIZE);
m_val &= ~I(I{1U} << pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just use conditional Set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't benchmarked or investigated compiled code, but I want to give the compiler every opportunity to use the simpler operation here (the conditional Set is more complex, and inlining will hopefully strength-reduce it, but I prefer not counting on that).

return *this;
}
/** Set a bit to 0. */
void constexpr Reset(unsigned pos) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just use conditional Set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as here

@DrahtBot DrahtBot requested a review from instagibbs June 7, 2024 16:07
@sipa sipa force-pushed the 202405_bitset branch from c99063e to a135ab9 Compare June 7, 2024 17:13
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1e08ba3 so far, the code looks correct to me. Planning to review and run the fuzz test tomorrow.

sipa added 2 commits June 10, 2024 07:54
This adds a bitset module that implements a BitSet<N> class, a variant
of std::bitset with a few additional features that cannot be implemented
in a wrapper without performance loss (specifically, finding first and
last bit set, or iterating over all set bits).
@sipa
Copy link
Member Author

sipa commented Jun 10, 2024

Made a few small changes still:

@@ -107,7 +107,11 @@ class IntBitSet
             return *this;
         }
         /** Get the current bit position (only if != IteratorEnd). */
-        constexpr const unsigned& operator*() const noexcept { return m_pos; }
+        constexpr unsigned operator*() const noexcept
+        {
+            Assume(m_val != 0);
+            return m_pos;
+        }
     };
 
 public:
@@ -231,6 +235,8 @@ class MultiIntBitSet
     static constexpr unsigned LIMB_BITS = std::numeric_limits<I>::digits;
     /** Number of elements this set type supports. */
     static constexpr unsigned MAX_SIZE = LIMB_BITS * N;
+    // No overflow allowed here.
+    static_assert(MAX_SIZE / LIMB_BITS == N);
     /** Array whose member integers store the bits of the set. */
     std::array<I, N> m_val;
     /** Dummy type to return using end(). Only used for comparing with Iterator. */
@@ -293,7 +299,11 @@ class MultiIntBitSet
             return *this;
         }
         /** Get the current bit position (only if != IteratorEnd). */
-        constexpr const unsigned& operator*() const noexcept { return m_pos; }
+        constexpr unsigned operator*() const noexcept
+        {
+            Assume(m_idx < N);
+            return m_pos;
+        }
     };
 
 public:

@instagibbs
Copy link
Member

ACK 47f705b

swap coverage added, some of the nits taken, and a couple Assume()s added

reviewed via git range-diff master c99063e 47f705b

@DrahtBot DrahtBot requested review from cbergqvist and theStack June 10, 2024 13:26
Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 47f705b

Used same range-diff as instagibbs.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-review ACK 47f705b

@achow101
Copy link
Member

ACK 47f705b

@hebasto
Copy link
Member

hebasto commented Jul 14, 2024

Ported to the CMake-based build system in hebasto#264.

@bitcoin bitcoin locked and limited conversation to collaborators Jul 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants