Skip to content

Conversation

jb55
Copy link
Contributor

@jb55 jb55 commented May 16, 2020

This builds off #18468 since it simplifies this PR a bunch. Review that first.

We can avoid many unnecessary std::vector allocations by changing
CBloomFilter to take Spans instead of std::vectors for the insert
and contains operations.

CBloomFilter currently converts types such as CDataStream and uint256
to std::vector on insert and contains. This is unnecessary because
CDataStreams and uint256s are already std::vectors internally. We just
need a way to point to the right data within those types. Span gives
us this ability.

This is a part of the Zero Allocations Project #18849 (ZAP3). This code came up as a place where many allocations occur. Mainly due to allocations of CService::GetKey which is passed to these functions, but this PR is needed before we get to that.

@practicalswift
Copy link
Contributor

practicalswift commented May 16, 2020

Do you have a good way to quantify the impact of this change? That would be interesting to see :)

@fanquake fanquake removed the Tests label May 16, 2020
@DrahtBot
Copy link
Contributor

DrahtBot commented May 16, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@jb55
Copy link
Contributor Author

jb55 commented May 16, 2020

@practicalswift this is something I want to do once I figure out how to use the benchmarking suite. This one might be nontrivial though since it's networking code. I might just do the benchmarking in #18849 to start.

One piece of data that I can show here is the number of allocations before and after which I'll do

@sipa
Copy link
Member

sipa commented May 16, 2020

Concept ACK, I think this is the right approach. In general I think that most functions that take either a const vector/prevector as input, or begin/end pointer, or a begin pointer + size, should be replaced with versions that just take in a Span.

It would be worthwhile to make uint256 amendable to the range Span constructor. I think all you'd need is rename the data field to m_data, and then add a data() member function that does the same as begin().

Nit: uint256 is not a vector internally, but an array.

@practicalswift
Copy link
Contributor

practicalswift commented May 16, 2020

I've noticed that we consistently pass Span by const-ref instead of by value in the project (this PR follows that convention): is that an intentional choice? Just curious :)

$ git grep -E '\([^(]*Span<.*>[^)]*\)' ":(exclude)src/span.h"
src/script/descriptor.cpp:std::string DescriptorChecksum(const Span<const char>& span)
src/script/descriptor.cpp:NODISCARD bool ParseKeyPath(const std::vector<Span<const char>>& split, KeyPath& out, std::string& error)
src/script/descriptor.cpp:std::unique_ptr<PubkeyProvider> ParsePubkeyInner(uint32_t key_exp_index, const Span<const char>& sp, bool permit_uncompressed, FlatSigningProvider& out, std::string& error)
src/script/descriptor.cpp:std::unique_ptr<PubkeyProvider> ParsePubkey(uint32_t key_exp_index, const Span<const char>& sp, bool permit_uncompressed, FlatSigningProvider& out, std::string& error)
src/script/descriptor.cpp:std::unique_ptr<DescriptorImpl> ParseScript(uint32_t key_exp_index, Span<const char>& sp, ParseScriptContext ctx, FlatSigningProvider& out, std::string& error)
src/script/descriptor.cpp:bool CheckChecksum(Span<const char>& sp, bool require_checksum, std::string& error, std::string* out_checksum = nullptr)
src/script/interpreter.cpp:static bool ExecuteWitnessScript(const Span<const valtype>& stack_span, const CScript& scriptPubKey, unsigned int flags, SigVersion sigversion, const BaseSignatureChecker& checker, ScriptError* serror)
src/serialize.h:template<typename Stream> inline void Serialize(Stream& s, const Span<const unsigned char>& span) { s.write(CharCast(span.data()), span.size()); }
src/serialize.h:template<typename Stream> inline void Serialize(Stream& s, const Span<unsigned char>& span) { s.write(CharCast(span.data()), span.size()); }
src/serialize.h:template<typename Stream> inline void Unserialize(Stream& s, Span<unsigned char>& span) { s.read(CharCast(span.data()), span.size()); }
src/test/util_tests.cpp:static std::string SpanToStr(Span<const char>& span)
src/util/spanparsing.cpp:bool Const(const std::string& str, Span<const char>& sp)
src/util/spanparsing.cpp:bool Func(const std::string& str, Span<const char>& sp)
src/util/spanparsing.cpp:Span<const char> Expr(Span<const char>& sp)
src/util/spanparsing.cpp:std::vector<Span<const char>> Split(const Span<const char>& sp, char sep)
src/util/spanparsing.h:bool Const(const std::string& str, Span<const char>& sp);
src/util/spanparsing.h:bool Func(const std::string& str, Span<const char>& sp);
src/util/spanparsing.h:Span<const char> Expr(Span<const char>& sp);
src/util/spanparsing.h:std::vector<Span<const char>> Split(const Span<const char>& sp, char sep);

@theStack
Copy link
Contributor

Concept ACK. Will review as soon as this is rebased.

I've noticed that we consistently pass Span by const-ref instead of by value in the project (this PR follows that convention): is that an intentional choice? Just curious :)

@practicalswift: Good point. I guess seen from a performance perspective it doesn't really make a difference, considering how lightweight spans are. In any case I agree that we should agree on one passing type and use it consistently in the code-base (my personal preference would be to always pass it by value, though).

@jb55
Copy link
Contributor Author

jb55 commented Aug 11, 2020 via email

@practicalswift
Copy link
Contributor

practicalswift commented Aug 12, 2020

@theStack @jb55 FWIW, throughout the C++ Core Guidelines span is consistently passed by value, and the following is said about it:

Note: A span<T> object does not own its elements and is so small that it can be passed by value.
Passing a span object as an argument is exactly as efficient as passing a pair of pointer arguments or passing a pointer and an integer count. (F.24)

Same goes for en.cppreference.com FWIW.

@jb55 jb55 force-pushed the 2020-05-bloom-span branch from a6d8f9f to 1466b65 Compare August 14, 2020 16:09
@jb55
Copy link
Contributor Author

jb55 commented Aug 14, 2020

rebased and switched from const Span<const unsigned char>& style to Span<const unsigned char>

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

@jb55
Copy link
Contributor Author

jb55 commented Aug 17, 2020 via email

@jb55 jb55 force-pushed the 2020-05-bloom-span branch from 1466b65 to 897f5be Compare August 17, 2020 15:21
@jb55
Copy link
Contributor Author

jb55 commented Aug 17, 2020

pushed 897f5be

git diff 1466b656625b3896176d3aa6f794d1d8d0f7c3e0 897f5be28c072c700a55a511bf0a88efc3b71602

much cleaner 🤩

diff --git a/src/bloom.cpp b/src/bloom.cpp
index 436e913aa5..c19d6e7d7d 100644
--- a/src/bloom.cpp
+++ b/src/bloom.cpp
@@ -59,12 +59,12 @@ void CBloomFilter::insert(const COutPoint& outpoint)
 {
     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
     stream << outpoint;
-    insert(Span<const unsigned char>((const unsigned char*)stream.data(), stream.size()));
+    insert(MakeUCharSpan(stream));
 }
 
 void CBloomFilter::insert(const uint256& hash)
 {
-    insert(Span<const unsigned char>(hash.begin(), hash.end()));
+    insert(MakeUCharSpan(hash));
 }
 
 bool CBloomFilter::contains(Span<const unsigned char> vKey) const
@@ -85,13 +85,12 @@ bool CBloomFilter::contains(const COutPoint& outpoint) const
 {
     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
     stream << outpoint;
-    return contains(Span<const unsigned char>((const unsigned char*)stream.data(),
-        stream.size()));
+    return contains(MakeUCharSpan(stream));
 }
 
 bool CBloomFilter::contains(const uint256& hash) const
 {
-    return contains(Span<const unsigned char>(hash.begin(), hash.end()));
+    return contains(MakeUCharSpan(hash));
 }
 
 bool CBloomFilter::IsWithinSizeConstraints() const
@@ -241,7 +240,7 @@ void CRollingBloomFilter::insert(Span<const unsigned char> vKey)
 
 void CRollingBloomFilter::insert(const uint256& hash)
 {
-    insert(Span<const unsigned char>(hash.begin(), hash.end()));
+    insert(MakeUCharSpan(hash));
 }
 
 bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
@@ -260,7 +259,7 @@ bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
 
 bool CRollingBloomFilter::contains(const uint256& hash) const
 {
-    return contains(Span<const unsigned char>(hash.begin(), hash.end()));
+    return contains(MakeUCharSpan(hash));
 }
 
 void CRollingBloomFilter::reset()
diff --git a/src/test/bloom_tests.cpp b/src/test/bloom_tests.cpp
index 1ad5607429..9cf6e352e4 100644
--- a/src/test/bloom_tests.cpp
+++ b/src/test/bloom_tests.cpp
@@ -91,7 +91,7 @@ BOOST_AUTO_TEST_CASE(bloom_create_insert_key)
     CBloomFilter filter(2, 0.001, 0, BLOOM_UPDATE_ALL);
     filter.insert(vchPubKey);
     uint160 hash = pubkey.GetID();
-    filter.insert(Span<unsigned char>(hash.begin(), hash.end()));
+    filter.insert(MakeUCharSpan(hash));
 
     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
     stream << filter;

We can avoid many unnecessary std::vector allocations by changing
CBloomFilter to take Spans instead of std::vector's for the `insert`
and `contains` operations.

CBloomFilter currently converts types such as CDataStream and uint256
to std::vector on `insert` and `contains`. This is unnecessary because
CDataStreams and uint256 are already std::vectors internally. We just
need a way to point to the right data within those types. Span gives
us this ability.

Signed-off-by: William Casarin <jb55@jb55.com>
@jb55 jb55 force-pushed the 2020-05-bloom-span branch from 897f5be to be8c3b4 Compare April 28, 2021 21:19
@jb55
Copy link
Contributor Author

jb55 commented Apr 28, 2021

rebased

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK be8c3b4

Theoretically, this could be a good improvement because it removes the smart pointer from the game, but I never use it in practice the span and I'm curious to see how much improvement brings this game.

@jb55
Copy link
Contributor Author

jb55 commented May 5, 2021 via email

@0xB10C
Copy link
Contributor

0xB10C commented Aug 3, 2021

Concept ACK. Have you figured out a way to benchmark this?

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs rebase

void insert(const uint256& hash);
bool contains(const std::vector<unsigned char>& vKey) const;
bool contains(Span<const unsigned char> vKey) const;
bool contains(const uint256& hash) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this method now

void insert(const COutPoint& outpoint);
void insert(const uint256& hash);

bool contains(const std::vector<unsigned char>& vKey) const;
bool contains(Span<const unsigned char> vKey) const;
bool contains(const COutPoint& outpoint) const;
bool contains(const uint256& hash) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -83,7 +83,7 @@ BOOST_AUTO_TEST_CASE(bloom_create_insert_key)
CBloomFilter filter(2, 0.001, 0, BLOOM_UPDATE_ALL);
filter.insert(vchPubKey);
uint160 hash = pubkey.GetID();
filter.insert(std::vector<unsigned char>(hash.begin(), hash.end()));
filter.insert(MakeUCharSpan(hash));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for MakeUCharSpan, filter.insert(hash); works now too

@@ -59,17 +59,15 @@ void CBloomFilter::insert(const COutPoint& outpoint)
{
CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
stream << outpoint;
std::vector<unsigned char> data(stream.begin(), stream.end());
insert(data);
insert(MakeUCharSpan(stream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for MakeUCharSpan

@laanwj
Copy link
Member

laanwj commented Aug 16, 2021

MarcoFalke added Waiting for author Needs rebase labels 3 hours ago
DrahtBot removed the Needs rebase label 2 hours ago

I think DrahtBot screwed up here. No rebase happened in between.

@jb55 jb55 closed this Sep 20, 2021
@sipa
Copy link
Member

sipa commented Sep 20, 2021

:(

@laanwj
Copy link
Member

laanwj commented Sep 20, 2021

Any specific reason for closing?

@jb55
Copy link
Contributor Author

jb55 commented Sep 20, 2021

lost interest. It's fine with me if someone else wants to pick it up.

@vincenzopalazzo
Copy link

@jb55, I can be interested in it.

I love your optimization, and maybe can be also a good starting point to push some code on bitcoin :-)

@fanquake
Copy link
Member

Rebased, with comments addressed in #23115.

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Sep 29, 2021
…rt` and `contains`

a11da75 bloom: cleanup includes (fanquake)
f1ed1d3 bloom: use constexpr where appropriate (fanquake)
2ba4ddf bloom: use Span instead of std::vector for `insert` and `contains` (William Casarin)

Pull request description:

  This is bitcoin#18985 rebased, with the most recent comments addressed.

  > We can avoid many unnecessary std::vector allocations by changing
  CBloomFilter to take Spans instead of std::vector's for the `insert`
  and `contains` operations.

  > CBloomFilter currently converts types such as CDataStream and uint256
  to std::vector on `insert` and `contains`. This is unnecessary because
  CDataStreams and uint256 are already std::vectors internally. We just
  need a way to point to the right data within those types. Span gives
  us this ability.

ACKs for top commit:
  sipa:
    Code review ACK a11da75
  laanwj:
    Code review ACK a11da75

Tree-SHA512: ee9ba02c9588daa1ff51782d1953fd060839dd15aa85861b2633b6ff2398320188ddd00f01d0c99442224485364ede9f8322366de4239fc7831ebfa06bd34659
@jb55 jb55 deleted the 2020-05-bloom-span branch October 5, 2021 13:39
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.