-
Notifications
You must be signed in to change notification settings - Fork 37.7k
merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497
The head ref may contain hidden characters: "l0rinc/pre\u2011reserve-merkle-leaves-to-max"
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32497. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK main commit. Not sure why the benchmark is being changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Memory allocation/deallocation is slow and expensive.
I have thought on changing the benchmark tests as we would want to test the worst case scenario right? or else we could have both case for testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark Testing
master@5757de4ddd37f9321ee6b338b40888fd3561fc00
- With 9000
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 52.62 | 19,005,746.07 | 0.2% | 0.01 | `MerkleRoot`
| 52.64 | 18,998,504.40 | 0.3% | 0.01 | `MerkleRoot`
| 52.63 | 18,999,727.67 | 0.2% | 0.01 | `MerkleRoot`
- With 9001
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 53.50 | 18,693,063.88 | 0.3% | 0.01 | `MerkleRoot`
| 53.53 | 18,681,211.49 | 0.5% | 0.01 | `MerkleRoot`
| 53.49 | 18,694,053.87 | 0.5% | 0.01 | `MerkleRoot`
Commit 39b6c13
- With 9000
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 52.51 | 19,043,628.95 | 0.2% | 0.01 | `MerkleRoot`
| 52.52 | 19,040,989.96 | 0.2% | 0.01 | `MerkleRoot`
| 52.53 | 19,036,358.39 | 0.2% | 0.01 | `MerkleRoot`
- With 9001
| ns/leaf | leaf/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 53.40 | 18,713,525.67 | 0.3% | 0.01 | `MerkleRoot`
| 53.44 | 19,314,655.73 | 0.3% | 0.01 | `MerkleRoot`
| 53.41 | 18,883,462.75 | 0.3% | 0.01 | `MerkleRoot`
39b6c13
to
4cc5942
Compare
Updated the benchmark to showcase the before/after state better (resembles production code changes), by splitting out the vector copies to the unmetered lambda and having an odd number of elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 4cc5942
Testing
Before Master@9a7eece5a4
ns/leaf | leaf/s | err% | total | benchmark |
---|---|---|---|---|
53.75 | 18,605,160.70 | 0.7% | 0.01 | MerkleRoot |
53.64 | 18,643,002.35 | 0.1% | 0.01 | MerkleRoot |
53.91 | 18,548,704.52 | 0.4% | 0.01 | MerkleRoot |
After commit@4cc5942895673591de4edceb6dd0c7188c302a72
ns/leaf | leaf/s | err% | total | benchmark |
---|---|---|---|---|
57.67 | 17,340,305.39 | 0.0% | 0.52 | MerkleRoot |
57.69 | 17,332,534.96 | 0.0% | 0.52 | MerkleRoot |
58.14 | 17,199,057.10 | 0.0% | 0.52 | MerkleRoot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how meaningful this is, but it looks like there are a bunch of unrelated style changes that don't seem to have a benefit?
4cc5942
to
a1f2a4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes revert the original benchmark's behavior of including the leaf vector in the benchmark (more explicitly and reserved like the production code).
I've also changed the emplace_back
calls for the hashes to push_bash
and reverted the original loop conditions to minimize the diff in critical code sections.
Note to yuvicc: previous benchmark version was measuring something else than master (only the merkle call without the vector copy), so we can't directly compare the results.
The new results don't show any significant speed increase, only the memory allocation is more predictable - which isn't captured by the benchmark anymore.
a1f2a4c
to
a73dd45
Compare
I've rebased the changed and adjusted the benchmark to be more similar to the other production usages in the first commit, rounded to even in the second (optimization) commit, so that we can realistically measure the speed difference before and after: % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000 Before 7f620cf:
After a73dd45:
|
a73dd45
to
be8f305
Compare
lgtm re-ACK be8f305 Before e87430e
After be8f305
|
be8f305
to
6d1950d
Compare
Rebased and added a new commit on top for deduplicating the integer rounding. Ready for review again. |
6d1950d
to
2d1a4e7
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
LGTM!
|
2d1a4e7
to
5d117d9
Compare
Edit: had to push a few variants, the fuzzing still seems to fail, so I removed the Assert/Assume from |
5d117d9
to
c162fdf
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
9f1f74a
to
a7997d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK a7997d0
Good optimization, nice catch!
src/test/fuzz/integer.cpp
Outdated
@@ -80,7 +81,8 @@ FUZZ_TARGET(integer, .init = initialize_integer) | |||
} | |||
constexpr uint256 u256_min{"0000000000000000000000000000000000000000000000000000000000000000"}; | |||
constexpr uint256 u256_max{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"}; | |||
const std::vector<uint256> v256{u256, u256_min, u256_max}; | |||
std::vector<uint256> v256{u256, u256_min, u256_max}; | |||
v256.reserve(RoundUpToEven(v256.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reserve seems to be for ComputeMerkleRoot
. However, I think it would be better to have ComputeMerkleRoot
do any reservations necessary rather than relying on the caller to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComputeMerkleRoot
already received a list of values, usually sized to fit their current size exactly (i.e. capacity()
== size()
). We can of course increase the capacity inside - it's what:
if (hashes.size() & 1) {
hashes.push_back(hashes.back());
}
does already, but if we want to avoid reallocations, we have to predict the size of the vector before we start populating it.
We could add some kind of lambda as a parameter and an expected size maybe, something like:
template <typename Gen>
uint256 GenerateMerkleRoot(std::size_t count, Gen&& gen, bool* mutated = nullptr)
{
std::vector<uint256> leaves;
leaves.reserve(RoundUpToEven(count));
for (std::size_t i{0}; i < count; ++i) {
leaves.push_back(gen(i));
}
return ComputeMerkleRoot(std::move(leaves), mutated);
}
and the production code callers would look like:
static uint256 ComputeModifiedMerkleRoot(const CMutableTransaction& cb, const CBlock& block)
{
return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? cb.GetHash() : block.vtx[i]->GetHash(); });
}
uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
{
return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return block.vtx[i]->GetHash(); }, mutated);
}
uint256 BlockWitnessMerkleRoot(const CBlock& block, bool* mutated)
{
return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? uint256{} : uint256(block.vtx[i]->GetWitnessHash()); }, mutated);
}
Would that be simpler in your opinion?
Patch
diff --git a/src/consensus/merkle.cpp b/src/consensus/merkle.cpp
index 8c56d979a5..a0e358b005 100644
--- a/src/consensus/merkle.cpp
+++ b/src/consensus/merkle.cpp
@@ -5,7 +5,6 @@
#include <consensus/merkle.h>
#include <hash.h>
#include <util/check.h>
-#include <util/ints.h>
/* WARNING! If you're reading this because you're learning about crypto
and/or designing a new system that will use merkle trees, keep in mind
@@ -63,26 +62,14 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
return hashes[0];
}
-
uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
{
- std::vector<uint256> leaves;
- leaves.reserve(RoundUpToEven(block.vtx.size()));
- for (size_t s = 0; s < block.vtx.size(); s++) {
- leaves.push_back(block.vtx[s]->GetHash());
- }
- return ComputeMerkleRoot(std::move(leaves), mutated);
+ return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return block.vtx[i]->GetHash(); }, mutated);
}
uint256 BlockWitnessMerkleRoot(const CBlock& block, bool* mutated)
{
- std::vector<uint256> leaves;
- leaves.reserve(RoundUpToEven(block.vtx.size()));
- leaves.emplace_back(); // The witness hash of the coinbase is 0.
- for (size_t s = 1; s < block.vtx.size(); s++) {
- leaves.push_back(block.vtx[s]->GetWitnessHash());
- }
- return ComputeMerkleRoot(std::move(leaves), mutated);
+ return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? uint256{} : uint256(block.vtx[i]->GetWitnessHash()); }, mutated);
}
/* This implements a constant-space merkle root/path calculator, limited to 2^32 leaves. */
diff --git a/src/consensus/merkle.h b/src/consensus/merkle.h
index c722cbe446..8b5c14157a 100644
--- a/src/consensus/merkle.h
+++ b/src/consensus/merkle.h
@@ -9,9 +9,21 @@
#include <primitives/block.h>
#include <uint256.h>
+#include <util/ints.h>
uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated = nullptr);
+template <typename Gen>
+uint256 GenerateMerkleRoot(std::size_t count, Gen&& gen, bool* mutated = nullptr)
+{
+ std::vector<uint256> leaves;
+ leaves.reserve(RoundUpToEven(count));
+ for (std::size_t i{0}; i < count; ++i) {
+ leaves.push_back(gen(i));
+ }
+ return ComputeMerkleRoot(std::move(leaves), mutated);
+}
+
/*
* Compute the Merkle root of the transactions in a block.
* *mutated is set to true if a duplicated subtree was found.
diff --git a/src/signet.cpp b/src/signet.cpp
index 3104b7f2d7..5aea084f15 100644
--- a/src/signet.cpp
+++ b/src/signet.cpp
@@ -58,13 +58,8 @@ static bool FetchAndClearCommitmentSection(const std::span<const uint8_t> header
static uint256 ComputeModifiedMerkleRoot(const CMutableTransaction& cb, const CBlock& block)
{
- std::vector<uint256> leaves;
- leaves.reserve(RoundUpToEven(block.vtx.size()));
- leaves.push_back(cb.GetHash());
- for (size_t s = 1; s < block.vtx.size(); ++s) {
- leaves.push_back(block.vtx[s]->GetHash());
- }
- return ComputeMerkleRoot(std::move(leaves));
+ return GenerateMerkleRoot(block.vtx.size(),
+ [&](auto i) { return i ? block.vtx[i]->GetHash() : cb.GetHash(); });
}
std::optional<SignetTxs> SignetTxs::Create(const CBlock& block, const CScript& challenge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that be simpler in your opinion?
No.
but if we want to avoid reallocations, we have to predict the size of the vector before we start populating it.
We can also avoid inserting into the vector in the first place, e.g. this diff gives me about the same speedup without needing to change anything with allocations:
diff --git a/src/consensus/merkle.cpp b/src/consensus/merkle.cpp
index 7dd24e1868f..46056bfa3e0 100644
--- a/src/consensus/merkle.cpp
+++ b/src/consensus/merkle.cpp
@@ -44,6 +44,7 @@
uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
+ std::array<uint256, 2> odd_hash;
bool mutation = false;
while (hashes.size() > 1) {
if (mutated) {
@@ -51,11 +52,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
if (hashes[pos] == hashes[pos + 1]) mutation = true;
}
}
- if (hashes.size() & 1) {
+ bool need_dup = hashes.capacity() == hashes.size() && (hashes.size() & 1);
+ if (need_dup) {
+ odd_hash[0] = hashes.back();
+ odd_hash[1] = hashes.back();
+ } else if (hashes.size() & 1) {
hashes.push_back(hashes.back());
}
SHA256D64(hashes[0].begin(), hashes[0].begin(), hashes.size() / 2);
hashes.resize(hashes.size() / 2);
+ if (need_dup) {
+ SHA256D64(odd_hash[0].begin(), odd_hash[0].begin(), 1);
+ assert(hashes.capacity() > hashes.size());
+ hashes.push_back(odd_hash[0]);
+ }
}
if (mutated) *mutated = mutation;
if (hashes.size() == 0) return uint256();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever, we can of course calculate the extra value independently, but isn't it a lot simpler to just precalculate the final size before insertions and keep ComputeMerkleRoot
unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's simpler if the caller does not have to be aware that they should allocate a larger vector to be passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lambda solution is even simpler in that case, the caller doesn't even have to allocate or iterate at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed it in a different way in latest push, let me know if this solves your concern or if it's orthogonal.
Instead of mutating the input after each round to avoid unwanted compiler optimizations, we assert the expected hash, which likewise inhibits aggressive optimization. To make the benchmark more similar to other `ComputeMerkleRoot` the input leaves copying is made explicit before each run. % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000 | ns/leaf | leaf/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 45.55 | 21,953,928.57 | 0.2% | 1.13 | `MerkleRoot` | 45.55 | 21,953,985.76 | 0.1% | 1.10 | `MerkleRoot` | 45.56 | 21,950,501.83 | 0.2% | 1.10 | `MerkleRoot`
a7997d0
to
59e7320
Compare
Rebased after #33116 - the only conflict was adding Based on the suggestion of @achow101 (since I do agree that it's kinda' leaking abstractions and there's also a lot of duplication for collecting the merkle leaves), I have also deduplicated all call sites without needing to touch the consensusy Merkle calculation itself. |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
This prevents the input vector from doubling in size when the input count is odd. The rounding adds 1 when the size is odd (least significant bit is 1); otherwise it adds 0, keeping the value even. The new `ToMerkleLeaves` serves to deduplicate the collection of leaves, without leaking abstraction to the callers about the Merkle tree size evenness concerns. % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000 | ns/leaf | leaf/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 44.57 | 22,435,971.11 | 0.3% | 1.10 | `MerkleRoot` | 44.46 | 22,491,655.23 | 0.0% | 1.10 | `MerkleRoot` | 44.47 | 22,487,556.76 | 0.1% | 1.10 | `MerkleRoot` Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com> Co-authored-by: Ava Chow <github@achow101.com>
59e7320
to
de6a283
Compare
Summary
ComputeMerkleRoot
duplicates the last hash when the input size is odd.If the caller provides a
std::vector
whose capacity equals its size (common when the vector was created withresize()
), that extrapush_back
forces a reallocation, doubling its capacity.We pay this penalty on every Merkle root calculation even though we know the final size in advance.
Fix
tx
s to properly sized vectors;vtx.size()
is odd (assigned later);uint256
objects we overwrite anyway.Validation
The benchmark was updated to use an even leaf count for simplicity and to focus on hashing speed rather than reallocations or vector copying.
A full
-reindex-chainstate
up to block 896 408 ran without triggering the asserts.asserts
Asserts (not pushed in this PR) confirm that
push_back
never reallocates and that the coinbase witness hash remains null:Benchmarks
MerkleRoot
MerkleRoot