Revert compact block cache inefficiencies #33253
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details, see https://corecheck.dev/bitcoin/bitcoin/pulls/33253.
Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
🚧 At least one of the CI tasks failed. Hints: try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons. Leave a comment here if you need help tracking down a confusing failure.
Force-pushed from 0d474c5 to decb48a
related comment: #29752 (comment)
Seems like a03aef9 is the problematic commit for my machine? Doubles the runtime. Both reverted:
No reversions:
a8203e9 reverted only:
a03aef9 reverted only:
ARM64 M1 Max (2022), macOS 15.6.1
benchmark only
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,643,000.00 | 378.36 | 1.5% | 0.03 | `BlockEncoding`
benchmark with reversions (the difference is from the last commit, Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef")
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,268,791.00 | 440.76 | 1.1% | 0.03 | `BlockEncoding`
Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.
Concept ACK, excellent find.
I redid the benchmarks, got more modest changes with GCC and Clang:
C++ compiler .......................... GNU 14.2.0
echo "> C++ compiler .......................... GNU $(gcc -dumpfullversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
228.64 ops/s before
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,386,276.06 | 227.98 | 0.2% | 17,156,885.24 | 15,741,370.59 | 1.090 | 1,287,419.20 | 3.2% | 5.40 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,373,623.58 | 228.64 | 0.2% | 17,156,871.09 | 15,696,695.56 | 1.093 | 1,287,418.38 | 3.2% | 5.41 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,380,919.51 | 228.26 | 0.2% | 17,156,870.18 | 15,719,209.66 | 1.091 | 1,287,418.17 | 3.2% | 5.40 | BlockEncoding |
315.60 ops/s after
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,168,586.91 | 315.60 | 0.1% | 17,156,847.31 | 11,378,141.35 | 1.508 | 1,287,390.88 | 3.2% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,241,851.00 | 308.47 | 0.2% | 17,156,842.43 | 11,640,138.08 | 1.474 | 1,287,385.26 | 3.2% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,173,445.04 | 315.11 | 0.1% | 17,156,847.31 | 11,394,621.15 | 1.506 | 1,287,390.88 | 3.2% | 5.28 | BlockEncoding |
i.e. 38% improvement with gcc
C++ compiler .......................... clang 20.1.7
echo "> C++ compiler .......................... clang $(clang -dumpversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
252.05 ops/s before
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,967,492.38 | 252.05 | 0.1% | 15,494,632.58 | 14,241,688.80 | 1.088 | 1,070,986.04 | 4.0% | 5.37 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,984,996.65 | 250.94 | 0.1% | 15,494,608.15 | 14,303,936.25 | 1.083 | 1,070,973.80 | 4.0% | 5.38 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,078,221.50 | 245.20 | 0.1% | 15,494,616.62 | 14,638,662.82 | 1.058 | 1,070,976.86 | 4.0% | 5.37 | BlockEncoding |
326.06 ops/s after
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,066,879.10 | 326.06 | 0.0% | 15,497,664.10 | 11,011,637.17 | 1.407 | 1,070,979.09 | 4.0% | 5.28 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,135,228.99 | 318.96 | 0.1% | 15,497,665.21 | 11,253,673.49 | 1.377 | 1,070,979.81 | 4.0% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,127,079.54 | 319.79 | 0.1% | 15,497,665.71 | 11,223,561.21 | 1.381 | 1,070,979.87 | 4.0% | 5.30 | BlockEncoding |
i.e. 29% improvement for clang
Like others before me, I think we could include a few more scenarios in the benchmark. I left a few inline comments explaining my suggestions, but it's probably easier if I post the final benchmark code here as well:
blockencodings.cpp
// Copyright (c) 2025-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include <bench/bench.h>
#include <blockencodings.h>
#include <consensus/amount.h>
#include <kernel/cs_main.h>
#include <primitives/transaction.h>
#include <script/script.h>
#include <sync.h>
#include <test/util/setup_common.h>
#include <test/util/txmempool.h>
#include <txmempool.h>
#include <util/check.h>
#include <memory>
#include <unordered_set>
#include <vector>
#include <algorithm>
static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
LockPoints lp;
AddToMempool(pool, CTxMemPoolEntry{tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp});
}
namespace {
CTransactionRef CreateTestTransaction(std::span<const std::byte> sigspam, size_t index)
{
CMutableTransaction tx;
tx.vin.resize(1);
tx.vin[0].scriptSig = CScript() << sigspam << OP_1;
tx.vin[0].scriptWitness.stack.push_back({1});
tx.vout.resize(1);
tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL;
tx.vout[0].nValue = index;
return MakeTransactionRef(tx);
}
CBlock DummyBlock()
{
CBlock block;
block.nVersion = 5;
block.hashPrevBlock.SetNull();
block.hashMerkleRoot.SetNull();
block.nTime = 1231006505;
block.nBits = 0x1d00ffff;
block.nNonce = 2083236893;
block.fChecked = false;
CMutableTransaction tx;
tx.vin.resize(1);
tx.vout.resize(1);
block.vtx.emplace_back(MakeTransactionRef(tx)); // dummy coinbase
return block;
}
struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
{
std::unordered_set<uint64_t> seen;
shorttxids.reserve(txs);
while (txs-- > 0) {
uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
assert(seen.insert(id).second);
shorttxids.push_back(id);
}
if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
}
};
} // anon namespace
static void BlockEncoding(benchmark::Bench& bench)
{
constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
const auto testing_setup{MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN)};
auto& pool{*Assert(testing_setup->m_node.mempool)};
FastRandomContext rng{/*fDeterministic=*/true};
const auto sigspam{rng.randbytes<sigspam_size>()};
LOCK2(cs_main, pool.cs);
std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
{
// a reasonably large mempool of 50k txs, ~10MB total
std::vector<CTransactionRef> refs;
refs.reserve(mempool_size);
for (size_t i{0}; i < mempool_size; ++i) {
refs.emplace_back(CreateTestTransaction(sigspam, i));
}
// ensure mempool ordering is different to memory ordering of transactions,
// to simulate a mempool that has changed over time
std::ranges::shuffle(refs, rng);
extra_txn.reserve(extra_tx_count);
for (size_t i{0}; i < extra_tx_count; ++i) {
size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool
auto tx{CreateTestTransaction(sigspam, index)};
extra_txn.emplace_back(tx->GetWitnessHash(), tx);
}
for (const auto& tx : refs) {
AddTx(tx, /*fee=*/tx->vout[0].nValue, pool);
}
}
for (const auto duplicates : {false, true}) {
BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
.run([&] {
PartiallyDownloadedBlock pdb{&pool};
const auto status{pdb.InitData(cmpctblock, extra)};
assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
});
}
}
}
BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
Edit: what is the expected memory increase because of the change?
src/bench/blockencodings.cpp (outdated)
// ensure mempool ordering is different to memory ordering of transactions,
// to simulate a mempool that has changed over time
std::shuffle(refs.begin(), refs.end(), rng);
nit: std::ranges::shuffle might be simpler:
- std::shuffle(refs.begin(), refs.end(), rng);
+ std::ranges::shuffle(refs, rng);
src/bench/blockencodings.cpp (outdated)
bench.run([&] {
    PartiallyDownloadedBlock pdb{&pool};
    (void)pdb.InitData(cmpctblock, {});
we could consume the result to make sure the benchmark isn't optimized away and that we're measuring the success/failure scenario correctly
src/bench/blockencodings.cpp (outdated)
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
InsecureRandomContext rng(11);
Alternatively, this may be slightly more descriptive than 11, and it could even be used to init the sigspam:
- std::array<std::byte,200> sigspam;
- sigspam.fill(std::byte(42));
- InsecureRandomContext rng(11);
+ FastRandomContext rng{/*fDeterministic=*/true};
+ auto sigspam{rng.randbytes<200>()};
+1
src/bench/blockencodings.cpp (outdated)
{
    const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN);
    CTxMemPool& pool = *Assert(testing_setup->m_node.mempool);
    std::array<std::byte,200> sigspam;
We could extract the constants used here to the top, like:
constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
{
    shorttxids.reserve(txs);
    while (txs-- > 0) {
        shorttxids.push_back(rng.randbits<SHORTTXIDS_LENGTH*8>());
we're lucky that this doesn't generate any duplicates, but we could test those cases as well:
struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
{
std::unordered_set<uint64_t> seen;
shorttxids.reserve(txs);
while (txs-- > 0) {
uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
assert(seen.insert(id).second);
shorttxids.push_back(id);
}
if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
}
};
and we could bench all 4 possibilities:
for (const auto duplicates : {false, true}) {
BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
.run([&] {
PartiallyDownloadedBlock pdb{&pool};
const auto status{pdb.InitData(cmpctblock, extra)};
assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
});
}
}
resulting in something like:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,371,045.12 | 729.37 | 1.3% | 5.36 | reconstructions without extra |
1,309,789.19 | 763.48 | 2.8% | 5.35 | reconstructions with extra |
71,741.96 | 13,938.84 | 2.3% | 5.42 | reconstructions without extra (dup) |
70,113.89 | 14,262.51 | 1.2% | 5.50 | reconstructions with extra (dup) |
This is just testing how quickly we can iterate through all the txs in the mempool when looking at a compact block -- which is the worst case scenario that can still be fast (if we finish solving the block with transactions from the extra pool, and don't need to go back to the network). Hitting duplicates automatically puts us in the slow resolve-via-network path, so benching that shouldn't be terribly interesting.
To make sure that's what we're testing, we should still assert that the outcome of InitData was READ_STATUS_OK - otherwise it would be possible that we accidentally generated duplicates here.
Added an assert and comment.
static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
    LockPoints lp;
    AddToMempool(pool, CTxMemPoolEntry(tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
CTxMemPoolEntry{} would indicate slightly better that it's a constructor.
Concept ACK
Nice catch
No reversions eb1920a:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,163,341.00 | 462.25 | 7.4% | 0.03 | BlockEncoding |
Only f2b9cbe reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,097,861.00 | 476.68 | 5.3% | 0.03 | BlockEncoding |
Only decb48a reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,727,394.00 | 578.91 | 2.4% | 0.02 | BlockEncoding |
Both reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,697,419.00 | 589.13 | 2.2% | 0.02 | BlockEncoding |
@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
     // Note that we don't want duplication between extra_txn and mempool to
     // trigger this case, so we compare witness hashes first
     if (txn_available[idit->second] &&
-        txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) {
+        txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
I think this should be:
if (txn_available[idit->second] &&
txn_available[idit->second]->GetWitnessHash() != extra_txn[i].first) { ...
Once we've found a shortid match, looking up the actual tx is fine, so I don't think this warrants changing.
What makes this section not performance critical is that we're likely going to have to do a full round-trip anyways to get the collided transactions, but I still think the suggestion is more correct/consistent, so it might be nice to do if retouching. Out of scope, but we can also remove the first clause of the if check entirely, since it's checking the inverse of the if block above.
Force-pushed from decb48a to 3aa3fc8
Added benchmarks with 50k txs in mempool and 0, 100 (default), and 5000 entries in extra pool. For reference, datum recommends setting the value to 1M.
Benchmarks now match expectation: each commit separately improves the benchmarks ✔️
Force-pushed from 3aa3fc8 to b7b249d
ACK b7b249d
32 bytes per mempool transaction and per extra txn (of which there are 100 by default). A 300MB mempool might fill up with about ~100k txs (at about 440vB each on average), which would be 3.2MB total. (The 3.2MB would not be an increase overall, but would rather cause earlier eviction from the mempool, i.e. a ~1% reduction in capacity.)
I have done a code review and tried to understand what to look for, to improve my PR reviewing in the future. The PR restores code to undo some "optimisations" that had a negative impact on performance (restores caching) and adds some extra benchmarks. Results of benchmarking this PR (both reverted):
lgtm code review and tested ACK b7b249d
crACK b7b249d
This PR reverts two performance regressions that broke cache locality of the wtxids we need to hash during compact block reconstruction, and it adds a nice benchmark that demonstrates the regressions and might have caught them before they were merged.
For more context: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-08-23#1755938803-1755950270
I reviewed the code changes here directly, and I also compared them to the commits they claimed to be reverting, and the only differences I found were comment changes, small refactor-y changes, and changes related to new code.
b9300d8 vs revert of a8203e9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b9300d8"
REVERTED="a8203e9"
diff --color -u\
<(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
<(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,9 +1,27 @@
+--- a/src/bench/blockencodings.cpp
++++ b/src/bench/blockencodings.cpp
+
+- std::vector<std::pair<Wtxid, CTransactionRef>> extratxn;
++ std::vector<CTransactionRef> extratxn;
+ extratxn.reserve(n_extra);
+ for (size_t i = n_pool; i < n_pool + n_extra; ++i) {
+- extratxn.emplace_back(refs[i]->GetWitnessHash(), refs[i]);
++ extratxn.push_back(refs[i]);
+ }
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
--ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) {
+-/* Reconstructing a compact block is in the hot-path for block relay,
+- * so we want to do it as quickly as possible. Because this often
+- * involves iterating over the entire mempool, we put all the data we
+- * need (ie the wtxid and a reference to the actual transaction data)
+- * in a vector and iterate over the vector directly. This allows optimal
+- * CPU caching behaviour, at a cost of only 40 bytes per transaction.
+- */
+-ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn)
+-{
+ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn) {
- if (cmpctblock.header.IsNull() || (cmpctblock.shorttxids.empty() && cmpctblock.prefilledtxn.empty()))
+ LogDebug(BCLog::CMPCTBLOCK, "Initializing PartiallyDownloadedBlock for block %s using a cmpctblock of %u bytes\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
for (size_t i = 0; i < extra_txn.size(); i++) {
- uint64_t shortid = cmpctblock.GetShortID(extra_txn[i].first);
+ if (extra_txn[i] == nullptr) {
@@ -21,8 +39,10 @@
txn_available[idit->second].reset();
--- a/src/blockencodings.h
+++ b/src/blockencodings.h
- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
+
+- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
- ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn);
++ // extra_txn is a list of extra orphan/conflicted/etc transactions to look at
+ ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn);
bool IsTxAvailable(size_t index) const;
--- a/src/net_processing.cpp
@@ -38,8 +58,19 @@
--- a/src/test/blockencodings_tests.cpp
+++ b/src/test/blockencodings_tests.cpp
--std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
-+std::vector<CTransactionRef> extra_txn;
+-const std::vector<std::pair<Wtxid, CTransactionRef>> empty_extra_txn;
++const std::vector<CTransactionRef> empty_extra_txn;
+
+ CBlock block(BuildBlockTestCase(rand_ctx));
+- std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
++ std::vector<CTransactionRef> extra_txn;
+ extra_txn.resize(10);
+ // Add an unrelated tx to extra_txn:
+- extra_txn[0] = {non_block_tx->GetWitnessHash(), non_block_tx};
++ extra_txn[0] = non_block_tx;
+ // and a tx from the block that's not in the mempool:
+- extra_txn[1] = {block.vtx[1]->GetWitnessHash(), block.vtx[1]};
++ extra_txn[1] = block.vtx[1];
--- a/src/test/fuzz/partially_downloaded_block.cpp
+++ b/src/test/fuzz/partially_downloaded_block.cpp
b7b249d vs revert of a03aef9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b7b249d"
REVERTED="a03aef9"
diff --color -u\
<(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
<(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,39 +1,48 @@
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
LOCK(pool->cs);
-- for (size_t i = 0; i < pool->vTxHashes.size(); i++) {
-- uint64_t shortid = cmpctblock.GetShortID(pool->vTxHashes[i].first);
-+ for (const auto& tx : pool->vTxHashes) {
+- for (const auto& [wtxid, txit] : pool->txns_randomized) {
+- uint64_t shortid = cmpctblock.GetShortID(wtxid);
++ for (const auto& tx : pool->txns_randomized) {
+ uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash());
std::unordered_map<uint64_t, uint16_t>::iterator idit = shorttxids.find(shortid);
if (!have_txn[idit->second]) {
-- txn_available[idit->second] = pool->vTxHashes[i].second->GetSharedTx();
+- txn_available[idit->second] = txit->GetSharedTx();
+ txn_available[idit->second] = tx;
have_txn[idit->second] = true;
--- a/src/test/blockencodings_tests.cpp
+++ b/src/test/blockencodings_tests.cpp
// Number of shared use_counts we expect for a tx we haven't touched
--// (block + mempool + our copy from the GetSharedTx call)
+-// (block + mempool entry + our copy from the GetSharedTx call)
-constexpr long SHARED_TX_OFFSET{3};
-+// (block + mempool entry + mempool vTxHashes + our copy from the GetSharedTx call)
++// (block + mempool entry + mempool txns_randomized + our copy from the GetSharedTx call)
+constexpr long SHARED_TX_OFFSET{4};
--- a/src/txmempool.cpp
+++ b/src/txmempool.cpp
-- vTxHashes.emplace_back(tx.GetWitnessHash(), newit);
-+ vTxHashes.emplace_back(newit->GetSharedTx());
- newit->vTxHashesIdx = vTxHashes.size() - 1;
- if (vTxHashes.size() > 1) {
-+ // Update vTxHashesIdx of the to-be-moved entry.
-+ Assert(GetEntry(vTxHashes.back()->GetHash()))->vTxHashesIdx = it->vTxHashesIdx;
-+ // Remove entry from vTxHashes by replacing it with the back and deleting the back.
- vTxHashes[it->vTxHashesIdx] = std::move(vTxHashes.back());
-- vTxHashes[it->vTxHashesIdx].second->vTxHashesIdx = it->vTxHashesIdx;
- vTxHashes.pop_back();
+- txns_randomized.emplace_back(tx.GetWitnessHash(), newit);
++ txns_randomized.emplace_back(newit->GetSharedTx());
+ newit->idx_randomized = txns_randomized.size() - 1;
+ if (txns_randomized.size() > 1) {
++ // Update idx_randomized of the to-be-moved entry.
++ Assert(GetEntry(txns_randomized.back()->GetHash()))->idx_randomized = it->idx_randomized;
+ // Remove entry from txns_randomized by replacing it with the back and deleting the back.
+ txns_randomized[it->idx_randomized] = std::move(txns_randomized.back());
+- txns_randomized[it->idx_randomized].second->idx_randomized = it->idx_randomized;
+ txns_randomized.pop_back();
+- if (txns_randomized.size() * 2 < txns_randomized.capacity()) {
++ if (txns_randomized.size() * 2 < txns_randomized.capacity())
+ txns_randomized.shrink_to_fit();
+- }
+- } else {
++ } else
+ txns_randomized.clear();
+- }
+
--- a/src/txmempool.h
+++ b/src/txmempool.h
using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator;
-- std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order
-+ std::vector<CTransactionRef> vTxHashes GUARDED_BY(cs); //!< All transactions in mapTx, in random order
+- std::vector<std::pair<Wtxid, txiter>> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx with their wtxids, in arbitrary order
++ std::vector<CTransactionRef> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx, in random order
benchmark | % faster
---|---
BlockEncodingLargeExtra | 19.9%
BlockEncodingNoExtra | 18.5%
BlockEncodingStdExtra | 18.6%
% faster calculated as:
$1 - \frac{\text{time}_\text{branch}}{\text{time}_\text{master}}$
Full benchmark results
$ ./build/bin/bench_bitcoin --filter="BlockEncoding.*" -min-time=5000
df5a50e (master + benchmark added)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,644,722.24 | 608.01 | 0.3% | 17,098,172.55 | 7,050,895.83 | 2.425 | 1,138,993.60 | 3.8% | 5.47 | BlockEncodingLargeExtra |
1,461,528.56 | 684.22 | 0.2% | 15,650,495.41 | 6,266,693.24 | 2.497 | 1,064,736.70 | 3.5% | 5.49 | BlockEncodingNoExtra |
1,476,456.22 | 677.30 | 0.9% | 15,680,631.05 | 6,328,282.73 | 2.478 | 1,066,421.94 | 3.5% | 5.55 | BlockEncodingStdExtra |
b9300d8 (extratxn better data locality)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,566,821.01 | 638.23 | 0.2% | 17,099,295.07 | 6,718,281.29 | 2.545 | 1,134,042.47 | 3.9% | 5.53 | BlockEncodingLargeExtra |
1,467,993.68 | 681.20 | 0.1% | 15,656,135.06 | 6,294,675.71 | 2.487 | 1,064,516.33 | 3.6% | 5.47 | BlockEncodingNoExtra |
1,454,058.97 | 687.73 | 0.5% | 15,687,165.75 | 6,234,731.73 | 2.516 | 1,066,490.20 | 3.5% | 5.48 | BlockEncodingStdExtra |
b7b249d (mempool better data locality)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,316,987.01 | 759.31 | 0.1% | 17,049,029.82 | 5,652,300.18 | 3.016 | 1,133,975.24 | 3.8% | 5.50 | BlockEncodingLargeExtra |
1,190,773.91 | 839.79 | 0.3% | 15,606,374.31 | 5,102,241.10 | 3.059 | 1,064,614.44 | 3.5% | 5.48 | BlockEncodingNoExtra |
1,201,347.37 | 832.40 | 0.2% | 15,637,600.08 | 5,147,161.34 | 3.038 | 1,066,621.86 | 3.5% | 5.43 | BlockEncodingStdExtra |
BenchCBHAST cmpctblock{rng, 3000};
bench.run([&] {
nit:
- bench.run([&] {
+ bench.unit("block").run([&] {
and maybe:
- bench.run([&] {
+ bench.unit("block").timeUnit(1ms, "ms").run([&] {
code-review ACK b7b249d. Read-through plus local bench; results match others I have seen. This PR restores cache locality in the compact-block reconstruction hot path and adds a well-focused benchmark.
Strong improvement. The memory overhead is small and well justified by the speedup.
ACK b7b249d
non-blocking nits only
static void BlockEncodingLargeExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 5000);
micro-nit: annotate numbered args
static void BlockEncodingStdExtra(benchmark::Bench& bench)
{
    static_assert(DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN == 100);
    BlockEncodingBench(bench, 50000, 100);
micro-nit: annotate numbered args
static void BlockEncodingNoExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 0);
micro-nit: annotate numbered args
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
// a reasonably large mempool of 50k txs, ~10MB total
- // a reasonably large mempool of 50k txs, ~10MB total
+ // a reasonably large mempool of 50k txs, ~10MB total plus variable extra txns
post-merge ACK
Reconstructing compact blocks is on the hot path for block relay, so revert changes from #28391 and #29752 that made it slower. Also add a benchmark to validate reconstruction performance, and a comment giving some background as to the approach.