
Conversation


@ajtowns ajtowns commented Aug 25, 2025

Reconstructing compact blocks is on the hot path for block relay, so revert changes from #28391 and #29752 that made it slower. Also add a benchmark to validate reconstruction performance, and a comment giving some background as to the approach.


DrahtBot commented Aug 25, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33253.

Reviews

See the guideline for information on the review process.

| Type        | Reviewers
|:------------|:---------
| ACK         | achow101, polespinasa, davidgumberg, cedwies, instagibbs
| Concept ACK | l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33191 (net: Provide block templates to peers on request by ajtowns)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.


ajtowns commented Aug 25, 2025

On my dev machine with a release build, I see a 64% performance gain on the benchmark after the reversion patches are applied.

cc @sipa @gmaxwell

@DrahtBot

🚧 At least one of the CI tasks failed.
Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/48805674537
LLM reason (✨ experimental): The CI failure is caused by a segmentation fault in the bench_sanity_check test.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen for a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@ajtowns ajtowns force-pushed the 202508-cache-friendly-compactblock branch 3 times, most recently from 0d474c5 to decb48a on August 25, 2025 12:27
@fanquake fanquake requested a review from instagibbs August 25, 2025 14:30
@instagibbs

related comment: #29752 (comment)


instagibbs commented Aug 25, 2025

Seems like a03aef9 is the problematic commit for my machine? Doubles the runtime.

Both reverted:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 1,955,234.00 | 511.45 | 2.1% |  0.02 | `BlockEncoding`

No reversions:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 3,945,310.00 | 253.47 | 3.0% |  0.05 | `BlockEncoding`

a8203e9 reverted only:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 3,929,789.00 | 254.47 | 3.6% |  0.05 | `BlockEncoding`

a03aef9 reverted only:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 1,964,505.00 | 509.03 | 2.8% |  0.02 | `BlockEncoding`

edit:

aj> instagibbs: re: 33253, the benchmark doesn't exercise a8203e9 -- it just passes an empty vector of extra transactions


@jonatack jonatack left a comment


ARM64 M1 Max (2022), macOS 15.6.1

benchmark only

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,643,000.00 |              378.36 |    1.5% |      0.03 | `BlockEncoding`

benchmark with reversions (the difference is from the last commit, Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef")

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,268,791.00 |              440.76 |    1.1% |      0.03 | `BlockEncoding`


achow101 commented Aug 25, 2025

aj> instagibbs: re: 33253, the benchmark doesn't exercise a8203e9 -- it just passes an empty vector of extra transactions

Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.


@l0rinc l0rinc left a comment


Concept ACK, excellent find.

I redid the benchmarks and got more modest improvements with GCC and Clang:

C++ compiler .......................... GNU 14.2.0
echo ">  C++ compiler .......................... GNU $(gcc -dumpfullversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
228.64 ops/s before
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
4,386,276.06 227.98 0.2% 17,156,885.24 15,741,370.59 1.090 1,287,419.20 3.2% 5.40 BlockEncoding
4,373,623.58 228.64 0.2% 17,156,871.09 15,696,695.56 1.093 1,287,418.38 3.2% 5.41 BlockEncoding
4,380,919.51 228.26 0.2% 17,156,870.18 15,719,209.66 1.091 1,287,418.17 3.2% 5.40 BlockEncoding
315.60 ops/s after
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
3,168,586.91 315.60 0.1% 17,156,847.31 11,378,141.35 1.508 1,287,390.88 3.2% 5.29 BlockEncoding
3,241,851.00 308.47 0.2% 17,156,842.43 11,640,138.08 1.474 1,287,385.26 3.2% 5.29 BlockEncoding
3,173,445.04 315.11 0.1% 17,156,847.31 11,394,621.15 1.506 1,287,390.88 3.2% 5.28 BlockEncoding

i.e. 38% improvement with gcc


C++ compiler .......................... clang 20.1.7
echo ">  C++ compiler .......................... clang $(clang -dumpversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
252.05 ops/s before
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
3,967,492.38 252.05 0.1% 15,494,632.58 14,241,688.80 1.088 1,070,986.04 4.0% 5.37 BlockEncoding
3,984,996.65 250.94 0.1% 15,494,608.15 14,303,936.25 1.083 1,070,973.80 4.0% 5.38 BlockEncoding
4,078,221.50 245.20 0.1% 15,494,616.62 14,638,662.82 1.058 1,070,976.86 4.0% 5.37 BlockEncoding
326.06 ops/s after
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
3,066,879.10 326.06 0.0% 15,497,664.10 11,011,637.17 1.407 1,070,979.09 4.0% 5.28 BlockEncoding
3,135,228.99 318.96 0.1% 15,497,665.21 11,253,673.49 1.377 1,070,979.81 4.0% 5.29 BlockEncoding
3,127,079.54 319.79 0.1% 15,497,665.71 11,223,561.21 1.381 1,070,979.87 4.0% 5.30 BlockEncoding

i.e. 29% improvement for clang


Similarly to other people before me, I think we could include a few more scenarios in the benchmark. I left a few inline comments explaining my suggestions, but it's probably easier if I post the final benchmark code here as well:

blockencodings.cpp
// Copyright (c) 2025-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#include <bench/bench.h>
#include <blockencodings.h>
#include <consensus/amount.h>
#include <kernel/cs_main.h>
#include <primitives/transaction.h>
#include <script/script.h>
#include <sync.h>
#include <test/util/setup_common.h>
#include <test/util/txmempool.h>
#include <txmempool.h>
#include <util/check.h>

#include <memory>
#include <unordered_set>
#include <vector>
#include <algorithm>

static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
    LockPoints lp;
    AddToMempool(pool, CTxMemPoolEntry{tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp});
}

namespace {
CTransactionRef CreateTestTransaction(std::span<const std::byte> sigspam, size_t index)
{
    CMutableTransaction tx;
    tx.vin.resize(1);
    tx.vin[0].scriptSig = CScript() << sigspam << OP_1;
    tx.vin[0].scriptWitness.stack.push_back({1});
    tx.vout.resize(1);
    tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL;
    tx.vout[0].nValue = index;
    return MakeTransactionRef(tx);
}

CBlock DummyBlock()
{
    CBlock block;
    block.nVersion = 5;
    block.hashPrevBlock.SetNull();
    block.hashMerkleRoot.SetNull();
    block.nTime = 1231006505;
    block.nBits = 0x1d00ffff;
    block.nNonce = 2083236893;
    block.fChecked = false;
    CMutableTransaction tx;
    tx.vin.resize(1);
    tx.vout.resize(1);
    block.vtx.emplace_back(MakeTransactionRef(tx)); // dummy coinbase
    return block;
}

struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
    BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
    {
        std::unordered_set<uint64_t> seen;
        shorttxids.reserve(txs);
        while (txs-- > 0) {
            uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
            assert(seen.insert(id).second);
            shorttxids.push_back(id);
        }
        if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
    }
};
} // anon namespace

static void BlockEncoding(benchmark::Bench& bench)
{
    constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};

    const auto testing_setup{MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN)};
    auto& pool{*Assert(testing_setup->m_node.mempool)};

    FastRandomContext rng{/*fDeterministic=*/true};
    const auto sigspam{rng.randbytes<sigspam_size>()};

    LOCK2(cs_main, pool.cs);
    std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;

    {
        // a reasonably large mempool of 50k txs, ~10MB total
        std::vector<CTransactionRef> refs;
        refs.reserve(mempool_size);
        for (size_t i{0}; i < mempool_size; ++i) {
            refs.emplace_back(CreateTestTransaction(sigspam, i));
        }

        // ensure mempool ordering is different to memory ordering of transactions,
        // to simulate a mempool that has changed over time
        std::ranges::shuffle(refs, rng);

        extra_txn.reserve(extra_tx_count);
        for (size_t i{0}; i < extra_tx_count; ++i) {
            size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool
            auto tx{CreateTestTransaction(sigspam, index)};
            extra_txn.emplace_back(tx->GetWitnessHash(), tx);
        }

        for (const auto& tx : refs) {
            AddTx(tx, /*fee=*/tx->vout[0].nValue, pool);
        }
    }

    for (const auto duplicates : {false, true}) {
        BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
        for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
            bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
                 .run([&] {
                     PartiallyDownloadedBlock pdb{&pool};
                     const auto status{pdb.InitData(cmpctblock, extra)};
                     assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
                 });
        }
    }
}

BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);

Edit: what is the expected memory increase because of the change?


// ensure mempool ordering is different to memory ordering of transactions,
// to simulate a mempool that has changed over time
std::shuffle(refs.begin(), refs.end(), rng);
Contributor

nit: std::ranges::shuffle might be simpler:

Suggested change
std::shuffle(refs.begin(), refs.end(), rng);
std::ranges::shuffle(refs, rng);


bench.run([&] {
PartiallyDownloadedBlock pdb{&pool};
(void)pdb.InitData(cmpctblock, {});
@l0rinc l0rinc Aug 25, 2025


we could consume the result to make sure the benchmark isn't optimized away and that we're measuring the success/failure scenario correctly

Comment on lines 65 to 64
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
InsecureRandomContext rng(11);
@l0rinc l0rinc Aug 25, 2025


alternatively this may be slightly more descriptive than 11 and it could even be used to init the sigspam:

Suggested change
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
InsecureRandomContext rng(11);
FastRandomContext rng{/*fDeterministic=*/true};
auto sigspam{rng.randbytes<200>()};

Contributor

+1

{
const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN);
CTxMemPool& pool = *Assert(testing_setup->m_node.mempool);
std::array<std::byte,200> sigspam;
Contributor

We could extract the constant used here to the front, like:

    constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};

{
shorttxids.reserve(txs);
while (txs-- > 0) {
shorttxids.push_back(rng.randbits<SHORTTXIDS_LENGTH*8>());
Contributor

we're lucky that this doesn't generate any duplicates, but we could test those cases as well:

struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
    BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
    {
        std::unordered_set<uint64_t> seen;
        shorttxids.reserve(txs);
        while (txs-- > 0) {
            uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
            assert(seen.insert(id).second);
            shorttxids.push_back(id);
        }
        if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
    }
};

and we could bench all 4 possibilities:

for (const auto duplicates : {false, true}) {
    BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
    for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
        bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
             .run([&] {
                 PartiallyDownloadedBlock pdb{&pool};
                 const auto status{pdb.InitData(cmpctblock, extra)};
                 assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
             });
    }
}

resulting in something like:

ns/op op/s err% total benchmark
1,371,045.12 729.37 1.3% 5.36 reconstructions without extra
1,309,789.19 763.48 2.8% 5.35 reconstructions with extra
71,741.96 13,938.84 2.3% 5.42 reconstructions without extra (dup)
70,113.89 14,262.51 1.2% 5.50 reconstructions with extra (dup)

Contributor Author

This is just testing how quickly we can iterate through all the txs in the mempool when looking at a compact block -- which is the worst case scenario that can still be fast (if we finish solving the block with transactions from the extra pool, and don't need to go back to the network). Hitting duplicates automatically puts us in the slow resolve-via-network path, so benching that shouldn't be terribly interesting.

Contributor

To make sure that's what we're testing, we should still assert the outcome of InitData was READ_STATUS_OK - otherwise it would be possible that we accidentally generated duplicates here.

Contributor Author

Added an assert and comment.

static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
LockPoints lp;
AddToMempool(pool, CTxMemPoolEntry(tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
Contributor

CTxMemPoolEntry{} would make it slightly clearer that this is a constructor call

@polespinasa polespinasa left a comment

Concept ACK

Nice catch

No reversions eb1920a:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 2,163,341.00 | 462.25 | 7.4% |  0.03 | `BlockEncoding`

Only f2b9cbe reverted:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 2,097,861.00 | 476.68 | 5.3% |  0.03 | `BlockEncoding`

Only decb48a reverted:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 1,727,394.00 | 578.91 | 2.4% |  0.02 | `BlockEncoding`

Both reverted:

|        ns/op |   op/s | err% | total | benchmark
|-------------:|-------:|-----:|------:|:----------
| 1,697,419.00 | 589.13 | 2.2% |  0.02 | `BlockEncoding`

@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
 // Note that we don't want duplication between extra_txn and mempool to
 // trigger this case, so we compare witness hashes first
 if (txn_available[idit->second] &&
-    txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) {
+    txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
Contributor

I think this should be:

if (txn_available[idit->second] && 
    txn_available[idit->second]->GetWitnessHash() != extra_txn[i].first) { ...

Contributor Author

Once we've found a shortid match, looking up the actual tx is fine, so I don't think this warrants changing.

Contributor

What makes this section not performance-critical is that we're likely going to have to do a full round trip anyway to get the collided transactions, but I still think the suggestion is more correct/consistent, so it might be nice to do if retouching. Out of scope, but we can also remove the first clause of the if check entirely, since it's checking the inverse of the if block above.

@ajtowns ajtowns force-pushed the 202508-cache-friendly-compactblock branch from decb48a to 3aa3fc8 on August 26, 2025 03:44

ajtowns commented Aug 26, 2025

Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.

Added benchmarks with 50k txs in mempool and 0, 100 (default), and 5000 entries in extra pool. For reference, datum recommends setting the value to 1M.

@instagibbs

Benchmarks now match expectation: each commit separately improves the benchmarks ✔️

@achow101 achow101 added this to the 30.0 milestone Aug 26, 2025
@ajtowns ajtowns force-pushed the 202508-cache-friendly-compactblock branch from 3aa3fc8 to b7b249d on August 26, 2025 17:33
@achow101

ACK b7b249d

@DrahtBot DrahtBot requested review from polespinasa and l0rinc August 26, 2025 17:41

ajtowns commented Aug 26, 2025

Edit: what is the expected memory increase because of the change?

32 bytes per mempool transaction and per extra txn (of which there are 100 by default).

A 300MB mempool might fill up with about ~100k txs (at about 440vb each on average), which would be 3.2MB total. (The 3.2MB would not be an increase overall, but would rather cause earlier eviction from the mempool, ie a ~1% reduction in capacity)


janb84 commented Aug 26, 2025

I have reviewed the code and tried to understand what to look for, so as to improve my PR reviewing in the future. I'm still missing some experience to give a "judgement" on this PR.

The PR restores code to undo some "optimisations" that had a negative impact on performance (it restores caching behaviour) and adds some extra benchmarks.

results of benchmarking this PR (both reverted):

ns/op op/s err% total benchmark
2,546,375.00 392.72 1.9% 0.03 BlockEncodingLargeExtra
2,414,458.00 414.17 4.1% 0.03 BlockEncodingNoExtra
2,466,416.00 405.45 2.2% 0.03 BlockEncodingStdExtra

@polespinasa

lgtm code review and tested ACK b7b249d
nothing to add that has not already been said in other comments

w0xlt added a commit to w0xlt/bitcoin that referenced this pull request Aug 26, 2025

@davidgumberg davidgumberg left a comment


crACK b7b249d

This PR reverts two performance regressions that broke cache locality of the wtxids we need to hash during compact block reconstruction, and it adds a nice benchmark that demonstrates the regressions and might have caught them before they were merged.

For more context: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-08-23#1755938803-1755950270;

I reviewed the code changes here directly, and I also compared them to the commits they claimed to be reverting, and the only differences I found were comment changes, small refactor-y changes, and changes related to new code.

b9300d8 vs revert of a8203e9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b9300d8"
REVERTED="a8203e9"

diff --color -u\
  <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
  <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,9 +1,27 @@
+--- a/src/bench/blockencodings.cpp
++++ b/src/bench/blockencodings.cpp
+ 
+-    std::vector<std::pair<Wtxid, CTransactionRef>> extratxn;
++    std::vector<CTransactionRef> extratxn;
+     extratxn.reserve(n_extra);
+     for (size_t i = n_pool; i < n_pool + n_extra; ++i) {
+-        extratxn.emplace_back(refs[i]->GetWitnessHash(), refs[i]);
++        extratxn.push_back(refs[i]);
+     }
 --- a/src/blockencodings.cpp
 +++ b/src/blockencodings.cpp
  
--ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) {
+-/* Reconstructing a compact block is in the hot-path for block relay,
+- * so we want to do it as quickly as possible. Because this often
+- * involves iterating over the entire mempool, we put all the data we
+- * need (ie the wtxid and a reference to the actual transaction data)
+- * in a vector and iterate over the vector directly. This allows optimal
+- * CPU caching behaviour, at a cost of only 40 bytes per transaction.
+- */
+-ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn)
+-{
 +ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn) {
-     if (cmpctblock.header.IsNull() || (cmpctblock.shorttxids.empty() && cmpctblock.prefilledtxn.empty()))
+     LogDebug(BCLog::CMPCTBLOCK, "Initializing PartiallyDownloadedBlock for block %s using a cmpctblock of %u bytes\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
      for (size_t i = 0; i < extra_txn.size(); i++) {
 -        uint64_t shortid = cmpctblock.GetShortID(extra_txn[i].first);
 +        if (extra_txn[i] == nullptr) {
@@ -21,8 +39,10 @@
                      txn_available[idit->second].reset();
 --- a/src/blockencodings.h
 +++ b/src/blockencodings.h
-     // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
+ 
+-    // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
 -    ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn);
++    // extra_txn is a list of extra orphan/conflicted/etc transactions to look at
 +    ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn);
      bool IsTxAvailable(size_t index) const;
 --- a/src/net_processing.cpp
@@ -38,8 +58,19 @@
 --- a/src/test/blockencodings_tests.cpp
 +++ b/src/test/blockencodings_tests.cpp
  
--std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
-+std::vector<CTransactionRef> extra_txn;
+-const std::vector<std::pair<Wtxid, CTransactionRef>> empty_extra_txn;
++const std::vector<CTransactionRef> empty_extra_txn;
+ 
+     CBlock block(BuildBlockTestCase(rand_ctx));
+-    std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
++    std::vector<CTransactionRef> extra_txn;
+     extra_txn.resize(10);
+         // Add an unrelated tx to extra_txn:
+-        extra_txn[0] = {non_block_tx->GetWitnessHash(), non_block_tx};
++        extra_txn[0] = non_block_tx;
+         // and a tx from the block that's not in the mempool:
+-        extra_txn[1] = {block.vtx[1]->GetWitnessHash(), block.vtx[1]};
++        extra_txn[1] = block.vtx[1];
  
 --- a/src/test/fuzz/partially_downloaded_block.cpp
 +++ b/src/test/fuzz/partially_downloaded_block.cpp
b7b249d vs revert of a03aef9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b7b249d"
REVERTED="a03aef9"

diff --color -u\
  <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
  <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,39 +1,48 @@
 --- a/src/blockencodings.cpp
 +++ b/src/blockencodings.cpp
      LOCK(pool->cs);
--    for (size_t i = 0; i < pool->vTxHashes.size(); i++) {
--        uint64_t shortid = cmpctblock.GetShortID(pool->vTxHashes[i].first);
-+    for (const auto& tx : pool->vTxHashes) {
+-    for (const auto& [wtxid, txit] : pool->txns_randomized) {
+-        uint64_t shortid = cmpctblock.GetShortID(wtxid);
++    for (const auto& tx : pool->txns_randomized) {
 +        uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash());
          std::unordered_map<uint64_t, uint16_t>::iterator idit = shorttxids.find(shortid);
              if (!have_txn[idit->second]) {
--                txn_available[idit->second] = pool->vTxHashes[i].second->GetSharedTx();
+-                txn_available[idit->second] = txit->GetSharedTx();
 +                txn_available[idit->second] = tx;
                  have_txn[idit->second]  = true;
 --- a/src/test/blockencodings_tests.cpp
 +++ b/src/test/blockencodings_tests.cpp
  // Number of shared use_counts we expect for a tx we haven't touched
--// (block + mempool + our copy from the GetSharedTx call)
+-// (block + mempool entry + our copy from the GetSharedTx call)
 -constexpr long SHARED_TX_OFFSET{3};
-+// (block + mempool entry + mempool vTxHashes + our copy from the GetSharedTx call)
++// (block + mempool entry + mempool txns_randomized + our copy from the GetSharedTx call)
 +constexpr long SHARED_TX_OFFSET{4};
  
 --- a/src/txmempool.cpp
 +++ b/src/txmempool.cpp
  
--    vTxHashes.emplace_back(tx.GetWitnessHash(), newit);
-+    vTxHashes.emplace_back(newit->GetSharedTx());
-     newit->vTxHashesIdx = vTxHashes.size() - 1;
-     if (vTxHashes.size() > 1) {
-+        // Update vTxHashesIdx of the to-be-moved entry.
-+        Assert(GetEntry(vTxHashes.back()->GetHash()))->vTxHashesIdx = it->vTxHashesIdx;
-+        // Remove entry from vTxHashes by replacing it with the back and deleting the back.
-         vTxHashes[it->vTxHashesIdx] = std::move(vTxHashes.back());
--        vTxHashes[it->vTxHashesIdx].second->vTxHashesIdx = it->vTxHashesIdx;
-         vTxHashes.pop_back();
+-    txns_randomized.emplace_back(tx.GetWitnessHash(), newit);
++    txns_randomized.emplace_back(newit->GetSharedTx());
+     newit->idx_randomized = txns_randomized.size() - 1;
+     if (txns_randomized.size() > 1) {
++        // Update idx_randomized of the to-be-moved entry.
++        Assert(GetEntry(txns_randomized.back()->GetHash()))->idx_randomized = it->idx_randomized;
+         // Remove entry from txns_randomized by replacing it with the back and deleting the back.
+         txns_randomized[it->idx_randomized] = std::move(txns_randomized.back());
+-        txns_randomized[it->idx_randomized].second->idx_randomized = it->idx_randomized;
+         txns_randomized.pop_back();
+-        if (txns_randomized.size() * 2 < txns_randomized.capacity()) {
++        if (txns_randomized.size() * 2 < txns_randomized.capacity())
+             txns_randomized.shrink_to_fit();
+-        }
+-    } else {
++    } else
+         txns_randomized.clear();
+-    }
+ 
 --- a/src/txmempool.h
 +++ b/src/txmempool.h
      using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator;
--    std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order
-+    std::vector<CTransactionRef> vTxHashes GUARDED_BY(cs); //!< All transactions in mapTx, in random order
+-    std::vector<std::pair<Wtxid, txiter>> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx with their wtxids, in arbitrary order
++    std::vector<CTransactionRef> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx, in random order
| benchmark               | % faster
|:------------------------|--------:
| BlockEncodingLargeExtra |    19.9%
| BlockEncodingNoExtra    |    18.5%
| BlockEncodingStdExtra   |    18.6%

% faster calculated as: $1 - \frac{\text{time}_\text{branch}}{\text{time}_\text{master}}$

Full benchmark results

```
$ ./build/bin/bench_bitcoin --filter="BlockEncoding.*" -min-time=5000
```

df5a50e (master + benchmark added)

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | :--- |
| 1,644,722.24 | 608.01 | 0.3% | 17,098,172.55 | 7,050,895.83 | 2.425 | 1,138,993.60 | 3.8% | 5.47 | BlockEncodingLargeExtra |
| 1,461,528.56 | 684.22 | 0.2% | 15,650,495.41 | 6,266,693.24 | 2.497 | 1,064,736.70 | 3.5% | 5.49 | BlockEncodingNoExtra |
| 1,476,456.22 | 677.30 | 0.9% | 15,680,631.05 | 6,328,282.73 | 2.478 | 1,066,421.94 | 3.5% | 5.55 | BlockEncodingStdExtra |

b9300d8 (extratxn better data locality)

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | :--- |
| 1,566,821.01 | 638.23 | 0.2% | 17,099,295.07 | 6,718,281.29 | 2.545 | 1,134,042.47 | 3.9% | 5.53 | BlockEncodingLargeExtra |
| 1,467,993.68 | 681.20 | 0.1% | 15,656,135.06 | 6,294,675.71 | 2.487 | 1,064,516.33 | 3.6% | 5.47 | BlockEncodingNoExtra |
| 1,454,058.97 | 687.73 | 0.5% | 15,687,165.75 | 6,234,731.73 | 2.516 | 1,066,490.20 | 3.5% | 5.48 | BlockEncodingStdExtra |

b7b249d (mempool better data locality)

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | :--- |
| 1,316,987.01 | 759.31 | 0.1% | 17,049,029.82 | 5,652,300.18 | 3.016 | 1,133,975.24 | 3.8% | 5.50 | BlockEncodingLargeExtra |
| 1,190,773.91 | 839.79 | 0.3% | 15,606,374.31 | 5,102,241.10 | 3.059 | 1,064,614.44 | 3.5% | 5.48 | BlockEncodingNoExtra |
| 1,201,347.37 | 832.40 | 0.2% | 15,637,600.08 | 5,147,161.34 | 3.038 | 1,066,621.86 | 3.5% | 5.43 | BlockEncodingStdExtra |

```diff
@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
             // Note that we don't want duplication between extra_txn and mempool to
             // trigger this case, so we compare witness hashes first
             if (txn_available[idit->second] &&
-                txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) {
+                txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
```

What makes this section not performance-critical is that we're likely going to have to do a full round-trip anyway to get the collided transactions, but I still think the suggestion is more correct/consistent, so it might be nice to do if retouching. Out of scope, but we can also remove the first clause of the if check entirely, since it's checking the inverse of the if block above.

Comment on lines 65 to 64
```cpp
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
InsecureRandomContext rng(11);
```

+1


```cpp
    BenchCBHAST cmpctblock{rng, 3000};

    bench.run([&] {
```

nit:

Suggested change

```diff
-    bench.run([&] {
+    bench.unit("block").run([&] {
```

and maybe:

Suggested change

```diff
-    bench.run([&] {
+    bench.unit("block").timeUnit(1ms, "ms").run([&] {
```

@cedwies
Copy link

cedwies commented Aug 28, 2025

code-review ACK b7b249d

Read-through + local bench; my results match those others have reported. This PR restores cache locality in the compact-block reconstruction hot path and adds a well-focused benchmark.

  • Reverts the “vector of CTransactionRef” refactors and restores a cache-friendly layout for reconstruction.
  • InitData again takes std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn, causing the wtxid to be available without dereferencing in the hot loop.
  • Early-exit and collision behavior are preserved.
  • Bench design is scoped to the hot path we care about and does not mix in network or dup-resolution behavior.

Strong improvement. The memory overhead is small and well justified by the speedup.


@instagibbs instagibbs left a comment


ACK b7b249d

non-blocking nits only


```cpp
static void BlockEncodingLargeExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 5000);
```

micro-nit: annotate numbered args

```cpp
static void BlockEncodingStdExtra(benchmark::Bench& bench)
{
    static_assert(DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN == 100);
    BlockEncodingBench(bench, 50000, 100);
```

micro-nit: annotate numbered args


```cpp
static void BlockEncodingNoExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 0);
```

micro-nit: annotate numbered args

```cpp
    std::array<std::byte,200> sigspam;
    sigspam.fill(std::byte(42));

    // a reasonably large mempool of 50k txs, ~10MB total
```

Suggested change

```diff
-    // a reasonably large mempool of 50k txs, ~10MB total
+    // a reasonably large mempool of 50k txs, ~10MB total plus variable extra txns
```

@achow101 achow101 merged commit 7cc9a08 into bitcoin:master Aug 28, 2025
19 checks passed
@l0rinc

l0rinc commented Aug 29, 2025

post-merge ACK
