Revert compact block cache inefficiencies #33253
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details, see https://corecheck.dev/bitcoin/bitcoin/pulls/33253.
Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
🚧 At least one of the CI tasks failed. Hints: try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons. Leave a comment here if you need help tracking down a confusing failure.
Force-pushed from 0d474c5 to decb48a
related comment: #29752 (comment)
Seems like a03aef9 is the problematic commit for my machine? Doubles the runtime. Both reverted:
No reversions:
a8203e9 reverted only:
a03aef9 reverted only:
ARM64 M1 Max (2022), macOS 15.6.1
benchmark only
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,643,000.00 | 378.36 | 1.5% | 0.03 | `BlockEncoding`
benchmark with reversions (the difference is from the last commit, Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef")
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,268,791.00 | 440.76 | 1.1% | 0.03 | `BlockEncoding`
Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.
Concept ACK, excellent find.
I redid the benchmarks, got more modest changes with GCC and Clang:
C++ compiler .......................... GNU 14.2.0
echo "> C++ compiler .......................... GNU $(gcc -dumpfullversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
228.64 ops/s before
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,386,276.06 | 227.98 | 0.2% | 17,156,885.24 | 15,741,370.59 | 1.090 | 1,287,419.20 | 3.2% | 5.40 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,373,623.58 | 228.64 | 0.2% | 17,156,871.09 | 15,696,695.56 | 1.093 | 1,287,418.38 | 3.2% | 5.41 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,380,919.51 | 228.26 | 0.2% | 17,156,870.18 | 15,719,209.66 | 1.091 | 1,287,418.17 | 3.2% | 5.40 | BlockEncoding |
315.60 ops/s after
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,168,586.91 | 315.60 | 0.1% | 17,156,847.31 | 11,378,141.35 | 1.508 | 1,287,390.88 | 3.2% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,241,851.00 | 308.47 | 0.2% | 17,156,842.43 | 11,640,138.08 | 1.474 | 1,287,385.26 | 3.2% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,173,445.04 | 315.11 | 0.1% | 17,156,847.31 | 11,394,621.15 | 1.506 | 1,287,390.88 | 3.2% | 5.28 | BlockEncoding |
i.e. 38% improvement with gcc
C++ compiler .......................... clang 20.1.7
echo "> C++ compiler .......................... clang $(clang -dumpversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
252.05 ops/s before
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,967,492.38 | 252.05 | 0.1% | 15,494,632.58 | 14,241,688.80 | 1.088 | 1,070,986.04 | 4.0% | 5.37 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,984,996.65 | 250.94 | 0.1% | 15,494,608.15 | 14,303,936.25 | 1.083 | 1,070,973.80 | 4.0% | 5.38 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,078,221.50 | 245.20 | 0.1% | 15,494,616.62 | 14,638,662.82 | 1.058 | 1,070,976.86 | 4.0% | 5.37 | BlockEncoding |
326.06 ops/s after
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,066,879.10 | 326.06 | 0.0% | 15,497,664.10 | 11,011,637.17 | 1.407 | 1,070,979.09 | 4.0% | 5.28 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,135,228.99 | 318.96 | 0.1% | 15,497,665.21 | 11,253,673.49 | 1.377 | 1,070,979.81 | 4.0% | 5.29 | BlockEncoding |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
3,127,079.54 | 319.79 | 0.1% | 15,497,665.71 | 11,223,561.21 | 1.381 | 1,070,979.87 | 4.0% | 5.30 | BlockEncoding |
i.e. 29% improvement for clang
Like others before me, I think we could include a few more scenarios in the benchmark. I left a few inline comments explaining my suggestions, but it's probably easier if I post the final benchmark code here as well:
blockencodings.cpp
// Copyright (c) 2025-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include <bench/bench.h>
#include <blockencodings.h>
#include <consensus/amount.h>
#include <kernel/cs_main.h>
#include <primitives/transaction.h>
#include <script/script.h>
#include <sync.h>
#include <test/util/setup_common.h>
#include <test/util/txmempool.h>
#include <txmempool.h>
#include <util/check.h>
#include <memory>
#include <unordered_set>
#include <vector>
#include <algorithm>
static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
LockPoints lp;
AddToMempool(pool, CTxMemPoolEntry{tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp});
}
namespace {
CTransactionRef CreateTestTransaction(std::span<const std::byte> sigspam, size_t index)
{
CMutableTransaction tx;
tx.vin.resize(1);
tx.vin[0].scriptSig = CScript() << sigspam << OP_1;
tx.vin[0].scriptWitness.stack.push_back({1});
tx.vout.resize(1);
tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL;
tx.vout[0].nValue = index;
return MakeTransactionRef(tx);
}
CBlock DummyBlock()
{
CBlock block;
block.nVersion = 5;
block.hashPrevBlock.SetNull();
block.hashMerkleRoot.SetNull();
block.nTime = 1231006505;
block.nBits = 0x1d00ffff;
block.nNonce = 2083236893;
block.fChecked = false;
CMutableTransaction tx;
tx.vin.resize(1);
tx.vout.resize(1);
block.vtx.emplace_back(MakeTransactionRef(tx)); // dummy coinbase
return block;
}
struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
{
std::unordered_set<uint64_t> seen;
shorttxids.reserve(txs);
while (txs-- > 0) {
uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
assert(seen.insert(id).second);
shorttxids.push_back(id);
}
if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
}
};
} // anon namespace
static void BlockEncoding(benchmark::Bench& bench)
{
constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
const auto testing_setup{MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN)};
auto& pool{*Assert(testing_setup->m_node.mempool)};
FastRandomContext rng{/*fDeterministic=*/true};
const auto sigspam{rng.randbytes<sigspam_size>()};
LOCK2(cs_main, pool.cs);
std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
{
// a reasonably large mempool of 50k txs, ~10MB total
std::vector<CTransactionRef> refs;
refs.reserve(mempool_size);
for (size_t i{0}; i < mempool_size; ++i) {
refs.emplace_back(CreateTestTransaction(sigspam, i));
}
// ensure mempool ordering is different to memory ordering of transactions,
// to simulate a mempool that has changed over time
std::ranges::shuffle(refs, rng);
extra_txn.reserve(extra_tx_count);
for (size_t i{0}; i < extra_tx_count; ++i) {
size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool
auto tx{CreateTestTransaction(sigspam, index)};
extra_txn.emplace_back(tx->GetWitnessHash(), tx);
}
for (const auto& tx : refs) {
AddTx(tx, /*fee=*/tx->vout[0].nValue, pool);
}
}
for (const auto duplicates : {false, true}) {
BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
.run([&] {
PartiallyDownloadedBlock pdb{&pool};
const auto status{pdb.InitData(cmpctblock, extra)};
assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
});
}
}
}
BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
Edit: what is the expected memory increase because of the change?
src/bench/blockencodings.cpp (outdated)
// ensure mempool ordering is different to memory ordering of transactions,
// to simulate a mempool that has changed over time
std::shuffle(refs.begin(), refs.end(), rng);
nit: std::ranges::shuffle might be simpler:
- std::shuffle(refs.begin(), refs.end(), rng);
+ std::ranges::shuffle(refs, rng);
src/bench/blockencodings.cpp (outdated)
bench.run([&] {
    PartiallyDownloadedBlock pdb{&pool};
    (void)pdb.InitData(cmpctblock, {});
we could consume the result to make sure the benchmark isn't optimized away and that we're measuring the success/failure scenario correctly
src/bench/blockencodings.cpp (outdated)
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
InsecureRandomContext rng(11);
Alternatively, this may be slightly more descriptive than 11, and it could even be used to init the sigspam:
- std::array<std::byte,200> sigspam;
- sigspam.fill(std::byte(42));
- InsecureRandomContext rng(11);
+ FastRandomContext rng{/*fDeterministic=*/true};
+ auto sigspam{rng.randbytes<200>()};
+1
src/bench/blockencodings.cpp (outdated)
{
    const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN);
    CTxMemPool& pool = *Assert(testing_setup->m_node.mempool);
    std::array<std::byte,200> sigspam;
We could extract the constants used here to the top, like:
constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
{
    shorttxids.reserve(txs);
    while (txs-- > 0) {
        shorttxids.push_back(rng.randbits<SHORTTXIDS_LENGTH*8>());
we're lucky that this doesn't generate any duplicates, but we could test those cases as well:
struct BenchCBHAST : CBlockHeaderAndShortTxIDs
{
BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
{
std::unordered_set<uint64_t> seen;
shorttxids.reserve(txs);
while (txs-- > 0) {
uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
assert(seen.insert(id).second);
shorttxids.push_back(id);
}
if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
}
};
and we could bench all 4 possibilities:
for (const auto duplicates : {false, true}) {
BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
.run([&] {
PartiallyDownloadedBlock pdb{&pool};
const auto status{pdb.InitData(cmpctblock, extra)};
assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
});
}
}
resulting in something like:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,371,045.12 | 729.37 | 1.3% | 5.36 | reconstructions without extra |
1,309,789.19 | 763.48 | 2.8% | 5.35 | reconstructions with extra |
71,741.96 | 13,938.84 | 2.3% | 5.42 | reconstructions without extra (dup) |
70,113.89 | 14,262.51 | 1.2% | 5.50 | reconstructions with extra (dup) |
This is just testing how quickly we can iterate through all the txs in the mempool when looking at a compact block -- which is the worst case scenario that can still be fast (if we finish solving the block with transactions from the extra pool, and don't need to go back to the network). Hitting duplicates automatically puts us in the slow resolve-via-network path, so benching that shouldn't be terribly interesting.
To make sure that's what we're testing, we should still assert that the outcome of InitData was READ_STATUS_OK - otherwise it would be possible that we accidentally generated duplicates here.
Added an assert and comment.
static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
{
    LockPoints lp;
    AddToMempool(pool, CTxMemPoolEntry(tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
CTxMemPoolEntry{} would indicate slightly better that it's a constructor.
Concept ACK
Nice catch
No reversions eb1920a:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,163,341.00 | 462.25 | 7.4% | 0.03 | BlockEncoding |
Only f2b9cbe reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,097,861.00 | 476.68 | 5.3% | 0.03 | BlockEncoding |
Only decb48a reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,727,394.00 | 578.91 | 2.4% | 0.02 | BlockEncoding |
Both reverted
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,697,419.00 | 589.13 | 2.2% | 0.02 | BlockEncoding |
@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
     // Note that we don't want duplication between extra_txn and mempool to
     // trigger this case, so we compare witness hashes first
     if (txn_available[idit->second] &&
-        txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) {
+        txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
I think this should be:
if (txn_available[idit->second] &&
txn_available[idit->second]->GetWitnessHash() != extra_txn[i].first) { ...
Once we've found a shortid match, looking up the actual tx is fine, so I don't think this warrants changing.
What makes this section not performance critical is that we're likely going to have to do a full round-trip anyways to get the collided transactions, but I still think the suggestion is more correct/consistent, so it might be nice to do if retouching. Out of scope, but we can also remove the first clause of the if check entirely, since it's checking the inverse of the if block above.
Force-pushed from decb48a to 3aa3fc8
Added benchmarks with 50k txs in mempool and 0, 100 (default), and 5000 entries in extra pool. For reference, datum recommends setting the value to 1M.
Benchmarks now match expectation: each commit separately improves the benchmarks ✔️
Force-pushed from 3aa3fc8 to b7b249d
ACK b7b249d
32 bytes per mempool transaction and per extra txn (of which there are 100 by default). A 300MB mempool might fill up with about ~100k txs (at about 440vB each on average), which would be 3.2MB total. (The 3.2MB would not be an increase overall, but would rather cause earlier eviction from the mempool, i.e. a ~1% reduction in capacity.)
I have done a code review and tried to understand what to look for, to improve my PR reviewing in the future. The PR restores code to undo some "optimisations" that had a negative impact on performance (restores caching) and adds some extra benchmarks. Results of benchmarking this PR (both reverted):
lgtm code review and tested ACK b7b249d
crACK b7b249d
This PR reverts two performance regressions that broke cache locality of the wtxids we need to hash during compact block reconstruction, and it adds a nice benchmark that demonstrates the regressions and might have caught them before they were merged.
For more context: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-08-23#1755938803-1755950270
I reviewed the code changes here directly, and I also compared them to the commits they claimed to be reverting, and the only differences I found were comment changes, small refactor-y changes, and changes related to new code.
b9300d8 vs revert of a8203e9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b9300d8"
REVERTED="a8203e9"
diff --color -u\
<(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
<(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,9 +1,27 @@
+--- a/src/bench/blockencodings.cpp
++++ b/src/bench/blockencodings.cpp
+
+- std::vector<std::pair<Wtxid, CTransactionRef>> extratxn;
++ std::vector<CTransactionRef> extratxn;
+ extratxn.reserve(n_extra);
+ for (size_t i = n_pool; i < n_pool + n_extra; ++i) {
+- extratxn.emplace_back(refs[i]->GetWitnessHash(), refs[i]);
++ extratxn.push_back(refs[i]);
+ }
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
--ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) {
+-/* Reconstructing a compact block is in the hot-path for block relay,
+- * so we want to do it as quickly as possible. Because this often
+- * involves iterating over the entire mempool, we put all the data we
+- * need (ie the wtxid and a reference to the actual transaction data)
+- * in a vector and iterate over the vector directly. This allows optimal
+- * CPU caching behaviour, at a cost of only 40 bytes per transaction.
+- */
+-ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn)
+-{
+ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn) {
- if (cmpctblock.header.IsNull() || (cmpctblock.shorttxids.empty() && cmpctblock.prefilledtxn.empty()))
+ LogDebug(BCLog::CMPCTBLOCK, "Initializing PartiallyDownloadedBlock for block %s using a cmpctblock of %u bytes\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
for (size_t i = 0; i < extra_txn.size(); i++) {
- uint64_t shortid = cmpctblock.GetShortID(extra_txn[i].first);
+ if (extra_txn[i] == nullptr) {
@@ -21,8 +39,10 @@
txn_available[idit->second].reset();
--- a/src/blockencodings.h
+++ b/src/blockencodings.h
- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
+
+- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
- ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn);
++ // extra_txn is a list of extra orphan/conflicted/etc transactions to look at
+ ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn);
bool IsTxAvailable(size_t index) const;
--- a/src/net_processing.cpp
@@ -38,8 +58,19 @@
--- a/src/test/blockencodings_tests.cpp
+++ b/src/test/blockencodings_tests.cpp
--std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
-+std::vector<CTransactionRef> extra_txn;
+-const std::vector<std::pair<Wtxid, CTransactionRef>> empty_extra_txn;
++const std::vector<CTransactionRef> empty_extra_txn;
+
+ CBlock block(BuildBlockTestCase(rand_ctx));
+- std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
++ std::vector<CTransactionRef> extra_txn;
+ extra_txn.resize(10);
+ // Add an unrelated tx to extra_txn:
+- extra_txn[0] = {non_block_tx->GetWitnessHash(), non_block_tx};
++ extra_txn[0] = non_block_tx;
+ // and a tx from the block that's not in the mempool:
+- extra_txn[1] = {block.vtx[1]->GetWitnessHash(), block.vtx[1]};
++ extra_txn[1] = block.vtx[1];
--- a/src/test/fuzz/partially_downloaded_block.cpp
+++ b/src/test/fuzz/partially_downloaded_block.cpp
b7b249d vs revert of a03aef9
DIFF_NOISE="^(diff --git|index|@@)"
REVERTING="b7b249d"
REVERTED="a03aef9"
diff --color -u\
<(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
<(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
@@ -1,39 +1,48 @@
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
LOCK(pool->cs);
-- for (size_t i = 0; i < pool->vTxHashes.size(); i++) {
-- uint64_t shortid = cmpctblock.GetShortID(pool->vTxHashes[i].first);
-+ for (const auto& tx : pool->vTxHashes) {
+- for (const auto& [wtxid, txit] : pool->txns_randomized) {
+- uint64_t shortid = cmpctblock.GetShortID(wtxid);
++ for (const auto& tx : pool->txns_randomized) {
+ uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash());
std::unordered_map<uint64_t, uint16_t>::iterator idit = shorttxids.find(shortid);
if (!have_txn[idit->second]) {
-- txn_available[idit->second] = pool->vTxHashes[i].second->GetSharedTx();
+- txn_available[idit->second] = txit->GetSharedTx();
+ txn_available[idit->second] = tx;
have_txn[idit->second] = true;
--- a/src/test/blockencodings_tests.cpp
+++ b/src/test/blockencodings_tests.cpp
// Number of shared use_counts we expect for a tx we haven't touched
--// (block + mempool + our copy from the GetSharedTx call)
+-// (block + mempool entry + our copy from the GetSharedTx call)
-constexpr long SHARED_TX_OFFSET{3};
-+// (block + mempool entry + mempool vTxHashes + our copy from the GetSharedTx call)
++// (block + mempool entry + mempool txns_randomized + our copy from the GetSharedTx call)
+constexpr long SHARED_TX_OFFSET{4};
--- a/src/txmempool.cpp
+++ b/src/txmempool.cpp
-- vTxHashes.emplace_back(tx.GetWitnessHash(), newit);
-+ vTxHashes.emplace_back(newit->GetSharedTx());
- newit->vTxHashesIdx = vTxHashes.size() - 1;
- if (vTxHashes.size() > 1) {
-+ // Update vTxHashesIdx of the to-be-moved entry.
-+ Assert(GetEntry(vTxHashes.back()->GetHash()))->vTxHashesIdx = it->vTxHashesIdx;
-+ // Remove entry from vTxHashes by replacing it with the back and deleting the back.
- vTxHashes[it->vTxHashesIdx] = std::move(vTxHashes.back());
-- vTxHashes[it->vTxHashesIdx].second->vTxHashesIdx = it->vTxHashesIdx;
- vTxHashes.pop_back();
+- txns_randomized.emplace_back(tx.GetWitnessHash(), newit);
++ txns_randomized.emplace_back(newit->GetSharedTx());
+ newit->idx_randomized = txns_randomized.size() - 1;
+ if (txns_randomized.size() > 1) {
++ // Update idx_randomized of the to-be-moved entry.
++ Assert(GetEntry(txns_randomized.back()->GetHash()))->idx_randomized = it->idx_randomized;
+ // Remove entry from txns_randomized by replacing it with the back and deleting the back.
+ txns_randomized[it->idx_randomized] = std::move(txns_randomized.back());
+- txns_randomized[it->idx_randomized].second->idx_randomized = it->idx_randomized;
+ txns_randomized.pop_back();
+- if (txns_randomized.size() * 2 < txns_randomized.capacity()) {
++ if (txns_randomized.size() * 2 < txns_randomized.capacity())
+ txns_randomized.shrink_to_fit();
+- }
+- } else {
++ } else
+ txns_randomized.clear();
+- }
+
--- a/src/txmempool.h
+++ b/src/txmempool.h
using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator;
-- std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order
-+ std::vector<CTransactionRef> vTxHashes GUARDED_BY(cs); //!< All transactions in mapTx, in random order
+- std::vector<std::pair<Wtxid, txiter>> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx with their wtxids, in arbitrary order
++ std::vector<CTransactionRef> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx, in random order
benchmark | % faster
---|---
BlockEncodingLargeExtra | 19.9%
BlockEncodingNoExtra | 18.5%
BlockEncodingStdExtra | 18.6%
% faster calculated as:
$1 - \frac{\text{time}_\text{branch}}{\text{time}_\text{master}}$
Full benchmark results
$ ./build/bin/bench_bitcoin --filter="BlockEncoding.*" -min-time=5000
df5a50e (master + benchmark added)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,644,722.24 | 608.01 | 0.3% | 17,098,172.55 | 7,050,895.83 | 2.425 | 1,138,993.60 | 3.8% | 5.47 | BlockEncodingLargeExtra |
1,461,528.56 | 684.22 | 0.2% | 15,650,495.41 | 6,266,693.24 | 2.497 | 1,064,736.70 | 3.5% | 5.49 | BlockEncodingNoExtra |
1,476,456.22 | 677.30 | 0.9% | 15,680,631.05 | 6,328,282.73 | 2.478 | 1,066,421.94 | 3.5% | 5.55 | BlockEncodingStdExtra |
b9300d8 (extratxn better data locality)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,566,821.01 | 638.23 | 0.2% | 17,099,295.07 | 6,718,281.29 | 2.545 | 1,134,042.47 | 3.9% | 5.53 | BlockEncodingLargeExtra |
1,467,993.68 | 681.20 | 0.1% | 15,656,135.06 | 6,294,675.71 | 2.487 | 1,064,516.33 | 3.6% | 5.47 | BlockEncodingNoExtra |
1,454,058.97 | 687.73 | 0.5% | 15,687,165.75 | 6,234,731.73 | 2.516 | 1,066,490.20 | 3.5% | 5.48 | BlockEncodingStdExtra |
b7b249d (mempool better data locality)
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,316,987.01 | 759.31 | 0.1% | 17,049,029.82 | 5,652,300.18 | 3.016 | 1,133,975.24 | 3.8% | 5.50 | BlockEncodingLargeExtra |
1,190,773.91 | 839.79 | 0.3% | 15,606,374.31 | 5,102,241.10 | 3.059 | 1,064,614.44 | 3.5% | 5.48 | BlockEncodingNoExtra |
1,201,347.37 | 832.40 | 0.2% | 15,637,600.08 | 5,147,161.34 | 3.038 | 1,066,621.86 | 3.5% | 5.43 | BlockEncodingStdExtra |
BenchCBHAST cmpctblock{rng, 3000};
bench.run([&] {
nit:
- bench.run([&] {
+ bench.unit("block").run([&] {
and maybe:
- bench.run([&] {
+ bench.unit("block").timeUnit(1ms, "ms").run([&] {
code-review ACK b7b249d. Read-through plus local bench; results match others I have seen. This PR restores cache locality in the compact-block reconstruction hot path and adds a well-focused benchmark.
Strong improvement. The memory overhead is small and well justified by the speedup.
ACK b7b249d
non-blocking nits only
static void BlockEncodingLargeExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 5000);
micro-nit: annotate numbered args
static void BlockEncodingStdExtra(benchmark::Bench& bench)
{
    static_assert(DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN == 100);
    BlockEncodingBench(bench, 50000, 100);
micro-nit: annotate numbered args
static void BlockEncodingNoExtra(benchmark::Bench& bench)
{
    BlockEncodingBench(bench, 50000, 0);
micro-nit: annotate numbered args
std::array<std::byte,200> sigspam;
sigspam.fill(std::byte(42));
// a reasonably large mempool of 50k txs, ~10MB total
- // a reasonably large mempool of 50k txs, ~10MB total
+ // a reasonably large mempool of 50k txs, ~10MB total plus variable extra txns
post-merge ACK
Reconstructing compact blocks is on the hot path for block relay, so revert changes from #28391 and #29752 that made it slower. Also add a benchmark to validate reconstruction performance, and a comment giving some background as to the approach.