-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[refactor] rewrite DisconnectedBlockTransactions to not use boost #28385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refactor] rewrite DisconnectedBlockTransactions to not use boost #28385
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
d16b758
to
7cd40f9
Compare
7cd40f9
to
0c356aa
Compare
0c356aa
to
17c2f91
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK, this approach with std::list
seems elegant and reduces dependency on boost.
src/validation.cpp
Outdated
@@ -2726,9 +2726,9 @@ bool Chainstate::DisconnectTip(BlockValidationState& state, DisconnectedBlockTra | |||
} | |||
while (disconnectpool->DynamicMemoryUsage() > MAX_DISCONNECTED_TX_POOL_SIZE * 1000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the dynamic memory usage of queuedTx
now changed, I think this is behaviour change as we'll be dropping entries less frequently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps (I think it's negligible compared to the transactions themselves), and would probably be an improvement.
Concept ACK - great stuff.
+1 |
3edfa0f
to
0409dbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 0409dbc
0409dbc
to
c19f4aa
Compare
@theuni you'll probably want to take a look here. |
approach ACK iiuc we're looking at constant time access to oldest element, or any via txid, with constant time deletion/insertion for arbitrary elements. This accounts for deleting the oldest when busting allotted memory for the pool, or deleting transactions found in new blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, though I'm concerned about the public queuedTx
staying in sync with the private iters_by_txid
.
For example: here queuedTx is cleared without clearing iters_by_txid. That looks like a bug to me?
src/txmempool.h
Outdated
size_t DynamicMemoryUsage() const { | ||
return memusage::MallocUsage(sizeof(CTransactionRef) + 6 * sizeof(void*)) * queuedTx.size() + cachedInnerUsage; | ||
// std::list has 3 pointers per entry | ||
return cachedInnerUsage + memusage::DynamicUsage(iters_by_txid) + memusage::MallocUsage(3 * sizeof(void*)) * queuedTx.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to add std::list
to memusage.h
instead?
src/txmempool.h
Outdated
cachedInnerUsage -= RecursiveDynamicUsage(*it); | ||
queuedTx.erase(it); | ||
for (const auto& tx : vtx) { | ||
auto iter = iters_by_txid.find(tx->GetHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could look a little cleaner using .extract()
for (const auto& tx : vtx) {
if (auto node = iters_by_txid.extract(tx->GetHash())) {
auto& list_iter = node.mapped();
cachedInnerUsage -= RecursiveDynamicUsage(*list_iter);
queuedTx.erase(list_iter);
}
}
Edit: Ignore this. I benched and performance is worse.
Really appreciate the added bench for this btw. It makes experimenting with this much easier. |
Quick POC commit which adds full encapsulation here: theuni@98c0b86 (only validation fixed up, not tests/benches). |
Oh yes, strong approach ACK for this (left a few comments on the branch), nice! |
9dcef47
to
305b14f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments and
- Dropped the unit test commit as I can see how the estimations are too rough / unclear. Maybe we can follow up with more precise assertions on the expected size in a separate PR.
- Changed the
AddAndRemoveDisconnectedBlockTransactionsAll
to have all but 1 (the coinbases) to be more realistic. This didn't change the bench results for me - still slightly faster on*90
and*All
, slightly slower on*10
.
|
||
while (!queuedTx.empty() && DynamicMemoryUsage() > m_max_mem_usage) { | ||
evicted.emplace_back(queuedTx.front()); | ||
cachedInnerUsage -= RecursiveDynamicUsage(queuedTx.front()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, changed (but in the rewrite commit instead of here)
And encapsulate underlying data structures to avoid misuse. It's better to use stdlib instead of boost when we can achieve the same thing. Behavior change: the number returned by DynamicMemoryUsage for the same transactions is higher (due to change in usage or more accurate accounting), which effectively decreases the maximum amount of transactions kept for resubmission in a reorg. Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
This struct is only used in validation + tests and has very little to do with txmempool.
…agement Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
305b14f
to
4313c77
Compare
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 4313c77
Very nice removal of boost dependency while improving the interface through encapsulation.
My final (re-)consideration is that there's a lot of new implementation code in header files (mostly kernel/disconnected_transactions.h
and txmempool.h
), which I think could be improved but also is not a blocker?
(and also to be supernitty: a fair amount of new code not using list initialization, I've made the diff of things I could see here but not worth going through rebase adventures either, so feel very free to skip/pick from this)
git diff
diff --git a/src/bench/disconnected_transactions.cpp b/src/bench/disconnected_transactions.cpp
index d6f1590950..26d97ce30b 100644
--- a/src/bench/disconnected_transactions.cpp
+++ b/src/bench/disconnected_transactions.cpp
@@ -33,8 +33,8 @@ static BlockTxns CreateRandomTransactions(size_t num_txns)
BlockTxns txns;
txns.reserve(num_txns);
// Simplest spk for every tx
- CScript spk = CScript() << OP_TRUE;
- for (uint32_t i = 0; i < num_txns; ++i) {
+ CScript spk{CScript() << OP_TRUE};
+ for (uint32_t i{0}; i < num_txns; ++i) {
CMutableTransaction tx;
tx.vin.emplace_back(CTxIn{COutPoint{prevout_hash, 0}});
tx.vout.emplace_back(CTxOut{CENT, spk});
@@ -75,7 +75,7 @@ static void Reorg(const ReorgTxns& reorg)
{
DisconnectedBlockTransactions disconnectpool{MAX_DISCONNECTED_TX_POOL_SIZE * 1000};
// Disconnect block
- const auto evicted = disconnectpool.AddTransactionsFromBlock(reorg.disconnected_txns);
+ const auto evicted{disconnectpool.AddTransactionsFromBlock(reorg.disconnected_txns)};
assert(evicted.empty());
// Connect first block
diff --git a/src/kernel/disconnected_transactions.h b/src/kernel/disconnected_transactions.h
index 7db39ba5ca..3effae8333 100644
--- a/src/kernel/disconnected_transactions.h
+++ b/src/kernel/disconnected_transactions.h
@@ -15,7 +15,7 @@
#include <vector>
/** Maximum kilobytes for transactions to store for processing during reorg */
-static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE = 20'000;
+static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE{20'000};
/**
* DisconnectedBlockTransactions
@@ -40,7 +40,7 @@ class DisconnectedBlockTransactions {
private:
/** Cached dynamic memory usage for the CTransactions (memory for the shared pointers is
* included in the container calculations). */
- uint64_t cachedInnerUsage = 0;
+ uint64_t cachedInnerUsage{0};
const size_t m_max_mem_usage;
std::list<CTransactionRef> queuedTx;
using TxList = decltype(queuedTx);
@@ -91,8 +91,8 @@ public:
[[nodiscard]] std::vector<CTransactionRef> AddTransactionsFromBlock(const std::vector<CTransactionRef>& vtx)
{
iters_by_txid.reserve(iters_by_txid.size() + vtx.size());
- for (auto block_it = vtx.rbegin(); block_it != vtx.rend(); ++block_it) {
- auto it = queuedTx.insert(queuedTx.end(), *block_it);
+ for (auto block_it{vtx.rbegin()}; block_it != vtx.rend(); ++block_it) {
+ auto it{queuedTx.insert(queuedTx.end(), *block_it)};
iters_by_txid.emplace((*block_it)->GetHash(), it);
cachedInnerUsage += RecursiveDynamicUsage(**block_it);
}
@@ -107,9 +107,9 @@ public:
return;
}
for (const auto& tx : vtx) {
- auto iter = iters_by_txid.find(tx->GetHash());
+ auto iter{iters_by_txid.find(tx->GetHash())};
if (iter != iters_by_txid.end()) {
- auto list_iter = iter->second;
+ auto list_iter{iter->second};
iters_by_txid.erase(iter);
cachedInnerUsage -= RecursiveDynamicUsage(**list_iter);
queuedTx.erase(list_iter);
@@ -129,7 +129,7 @@ public:
/** Clear all data structures and return the list of transactions. */
std::list<CTransactionRef> take()
{
- std::list<CTransactionRef> ret = std::move(queuedTx);
+ std::list<CTransactionRef> ret{std::move(queuedTx)};
clear();
return ret;
}
diff --git a/src/validation.cpp b/src/validation.cpp
index a9a0912d2a..6be0d48f14 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -300,8 +300,8 @@ void Chainstate::MaybeUpdateMempoolForReorg(
// Iterate disconnectpool in reverse, so that we add transactions
// back to the mempool starting with the earliest transaction that had
// been previously seen in a block.
- const auto queuedTx = disconnectpool.take();
- auto it = queuedTx.rbegin();
+ const auto queuedTx{disconnectpool.take()};
+ auto it{queuedTx.rbegin()};
while (it != queuedTx.rend()) {
// ignore validation errors in resurrected transactions
if (!fAddToMempool || (*it)->IsCoinBase() ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 4313c77
Nice improvements all around, no need to retouch on the nits.
@@ -148,6 +149,21 @@ static inline size_t DynamicUsage(const std::shared_ptr<X>& p) | |||
return p ? MallocUsage(sizeof(X)) + MallocUsage(sizeof(stl_shared_counter)) : 0; | |||
} | |||
|
|||
template<typename X> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (clang-format) in commit 79ce9f0: s/template</template </
- here and just below.
src/txmempool.h
Outdated
*/ | ||
class DisconnectedBlockTransactions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (clang-format) in commit 2765d6f, open brace on newline. Here and for the methods.
src/txmempool.h
Outdated
@@ -892,10 +892,18 @@ struct DisconnectedBlockTransactions { | |||
return memusage::MallocUsage(sizeof(CTransactionRef) + 6 * sizeof(void*)) * queuedTx.size() + cachedInnerUsage; | |||
} | |||
|
|||
void addTransaction(const CTransactionRef& tx) | |||
/** Add transactions from the block, iterating through vtx in reverse order. Callers should call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit in commit 925bb72: This renders with a weird indent on my terminal:
/** Add transactions from the block, iterating through vtx in reverse order. Callers should call
* this function for blocks in descending order by block height.
It is fixed a few commits later, but that line should not need to be re-touched.
|
||
/** Maximum kilobytes for transactions to store for processing during reorg */ | ||
static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE = 20'000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It is just moved here, but everywhere this is used we multiply by 1000
, so why define it in kilobytes in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if you do this, how about replacing _SIZE with _BYTES ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the fact that it's kb should be documented / baked in. Going to say out of scope for this PR. Should be easy to do in a followup.
This is a nice improvement, works fine on my machine. Result of benchmark using boost multi index to store
After switching to using
Clearly there is improvement and this reduces dependency on boost. |
I'm on holiday for the next ~10 days or so. I would like to review this, but please don't let my absence hold up merge. I'll review when I'm back whether it's been merged or not. |
@stickies-v @glozow Opened up a PR that moved implementation code to |
… to not use boost
…` usage d67aa25 bench: drop NO_THREAD_SAFETY_ANALYSIS from disconnected_txs (fanquake) Pull request description: Followup to bitcoin/bitcoin#28385 (comment). ACKs for top commit: MarcoFalke: lgtm ACK d67aa25 hebasto: ACK d67aa25, tested on Ubuntu 22.04 with clang 18.0. glozow: utACK d67aa25 Tree-SHA512: a9a9a8cc70a50d4fbd51779a3203bbc2a29d354b557e8a99cfd649d5998b71ff1087f5bae7170471bed9a917a93c8f3351ae90c9a6e87d88928c35912d007b64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 4313c77.
src/txmempool.h
Outdated
queuedTx.insert(*block_it); | ||
cachedInnerUsage += RecursiveDynamicUsage(*block_it); | ||
auto it = queuedTx.insert(queuedTx.end(), *block_it); | ||
iters_by_txid.emplace((*block_it)->GetHash(), it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment: We assume that callers never pass multiple transactions with the same txid
.
IMO we should either commit to that and enforce it:
auto emplace_result = iters_by_txid.emplace((*block_it)->GetHash(), it);
assert(emplace_result.second);
or remove the restriction:
// Ignore duplicates
iters_by_txid.try_emplace((*block_it)->GetHash(), it);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I like the first option. Maybe @ismaelsadeeq might be interested in adding it to the followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to #28530
d67aa25 bench: drop NO_THREAD_SAFETY_ANALYSIS from disconnected_txs (fanquake) Pull request description: Followup to bitcoin#28385 (comment). ACKs for top commit: MarcoFalke: lgtm ACK d67aa25 hebasto: ACK d67aa25, tested on Ubuntu 22.04 with clang 18.0. glozow: utACK d67aa25 Tree-SHA512: a9a9a8cc70a50d4fbd51779a3203bbc2a29d354b557e8a99cfd649d5998b71ff1087f5bae7170471bed9a917a93c8f3351ae90c9a6e87d88928c35912d007b64
…tions rewrite followups 9b3da70 [test] DisconnectedBlockTransactions::DynamicMemoryUsage (glozow) b2d0447 bugfix: correct DisconnectedBlockTransactions memory usage (stickies-v) f4254e2 assume duplicate transactions are not added to `iters_by_txid` (ismaelsadeeq) 29eb219 move only: move implementation code to disconnected_transactions.cpp (ismaelsadeeq) 81dfedd refactor: update `MAX_DISCONNECTED_TX_POOL` from kb to bytes (ismaelsadeeq) Pull request description: This PR is a follow-up to fix review comments and a bugfix from #28385 The PR - Updated `DisconnectedBlockTransactions`'s `MAX_DISCONNECTED_TX_POOL` from kb to bytes. - Moved `DisconnectedBlockTransactions` implementation code to `kernel/disconnected_transactions.cpp`. - `AddTransactionsFromBlock` now assume duplicate transactions are not passed by asserting after inserting each transaction to `iters_by_txid`. - Included a Bug fix: In the current master we are underestimating the memory usage of `DisconnectedBlockTransactions`. * When adding and subtracting `cachedInnerUsage` we call `RecursiveDynamicUsage` with `CTransaction` which invokes this [`RecursiveDynamicUsage(const CTransaction& tx)`](https://github.com/bitcoin/bitcoin/blob/6e721c923c87abdb8d99674093d08d8c17bc52c2/src/core_memusage.h#L32) version of `RecursiveDynamicUsage`, the output of that call only account for the memory usage of the inputs and outputs of the `CTransaction`, this omits the memory usage of the `CTransaction` object and the control block. * This PR fixes this bug by calling `RecursiveDynamicUsage` with `CTransactionRef` when adding and subtracting `cachedInnerUsage` which invokes [`RecursiveDynamicUsage(const std::shared_ptr<X>& p)`](https://github.com/bitcoin/bitcoin/blob/6e721c923c87abdb8d99674093d08d8c17bc52c2/src/core_memusage.h#L67) version of `RecursiveDynamicUsage` the output of the calculation accounts for the` CTransaction` object, the control blocks, inputs and outputs memory usage. * see [comment ](bitcoin/bitcoin#28385 (comment)) - Added test for DisconnectedBlockTransactions memory limit. ACKs for top commit: stickies-v: ACK 9b3da70 - nice work! BrandonOdiwuor: re ACK 9b3da70 glozow: ACK 9b3da70 Tree-SHA512: 69b9595d09f4d0209038f97081d790cea92ccf63efb94e9e372749979fcbe527f7f17a8e454720cedd12021be0c8e11cf99874625d3dafd9ec602b12dbeb4098
…move some external mapTx access 4dd94ca [refactor] remove access to mapTx in validation_block_tests (TheCharlatan) d0cd2e8 [refactor] rewrite BlockAssembler inBlock and failedTx as sets of txids (glozow) 55b0939 scripted-diff: rename vTxHashes to txns_randomized (TheCharlatan) a03aef9 [refactor] rewrite vTxHashes as a vector of CTransactionRef (glozow) 938643c [refactor] remove access to mapTx in validation.cpp (glozow) 333367a [txmempool] make CTxMemPoolEntry::lockPoints mutable (glozow) 1bf4855 [refactor] use CheckPackageLimits for checkChainLimits (glozow) dbc5bdb [refactor] remove access to mapTx.find in mempool_tests.cpp (glozow) f80909e [refactor] remove access to mapTx in blockencodings_tests.cpp (glozow) 8892d6b [refactor] remove access to mapTx from rpc/mempool.cpp (glozow) fad61aa [refactor] get wtxid from entry instead of vTxHashes (glozow) 9cd8caf [refactor] use exists() instead of mapTx.find() (glozow) 1480469 [refactor] remove access to mapTx from policy/rbf.cpp (glozow) 1c6a73a [refactor] Add helper for retrieving mempool entry (TheCharlatan) 453b481 [refactor] Add helper for iterating through mempool entries (stickies-v) Pull request description: Motivation * It seems preferable to use stdlib data structures instead of boost if they can achieve close to the same thing. * Code external to mempool should ideally use its public helper methods instead of accessing `mapTx` or its iterators directly. * Reduce the number of complex boost multi index type interactions * Also see #28335 for further context/motivation. This PR together with #28385 simplifies that one. Overview of things done in this PR: * Make `vTxHashes` a vector of transaction references instead of a pair of transaction hash and iterator. The trade off here is that the data is retrieved on the fly with `GetEntry` instead of being cached in `vTxHashes`. * Introduce `GetEntry` helper method to replace the more involved `GetIter` where applicable * Replace `mapTx` access with `CTxMemPool` helper methods * Simplify `checkChainLimits` call in `node/interfaces.cpp` * Make `CTxMemPoolEntry`s `lockPoints`mutable such that they can be changed with a const iterator directly instead of going through `mapTx` * Make `BlockAssembler`'s `inBlock` and `failedTx` sets of transaction hashes. ACKs for top commit: glozow: reACK 4dd94ca maflcko: re-ACK 4dd94ca 👝 stickies-v: re-ACK 4dd94ca Tree-SHA512: c4d043f2186e4fde337591883fac66cade3058173987b49502bd65cecf69207a3df1077f6626809652ab63230013167b7f39a2b39f1c5166959e5495df57065f
Summary: This removes the circular dependency txmempool -> validation -> txmempool. Partial backport of [[bitcoin/bitcoin#28385 | core#28385]]. Depends on D15785. Test Plan: ninja all check-all Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Subscribers: PiRK Differential Revision: https://reviews.bitcoinabc.org/D15786
Motivation
Things done in this PR:
DisconnectedBlockTransactions
where we reorg and the new chain has {100%, 90%, 10%} of the same transactions. AFAIU in practice, it's usually close to 100%.DisconnectedBlockTransactions
as astd::list
+unordered_map
instead of a boost multi index container.DisconnectedBlockTransactions
from txmempool.h to its own kernel/disconnected_transactions.h. This struct isn't used by txmempool and doesn't have much to do with txmempool. My guess is that it's been living there for convenience since the boost includes are there.