Remove CTxMempool::mapLinks data structure member #19478
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts: Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

This is cool! How much memory would be saved by this change on average, do you think?
std::sets are awful for memory. I think you can get like a max of 220k txns in the mempool... let's assume the average is 50k? I don't know if that's a good assumption though. The overhead for a map entry is very high. I would guess it's somewhere around 64 bytes per entry? Can double check, but based on:

That's 4 bytes + 4 bytes alignment + 3 * 8 bytes of pointers + 8 bytes txiter (ignoring TxLinks itself, since we still store those in the CTxMempoolEntry, but adding another 8 bytes for alignment and padding) = 6 * 8 = 48 bytes of raw overhead. Then with MallocUsage, this gives us 64 bytes. So it may not use 64 bytes per element, but we certainly count that many! Depending on how big TxLinks is itself, we may have some extra overhead (implementation dependent). Anyway, let's assume 64 bytes: 64 * 50,000 bytes = 3.2 MB, and 64 * 200,000 bytes = 12.8 MB. For a 300 MB pool, that's a 1-3% memory saving. Note this doesn't account for the other downsides of the memory fragmentation from doing so many allocations, though a pooled allocator could also help with that issue. (A further change maybe worth looking at is the Children themselves: because there is so much memory overhead, storing a std::vector and std::vector might be more efficient for the many txs with small numbers of vins/vouts, especially since checking whether a hash is in a small vector can be super optimized vs. log N random memory compares.)
```cpp
template <size_t s1, size_t s2> struct static_check {
    static_check() {
        static_assert(s1 == s2);
    }
};
auto s = static_check<48, sizeof(setEntries)>();
auto s2 = static_check<48 * 2, sizeof(TxLinks)>();
auto s3 = static_check<136, sizeof(memusage::stl_tree_node<std::pair<const txiter, TxLinks>>)>();
auto s4 = static_check<160, ((31 + sizeof(memusage::stl_tree_node<std::pair<const txiter, TxLinks>>)) >> 4) << 4>();
```

This confirms that a setEntries uses 48 bytes for the container itself, and that the overhead is 64 bytes per node for each TxLinks in the mapLinks data structure compared to an inlined TxLinks in CTxMempoolEntry. Even though the nodes themselves are only 136 bytes, the MallocUsage accounting pads them out to 160 bytes.
These changes look good. I'm unsure what our plans are with respect to Boost features.
ACK dc0d3fb, code review and tested rebased on master. The benchmarks show a 20% improvement for me, built with `--enable-bench CXXFLAGS="-O2"`.
Current master 476436b

```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 50, 1, 22.3, 0.371758, 0.665461, 0.426088
```

After, rebased on master

```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 50, 1, 18.2, 0.337176, 0.493058, 0.354405
```

With 5000 transactions in src/bench/mempool_stress.cpp:

```diff
- for (auto x = 0; x < 800 && !available_coins.empty(); ++x) {
+ for (auto x = 0; x < 5000 && !available_coins.empty(); ++x) {
```

Current master 476436b

```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 97.3971, 18.5328, 20.4063, 19.4885
```

After, rebased on master

```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 80.3426, 15.6254, 16.5093, 16.118
```
nit: maybe a less violent PR title 🪓 🙂 👍 (joking, but s/kill/remove/ works) and add the component prefix

Not bad :) The description even required translation. #9260 (comment)
One run each of all benchmarks (though I suspect there is a fair amount of variation between runs): all benchmarks, master vs this branch rebased on master.

current master

this branch, rebased to master
```diff
@@ -172,16 +172,17 @@ bool CTxMemPool::CalculateMemPoolAncestors(const CTxMemPoolEntry &entry, setEntr
 // If we're not searching for parents, we require this to be an
 // entry in the mempool already.
```
When we say "we require" in this comment: previously we had an assertion that this was true, and this patch removes that assertion. This should be safe to do, as a violation would previously have resulted in a crash. But if we want the assertion checked, we should add back a lookup to mapTx.
This is because previously we had to run consistency checks constantly to make sure we weren't looking at a dead mempool entry: https://github.com/bitcoin/bitcoin/pull/19478/files#diff-ca74c4b28865382863b8fe7633a85cd6L987

Maybe one way to get around this would be to make CTxMemPoolEntry contain a bool "dead" flag, set if the entry has been evicted while a valid reference is still dangling somewhere (which is unconditionally a bug); then we could add the asserts back to CTxMempoolEntry::GetMempool{Children,Parents}{,Const}.

I think this can be handled in follow-up work though, as the major need for this consistency check was a fraught ownership model in mapLinks.
@jonatack thanks for the review! Cool to see it making an even bigger impact on other people's systems. Can you share your hardware specs? Mine were run on an AMD Ryzen 7 1800X eight-core, 3.6 GHz, 32 GB RAM at 2133 MT/s. It generally makes sense that on lesser hardware (more total time on your benches) there may be other factors contributing to a bigger speedup. W.r.t. Boost features, iterator_to is a member function of multi_index, so it's not a new feature but an existing API. In any case, it has been in use in the mempool since 2015 on line 174 of 60de0d5, so it should be fine. All nits have been addressed.

Intel i7-6500U 4-core @ 2.5 GHz, 32 GB DDR4-2666 RAM, Samsung 960 Pro 1TB NVMe SSD, running Debian bullseye/sid with Coreboot (I need to upgrade to a faster CPU but it's not likely to happen soon). Code review re-ACK 2a47af4.
Concept ACK.
Tested 2a47af4 on Linux Mint 20 (x86_64).
- The first commit c977ecc "Part 1/2..." does not compile. I think we should avoid such situations in the git history.
- I strictly prefer that new code follows the coding style, therefore mind applying clang-format-diff.py to it?
Force-pushed from 09832f5 to 74a780a.
Ok, I've gone ahead and squashed all the nits and intermittent commits so that bisecting should work cleanly. I think the resultant history is a tad harder to review than with the two parts, as there was a bigger delineation between the architectural refactoring, the callsite changes, and the style changes to touched code, but it is what it is. @jonatack you'll notice some formatting changes when comparing to the prior commit.

```diff
diff --git a/src/txmempool.h b/src/txmempool.h
index 8050eaefa..a3a0197d9 100644
--- a/src/txmempool.h
+++ b/src/txmempool.h
@@ -50,13 +50,16 @@ struct LockPoints
 };
 struct CompareIteratorByHashGeneric {
-    // SFINAE for a is a pointer or a reference
-    template<typename T>
-    bool operator()(const std::reference_wrapper<T> &a, const std::reference_wrapper<T> &b) const {
+    // SFINAE for T where T is either a pointer type (e.g., a txiter) or a reference_wrapper<T>
+    // (e.g. a wrapped CTxMemPoolEntry&)
+    template <typename T>
+    bool operator()(const std::reference_wrapper<T>& a, const std::reference_wrapper<T>& b) const
+    {
         return a.get().GetTx().GetHash() < b.get().GetTx().GetHash();
     }
-    template<typename T>
-    bool operator()(const T &a, const T &b) const {
+    template <typename T>
+    bool operator()(const T& a, const T& b) const
+    {
         return a->GetTx().GetHash() < b->GetTx().GetHash();
     }
 };
@@ -79,6 +82,7 @@ public:
     // two aliases, should the types ever diverge
     typedef std::set<CTxMemPoolEntryRef, CompareIteratorByHashGeneric> Parents;
     typedef std::set<CTxMemPoolEntryRef, CompareIteratorByHashGeneric> Children;
+
 private:
     const CTransactionRef tx;
     mutable Parents m_parents;
@@ -145,10 +149,10 @@ public:
     CAmount GetModFeesWithAncestors() const { return nModFeesWithAncestors; }
     int64_t GetSigOpCostWithAncestors() const { return nSigOpCostWithAncestors; }
-    const Parents& GetMemPoolParentsConst() const {return m_parents;}
-    const Children& GetMemPoolChildrenConst() const {return m_children;}
-    Parents& GetMemPoolParents() const {return m_parents;}
-    Children& GetMemPoolChildren() const {return m_children;}
+    const Parents& GetMemPoolParentsConst() const { return m_parents; }
+    const Children& GetMemPoolChildrenConst() const { return m_children; }
+    Parents& GetMemPoolParents() const { return m_parents; }
+    Children& GetMemPoolChildren() const { return m_children; }
     mutable size_t vTxHashesIdx; //!< Index in mempool's vTxHashes
     mutable uint64_t m_epoch; //!< epoch when last touched, useful for graph algorithms
@@ -547,7 +551,6 @@ public:
     std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order
     typedef std::set<txiter, CompareIteratorByHashGeneric> setEntries;
-    typedef std::vector<txiter> vecEntries;
     uint64_t CalculateDescendantMaximum(txiter entry) const EXCLUSIVE_LOCKS_REQUIRED(cs);
 private:
```
ACK 74a780a, tested on Linux Mint 20 (x86_64).
Benchmarking results:

- master (2aaff48)

```
$ ./src/bench/bench_bitcoin -filter="ComplexMemPool"
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.42105, 0.283276, 0.286269, 0.283499
```

- this PR (74a780a)

```
$ ./src/bench/bench_bitcoin -filter="ComplexMemPool"
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.21549, 0.240644, 0.246652, 0.243024
```
Force-pushed from 3ba1665 to 296be8f.
re-ACK 296be8f per `git range-diff ab338a1 3ba1665 296be8f`.
Any reason the commits aren't squashed? I left some style nits, but feel free to ignore the stupid ones. Though, it would be good to fix up the documentation ones. Also these:

```
git grep --extended-regexp '(setMemPool|mapLinks)'
```
Also checked that the dynamic size doesn't increase:
```diff
diff --git a/src/bench/mempool_stress.cpp b/src/bench/mempool_stress.cpp
index 89233e390c..7b07f7723e 100644
--- a/src/bench/mempool_stress.cpp
+++ b/src/bench/mempool_stress.cpp
@@ -86,6 +86,8 @@ static void ComplexMemPool(benchmark::Bench& bench)
         for (auto& tx : ordered_coins) {
             AddTx(tx, pool);
         }
+        std::cout << pool.DynamicMemoryUsage() << std::endl;
+        assert(false);
         pool.TrimToSize(pool.DynamicMemoryUsage() * 3 / 4);
         pool.TrimToSize(GetVirtualTransactionSize(*ordered_coins.front()));
     });
```
Before: 3499728
After: 3442128
a76734e4471002f0643bef86fbbea9a2da187f4e
```diff
@@ -55,45 +55,45 @@ size_t CTxMemPoolEntry::GetTxSize() const
 }

 // Update the given tx for any in-mempool descendants.
-// Assumes that setMemPoolChildren is correct for the given tx and all
+// Assumes that CTxMemPool::m_children is correct for the given tx and all
```
Suggested change:

```diff
-// Assumes that CTxMemPool::m_children is correct for the given tx and all
+// Assumes that CTxMemPoolEntry::m_children is correct for the given tx and all
```
Ah good catch on the m_children.
```diff
@@ -119,7 +119,7 @@ void CTxMemPool::UpdateTransactionsFromBlock(const std::vector<uint256> &vHashes
 // Iterate in reverse, so that whenever we are looking at a transaction
 // we are sure that all in-mempool descendants have already been processed.
 // This maximizes the benefit of the descendant cache and guarantees that
-// setMemPoolChildren will be updated, an assumption made in
+// CTxMemPool::m_children will be updated, an assumption made in
```
same
```diff
@@ -128,8 +128,8 @@ void CTxMemPool::UpdateTransactionsFromBlock(const std::vector<uint256> &vHashes
         continue;
     }
     auto iter = mapNextTx.lower_bound(COutPoint(hash, 0));
-    // First calculate the children, and update setMemPoolChildren to
-    // include them, and update their setMemPoolParents to include this tx.
+    // First calculate the children, and update CTxMemPool::m_children to
```
same
```diff
-// Here we only update statistics and not data in mapLinks (which
-// we need to preserve until we're finished with all operations that
-// need to traverse the mempool).
+// Here we only update statistics and not data in CTxMemPool::Parents
```
same
@MarcoFalke do you have a preference for how some of these are handled? I think because it's all non-functional style-related or documentation concerns, it would be OK to do post-merge, as there's a lot of other pending work that will touch/redo these areas eventually in any case. As to why it is 2 commits and not one: it used to be more commits, and then I slimmed it down. It previously made more sense, as the prior patches were something like 1) introduce new API, 2) use new API, 3) delete old API, but 1 + 2 got squashed by request, since a couple of issues with the API changing the underlying return type made patch 1+2 fail to compile (even though readable). I don't care strongly, but have a mild preference for not burning out the extant reviewers.

No preference on how those are handled, but re-ACKing some doc fixups should cost as much (or as little) review effort as creating a new commit/pull and getting that reviewed, no?

No, because there's already outstanding work (I have something in the range of 40-70 other commits staged since last year for the mempool project) that is going to touch these areas and cause me to rewrite some of the documentation anyway before they are PR-ready. I'm not making a separate docs-only fixup. So for the most part the style and docs changes will already have to be re-reviewed when those follow-on changes land. Originally I intended not to update much documentation along the way (except where critical) because the mempool documentation is already very stale, but to honor some requests to improve it incrementally I added more here. This points to why "just fixing up the docs" isn't really an option: they're so stale that any effort to improve them inevitably bikesheds more, because they're unclear and people want something easier to read.
I wouldn't call that bikeshedding.
Thanks for the merge! If someone wants to clean up the docs, that's fine. Yes, the docs were objectively wrong; I probably fucked up the autocomplete when changing it, and if someone wants to fix it before I get around to it, that's dandy. Just noting that the prior name, setMemPoolChildren, was wrong for like 5 years (no such member or field existed anywhere) and no one fixed it, so I sort of doubt anyone is actually reading the docs to learn how the code works anyway. It needs a decent amount of rebasing and extracting of specific chunks, but I'm picking out patches from #17268 and JeremyRubin#5. As I do that, a lot of the function signatures will change, which is a good time to update the docs to be more coherent.

For my own reference, the broken docs are: `git grep --extended-regexp '(setMemPool|mapLinks|CTxMemPool::(Parents|m_children))'`
296be8f Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)

Pull request description:

Currently we have a peculiar data structure in the mempool called maplinks. Maplinks' job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.

Maplinks is particularly peculiar because removing it is not as simple as just moving its inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) stores txiters to each entry in the mempool corresponding to the parent or child, the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn aware of the entry type stored in mapTx. Thus we used maplinks to store this entry-associated data in an entirely separate data structure, just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.

It turns out we can kill this circular reference by making use of the iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parent/child sets to the entries themselves.

The benefit of this is good all around: for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).

### Simpler ownership model

We no longer have to consistency check that mapLinks has records for our CTxMempoolEntry, and it is impossible for a mapLinks entry to outlive or incorrectly die before a CTxMempoolEntry.

### Memory Usage

We get rid of an O(Transactions)-sized map in the mempool, which is a long-lived data structure.

### Performance

If you have a CTxMemPoolEntry, you immediately know the address of its children/parents, rather than having to do an O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case. The ComplexMemPool bench is a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.

```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793

After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```

The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything. Subbing in 5000 transactions shows that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool); the gain is only a bit proportionally smaller (10.8%, 1.12x), which adds evidence that this will be a good change for performance-minded users.

```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```

I don't think it's possible to come up with an example where a maplinks-based design would have better performance, but it's something for reviewers to consider.

# Discussion

## Why maplinks in the first place?

I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time; there was no other reason for making it a separate map.

## Is iterator_to weird?

iterator_to is expressly for this purpose; see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to

> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desirable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.

In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast with no major runtime overhead (depending on whether the function call is inlined).

Edit by laanwj: removed at sign from the description

ACKs for top commit:
- jonatack: re-ACK 296be8f per `git range-diff ab338a1 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
- hebasto: re-ACK 296be8f, only rebased since my [previous](bitcoin#19478 (review)) review (verified with `git range-diff`).

Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
…erasure

Summary: PR description:

> Remove CTxMempool::mapLinks data structure member
>
> Currently we have a peculiar data structure in the mempool called maplinks. Maplinks' job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
>
> Maplinks is particularly peculiar because removing it is not as simple as just moving its inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) stores txiters to each entry in the mempool corresponding to the parent or child, the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn aware of the entry type stored in mapTx. Thus we used maplinks to store this entry-associated data in an entirely separate data structure, just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
>
> It turns out we can kill this circular reference by making use of the iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parent/child sets to the entries themselves.
>
> The benefit of this is good all around: for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
>
> Simpler ownership model: we no longer have to consistency check that mapLinks has records for our CTxMempoolEntry, and it is impossible for a mapLinks entry to outlive or incorrectly die before a CTxMempoolEntry.
>
> Memory Usage: we get rid of an O(Transactions)-sized map in the mempool, which is a long-lived data structure.
>
> Performance: if you have a CTxMemPoolEntry, you immediately know the address of its children/parents, rather than having to do an O(log(Transactions)) lookup via maplinks (which we do very often). We do it in so many places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case. The ComplexMemPool bench is a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster. The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything. Subbing in 5000 transactions shows that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool); the gain is only a bit proportionally smaller (10.8%, 1.12x), which adds evidence that this will be a good change for performance-minded users.

This is a backport of [[bitcoin/bitcoin#19478 | core#19478]]

Test Plan: `ninja all check-all`

Performance improvement confirmed with `bitcoin-bench`:

```
$ diff before.log after.log | grep ComplexMemPool
|             ns/elem |              elem/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
< |      199,336,736.00 |                5.02 |    0.4% |      2.21 | `ComplexMemPool`
> |      181,013,454.00 |                5.52 |    0.4% |      1.99 | `ComplexMemPool`
```

Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Subscribers: Fabien
Differential Revision: https://reviews.bitcoinabc.org/D10191
Fixed in #26281.
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks' job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.

Maplinks is particularly peculiar because removing it is not as simple as just moving its inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) stores txiters to each entry in the mempool corresponding to the parent or child, the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn aware of the entry type stored in mapTx. Thus we used maplinks to store this entry-associated data in an entirely separate data structure, just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.

It turns out we can kill this circular reference by making use of the iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parent/child sets to the entries themselves.

The benefit of this is good all around: for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
Simpler ownership model
We no longer have to consistency check that mapLinks has records for our CTxMempoolEntry, and it is impossible for a mapLinks entry to outlive or incorrectly die before a CTxMempoolEntry.
Memory Usage
We get rid of an O(Transactions)-sized map in the mempool, which is a long-lived data structure.
Performance
If you have a CTxMemPoolEntry, you immediately know the address of its children/parents, rather than having to do an O(log(Transactions)) lookup via maplinks (which we do very often). We do it in so many places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool bench is a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool); the gain is only a bit proportionally smaller (10.8%, 1.12x), which adds evidence that this will be a good change for performance-minded users.

I don't think it's possible to come up with an example where a maplinks-based design would have better performance, but it's something for reviewers to consider.
Discussion
Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time; there was no other reason for making it a separate map.
Is iterator_to weird?
iterator_to is expressly for this purpose; see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to

In other words, iterator_to is the perfect tool for the job by the last reason given (self-referencing contexts where an element acts upon its owner container and no iterator to itself is available). Under the hood it should just be a simple pointer cast with no major runtime overhead (depending on whether the function call is inlined).
Edit by laanwj: removed at sign from the description