-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Improve UpdateTransactionsFromBlock with Epochs #17925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great application of this traversal algorithm -- just a couple comments that I think we should address, otherwise code review ACK. Will test a bit.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
d392553
to
bd5a026
Compare
I don't know what's up with the appveyor build but I think this code is correct, it passes all tests for me locally. As a sanity check, I verified that this code does give a speedup in a simple scenario where it would be expected to improve things: create a transaction with 2000 outputs that is confirmed in a block; then spend each of those outputs in a separate transaction in the mempool; then disconnect the block and measure the time spent in UpdateTransactionsFromBlock (which must visit each of those 2000 children). I observed a roughly 12% improvement in runtime in this simple test. ACK bd5a026 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK bd5a026
Failure case we want to prevent is false positive of visited
, returning we already updated this entry when in fact we haven't done so. This could happen if epoch counter isn't increased before every update, assert in visited
create this strict ordering, though epoch API may be revisited in further commits if we don't safe enough (specially in case of recursive epoch uses)
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK bd5a026, modulo some nits and a typo.
if (setChildren.insert(childIter).second && !setAlreadyIncluded.count(childHash)) { | ||
UpdateChild(it, childIter, true); | ||
UpdateParent(childIter, it, true); | ||
// we cache the in-mempool children to avoid duplicate updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. The comment is still relevant insofar as it explains what the EpochGuard is doing, but specifically the word "cache" is perhaps a bit vague.
* determine if that transaction has not yet been visited during the current | ||
* traversal's epoch. | ||
* Algorithms using std::set can be replaced on a one by one basis. | ||
* Both techniques are not fundamentally incomaptible across the codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: incomaptible --> incompatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
Good to know it. Let @ajtowns and me join your discussion ;)
Good names cause less cognitive work for other contributors, no?
May I suggest |
Apologies, that was an offline discussion! first_seen or FirstSeen is no good for this use case. If you negate visited then you impose that callers of visited has to check the negation of the call or to nest the code a level deeper inside an if, which adds more cognitive overhead IMO. Most of the code that I've written uses an I recommend reading this issue on style nits and whatnot: #15465; I don't think it's unreasonable to try to figure out better tweaks to names or whatnot, but I think as long as the code is clear & we can convince ourselves it's correct in review, that's more or less sufficient for moving on. |
Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised.
|
ACK bd5a026 (code review) |
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin) 2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin) Pull request description: UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool. Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`. By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior. Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free. This PR is related to #17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to #17268) to improve the mempool ACKs for top commit: sdaftuar: ACK bd5a026 adamjonas: Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised. ajtowns: ACK bd5a026 (code review) ariard: Code review ACK bd5a026 hebasto: ACK bd5a026, modulo some nits and a typo. Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK bd5a026
* visiting the same transactions twice. Some traversal algorithms use | ||
* std::set (or setEntries) to deduplicate the transaction we visit. | ||
* However, use of std::set is algorithmically undesirable because it both | ||
* adds an asymptotic factor of O(log n) to traverals cost and triggers O(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo for follow-ups: s/traverals/traversals/
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin) 2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin) Pull request description: UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool. Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`. By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior. Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free. This PR is related to bitcoin#17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to bitcoin#17268) to improve the mempool ACKs for top commit: sdaftuar: ACK bd5a026 adamjonas: Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised. ajtowns: ACK bd5a026 (code review) ariard: Code review ACK bd5a026 hebasto: ACK bd5a026, modulo some nits and a typo. Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
8098dea test: Add mempool_updatefromblock.py (Hennadii Stepanov) Pull request description: This PR adds a new test for mempool update of transaction descendants/ancestors information (count, size) when transactions have been re-added from a disconnected block to the mempool. It could be helpful for working on PRs like #17925, #18191. ACKs for top commit: ariard: ACK 8098dea Tree-SHA512: 7e808fa8df8d7d7a7dbdc3f79361049b49c7bce9b58fd5539b28c9636bedac747695537e500d7ed68dc8bdb80167ad3f1c01086f7551691405d2ba2e38ef1d06
8098dea test: Add mempool_updatefromblock.py (Hennadii Stepanov) Pull request description: This PR adds a new test for mempool update of transaction descendants/ancestors information (count, size) when transactions have been re-added from a disconnected block to the mempool. It could be helpful for working on PRs like bitcoin#17925, bitcoin#18191. ACKs for top commit: ariard: ACK 8098dea Tree-SHA512: 7e808fa8df8d7d7a7dbdc3f79361049b49c7bce9b58fd5539b28c9636bedac747695537e500d7ed68dc8bdb80167ad3f1c01086f7551691405d2ba2e38ef1d06
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin) 2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin) Pull request description: UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool. Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`. By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior. Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free. This PR is related to bitcoin#17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to bitcoin#17268) to improve the mempool ACKs for top commit: sdaftuar: ACK bd5a026 adamjonas: Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised. ajtowns: ACK bd5a026 (code review) ariard: Code review ACK bd5a026 hebasto: ACK bd5a026, modulo some nits and a typo. Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
…ctionsFromBlock use Epochs Summary: > UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool. > > Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`. > > By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior. > > Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free. This is a backport of Core [[bitcoin/bitcoin#17925 | PR17925]] Test Plan: `ninja && ninja check` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D8733
fd6580e [refactor] txmempool: split epoch logic into class (Anthony Towns) Pull request description: Splits the epoch logic introduced in #17925 into a separate class. Uses clang's thread safety annotations and encapsulates the data more strongly to reduce chances of bugs from API misuse. ACKs for top commit: jonatack: ACK fd6580e using clang thread safety annotations looks like a very good idea, and the encapsulation this change adds should improve robustness (and possible unit test-ability) of the code. Verified that changing some of the locking duly provoked build-time warnings with Clang 9 on Debian and that small changes in the new `Epoch` class were covered by failing functional test assertions in `mempool_updatefromblock.py`, `mempool_resurrect.py`, and `mempool_reorg.py` hebasto: re-ACK fd6580e, since my [previous](#18017 (review)) review: Tree-SHA512: 7004623faa02b56639aa05ab7a078320a6d8d54ec62d8022876221e33f350f47df51ddff056c0de5be798f8eb39b5c03c2d3f035698555d70abc218e950f2f8c
merge bitcoin#17925, bitcoin#16805: auxilliary backports
UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.
Because we construct a
setEntries setChildren
, which is backed by astd::set
, it is possible that this algorithm isO(N log N)
.By using an Epoch visitor pattern, we can limit this to
O(N)
worst case behavior.Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.
This PR is related to #17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to #17268) to improve the mempool