Skip to content

Conversation

JeremyRubin
Copy link
Contributor

UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.

Because we construct a setEntries setChildren, which is backed by a std::set, it is possible that this algorithm is O(N log N).

By using an Epoch visitor pattern, we can limit this to O(N) worst case behavior.

Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.

This PR is related to #17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to #17268) to improve the mempool

Copy link
Member

@sdaftuar sdaftuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great application of this traversal algorithm -- just a couple comments that I think we should address, otherwise code review ACK. Will test a bit.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 14, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #17786 (refactor: Nuke policy/fees->mempool circular dependencies by hebasto)
  • #16910 (wallet: reduce loading time by using unordered maps by achow101)
  • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@JeremyRubin JeremyRubin force-pushed the epoch-mempool-clean-split branch from d392553 to bd5a026 Compare January 15, 2020 03:30
@JeremyRubin JeremyRubin mentioned this pull request Jan 15, 2020
@sdaftuar
Copy link
Member

I don't know what's up with the appveyor build but I think this code is correct, it passes all tests for me locally.

As a sanity check, I verified that this code does give a speedup in a simple scenario where it would be expected to improve things: create a transaction with 2000 outputs that is confirmed in a block; then spend each of those outputs in a separate transaction in the mempool; then disconnect the block and measure the time spent in UpdateTransactionsFromBlock (which must visit each of those 2000 children). I observed a roughly 12% improvement in runtime in this simple test.

ACK bd5a026

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bd5a026

Failure case we want to prevent is false positive of visited, returning we already updated this entry when in fact we haven't done so. This could happen if epoch counter isn't increased before every update, assert in visited create this strict ordering, though epoch API may be revisited in further commits if we don't safe enough (specially in case of recursive epoch uses)

@hebasto
Copy link
Member

hebasto commented Jan 28, 2020

Concept ACK.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK bd5a026, modulo some nits and a typo.

if (setChildren.insert(childIter).second && !setAlreadyIncluded.count(childHash)) {
UpdateChild(it, childIter, true);
UpdateParent(childIter, it, true);
// we cache the in-mempool children to avoid duplicate updates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. The comment is still relevant insofar as it explains what the EpochGuard is doing, but specifically the word "cache" is perhaps a bit vague.

* determine if that transaction has not yet been visited during the current
* traversal's epoch.
* Algorithms using std::set can be replaced on a one by one basis.
* Both techniques are not fundamentally incomaptible across the codebase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: incomaptible --> incompatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@hebasto
Copy link
Member

hebasto commented Jan 29, 2020

@JeremyRubin

@sdaftuar and I spent like at least an hour trying to come up with a better semantic name than the predecessor to visited, and decided that visited was sufficiently semantic.

Good to know it. Let @ajtowns and me join your discussion ;)

I don't care about the names that much though, other than it causes negligible more rebase work for me -- @ajtowns will refactor some of this in a follow up with a new interface, I think if we want to bikeshed names more we can update it then.

Good names cause less cognitive work for other contributors, no?

@ajtowns

needs_processing(it2) or similar could work, but visited seems pretty fine to me. (I would prefer it if it more clearly indicated that it's not idempotent/const; ie bool a = visited(it); bool b = visited(it); could result in !a && b, so maybe first_visit() or similar would be better? But still, it's already fine)

May I suggest first_seen() (or FirstSeen() to be in line with our naming convention) with first_seen() == !visited() ?

@JeremyRubin
Copy link
Contributor Author

Apologies, that was an offline discussion!

first_seen or FirstSeen is no good for this use case. If you negate visited then you impose that callers of visited has to check the negation of the call or to nest the code a level deeper inside an if, which adds more cognitive overhead IMO. Most of the code that I've written uses an if (visited(it)) continue; guard to check the next element, this code becomes less readable with if (!first_seen(x)) continue; or if (first_seen(it)) { x }`

I recommend reading this issue on style nits and whatnot: #15465; I don't think it's unreasonable to try to figure out better tweaks to names or whatnot, but I think as long as the code is clear & we can convince ourselves it's correct in review, that's more or less sufficient for moving on.

@adamjonas
Copy link
Member

Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised.

  • In review, there have been some style nits and naming convention disagreements along with a suggested follow on PR from @ajtowns which uses clang's thread safety annotations and encapsulates the data more strongly to reduce chances of bugs from API misuse.

  • @sdaftuar has verified a speed-up for the case this PR was designed for.

@ajtowns
Copy link
Contributor

ajtowns commented Jan 30, 2020

ACK bd5a026 (code review)

laanwj added a commit that referenced this pull request Feb 3, 2020
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin)
2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin)

Pull request description:

  UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.

  Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`.

  By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior.

  Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.

  This PR is related to #17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to #17268) to improve the mempool

ACKs for top commit:
  sdaftuar:
    ACK bd5a026
  adamjonas:
    Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised.
  ajtowns:
    ACK bd5a026 (code review)
  ariard:
    Code review ACK bd5a026
  hebasto:
    ACK bd5a026, modulo some nits and a typo.

Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
@laanwj laanwj merged commit bd5a026 into bitcoin:master Feb 3, 2020
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bd5a026

* visiting the same transactions twice. Some traversal algorithms use
* std::set (or setEntries) to deduplicate the transaction we visit.
* However, use of std::set is algorithmically undesirable because it both
* adds an asymptotic factor of O(log n) to traverals cost and triggers O(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo for follow-ups: s/traverals/traversals/

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 3, 2020
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin)
2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin)

Pull request description:

  UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.

  Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`.

  By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior.

  Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.

  This PR is related to bitcoin#17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to bitcoin#17268) to improve the mempool

ACKs for top commit:
  sdaftuar:
    ACK bd5a026
  adamjonas:
    Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised.
  ajtowns:
    ACK bd5a026 (code review)
  ariard:
    Code review ACK bd5a026
  hebasto:
    ACK bd5a026, modulo some nits and a typo.

Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
maflcko pushed a commit that referenced this pull request Apr 29, 2020
8098dea test: Add mempool_updatefromblock.py (Hennadii Stepanov)

Pull request description:

  This PR adds a new test for mempool update of transaction descendants/ancestors information (count, size) when transactions have been re-added from a disconnected block to the mempool.

  It could be helpful for working on PRs like #17925, #18191.

ACKs for top commit:
  ariard:
    ACK 8098dea

Tree-SHA512: 7e808fa8df8d7d7a7dbdc3f79361049b49c7bce9b58fd5539b28c9636bedac747695537e500d7ed68dc8bdb80167ad3f1c01086f7551691405d2ba2e38ef1d06
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 29, 2020
8098dea test: Add mempool_updatefromblock.py (Hennadii Stepanov)

Pull request description:

  This PR adds a new test for mempool update of transaction descendants/ancestors information (count, size) when transactions have been re-added from a disconnected block to the mempool.

  It could be helpful for working on PRs like bitcoin#17925, bitcoin#18191.

ACKs for top commit:
  ariard:
    ACK 8098dea

Tree-SHA512: 7e808fa8df8d7d7a7dbdc3f79361049b49c7bce9b58fd5539b28c9636bedac747695537e500d7ed68dc8bdb80167ad3f1c01086f7551691405d2ba2e38ef1d06
sidhujag pushed a commit to syscoin-core/syscoin that referenced this pull request Nov 10, 2020
bd5a026 Make UpdateTransactionsFromBlock use Epochs (Jeremy Rubin)
2ccb7cc Add Epoch Guards to CTXMemPoolEntry and CTxMemPool (Jeremy Rubin)

Pull request description:

  UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.

  Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`.

  By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior.

  Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.

  This PR is related to bitcoin#17268, it is a small subset of the changes which have been refactored slightly to ease review. If this PR gets review & merge, I will follow up with more PRs (similar to bitcoin#17268) to improve the mempool

ACKs for top commit:
  sdaftuar:
    ACK bd5a026
  adamjonas:
    Just to summarize for those looking to review - as of bd5a026 there are 3 ACKs (@sdaftuar, @ariard, and @hebasto) and one "looks good" from @ajtowns with no NACKs or any show-stopping concerns raised.
  ajtowns:
    ACK bd5a026 (code review)
  ariard:
    Code review ACK bd5a026
  hebasto:
    ACK bd5a026, modulo some nits and a typo.

Tree-SHA512: f0d2291085019ffb4e1119edeb9f4a89c1a572d1cb5b4bdf5743dd0152e721e1935f5155dcae84e1e5bda5ffdf6224c488c1e200bd33bedca9f5ca22d5f5139f
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Dec 22, 2020
…ctionsFromBlock use Epochs

Summary:
> UpdateTransactionsFromBlock is called during a re-org. When a re-org occurs, all of the transactions in the mempool may be descendants from a transaction which is in the pre-reorg block. This can cause us to propagate updates, worst case, to every transaction in the mempool.
>
> Because we construct a `setEntries setChildren`, which is backed by a `std::set`, it is possible that this algorithm is `O(N log N)`.
>
> By using an Epoch visitor pattern, we can limit this to `O(N)` worst case behavior.
>
> Epochs are also less resource intensive than almost any set option (e.g., hash set) because they are allocation free.

This is a backport of Core [[bitcoin/bitcoin#17925 | PR17925]]

Test Plan: `ninja && ninja check`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Subscribers: majcosta

Differential Revision: https://reviews.bitcoinabc.org/D8733
laanwj added a commit that referenced this pull request Feb 24, 2021
fd6580e [refactor] txmempool: split epoch logic into class (Anthony Towns)

Pull request description:

  Splits the epoch logic introduced in #17925 into a separate class.

  Uses clang's thread safety annotations and encapsulates the data more strongly to reduce chances of bugs from API misuse.

ACKs for top commit:
  jonatack:
    ACK fd6580e using clang thread safety annotations looks like a very good idea, and the encapsulation this change adds should improve robustness (and possible unit test-ability) of the code. Verified that changing some of the locking duly provoked build-time warnings with Clang 9 on Debian and that small changes in the new `Epoch` class were covered by failing functional test assertions in `mempool_updatefromblock.py`, `mempool_resurrect.py`, and `mempool_reorg.py`
  hebasto:
    re-ACK fd6580e, since my [previous](#18017 (review)) review:

Tree-SHA512: 7004623faa02b56639aa05ab7a078320a6d8d54ec62d8022876221e33f350f47df51ddff056c0de5be798f8eb39b5c03c2d3f035698555d70abc218e950f2f8c
kwvg added a commit to kwvg/dash that referenced this pull request Jun 16, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 16, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jun 16, 2021
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Jun 21, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants