
Conversation

jnewbery
Contributor

Currently we keep a per-peer list of orphans to reconsider after one of their parents is accepted. There's no reason this list should be connected to the peer that provided the parent, so instead make it a global. Process that list in a separate call into net processing that does general work rather than in the context of a specific peer.
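
For orientation, here is a minimal sketch of the shape of the change, using simplified stand-in types (TxId, the Peer struct, and the body of ProcessGlobalTasks are illustrative only, not the PR's actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <set>

using TxId = uint64_t;  // simplified stand-in for the real transaction id type

// Before: each peer carried its own list of orphans to reconsider.
struct Peer {
    std::set<TxId> orphan_work_set;
};

// After: a single global set, drained by a general-purpose call that the
// message handler thread makes outside the context of any one peer.
std::set<TxId> g_orphan_work_set;

bool ProcessGlobalTasks(std::atomic<bool>& interrupt)
{
    if (!g_orphan_work_set.empty() && !interrupt) {
        const TxId next = *g_orphan_work_set.begin();
        g_orphan_work_set.erase(g_orphan_work_set.begin());
        // ReconsiderOrphan(next);  // re-run mempool acceptance for this orphan
        (void)next;
    }
    return !g_orphan_work_set.empty();  // true if there's more work to do
}
```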

@jnewbery jnewbery force-pushed the 2020-06-global-orphans branch from 70501cf to b957b0d on June 23, 2020 18:36
@DrahtBot DrahtBot added the P2P label Jun 23, 2020
@DrahtBot
Contributor

DrahtBot commented Jun 23, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@@ -73,6 +73,14 @@ class PeerLogicValidation final : public CValidationInterface, public NetEventsI
*/
bool SendMessages(CNode* pto) override EXCLUSIVE_LOCKS_REQUIRED(pto->cs_sendProcessing);

/**
* Process global tasks that aren't attached to a specific peer
Member

Wondering if we should clarify that these tasks are expected to be limited to 1 second or something (perhaps "lightweight tasks")?
Just so that future devs aren't tempted to put any heavy logic in here.

Contributor Author

Such documentation, if it's added, should be at the level of the PeerLogicValidation class. We don't want anything in the message handler thread to block for a long time (or to be blocked by slow tasks in other threads). That includes anything in ProcessMessages(), SendMessages() and (now) ProcessGlobalTasks().

@naumenkogs
Member

Concept ACK.
The idea makes perfect sense at a high level. The implementation seems safe.

@laanwj
Member

laanwj commented Jun 24, 2020

Does this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I'm vaguely worried that this might open up a new DoS vector.

@jnewbery
Contributor Author

Does this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I'm vaguely worried that this might open up a new DoS vector.

That's the right question to be asking, but I think we're safe:

  • items are inserted into g_orphan_work_set at the point they would have been inserted into the per-peer orphan_work_set.
  • one item is popped from that set on every message handler thread loop. In effect this means that the set is emptied very quickly. Even if the peer that provided the parent (which previously would have held the object in its orphan_work_set) disconnects, the object will very quickly be processed from g_orphan_work_set.

These entries can't persist for long in the set. I could refactor ProcessOrphanTx() to make it more obvious that we'll always empty this set quickly, but I wanted to keep the diff small for this PR. Perhaps I could do that in a follow-up.

Longer term, it might make sense to consolidate all orphan logic into a module, similar to how #19184 overhauls request logic.
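
For concreteness, a rough sketch of the drain behaviour described in the bullets above (simplified stand-ins again; the real loop lives in net.cpp's ThreadMessageHandler):

```cpp
#include <atomic>
#include <cstdint>
#include <set>
#include <vector>

using TxId = uint64_t;
std::set<TxId> g_orphan_work_set;  // as in the earlier sketch

struct Peer {};
bool ProcessMessages(Peer&) { return false; }  // placeholder
void SendMessages(Peer&) {}                    // placeholder
void ReconsiderOrphan(TxId) {}                 // placeholder for mempool re-acceptance

void MessageHandlerLoop(std::vector<Peer>& peers, std::atomic<bool>& interrupt)
{
    while (!interrupt) {
        for (Peer& peer : peers) {
            ProcessMessages(peer);
            SendMessages(peer);
        }
        // One orphan reconsidered per pass, independent of which peer provided
        // the parent, so the set drains quickly even if that peer disconnects.
        if (!g_orphan_work_set.empty()) {
            const TxId next = *g_orphan_work_set.begin();
            g_orphan_work_set.erase(g_orphan_work_set.begin());
            ReconsiderOrphan(next);
        }
    }
}
```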

Contributor

@troygiorshev troygiorshev left a comment

ACK b957b0d modulo below

It looks to me like this PR is a very natural extension of #15644 (maybe in the same line of thinking as this comment there). Naively, this global approach feels better.

Regarding the addition of ProcessGlobalTasks, it seems reasonable to me, but I don't think I have the background to honestly Concept ACK it (i.e. I can't fully grasp the impact it will have on future development). Looks like others have, so no issue here.

If I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of g_orphan_work_set and preventing us from processing any other orphaned transactions? Seems like once that peer is dealt with, g_orphan_work_set will be cleared relatively quickly.

* @param[in] interrupt Interrupt condition for processing threads
* @returns bool true if there's more work to do
*/
bool ProcessGlobalTasks(std::atomic<bool>& interrupt);
Contributor

Compiler warning -Winconsistent-missing-override

Suggested change:
-    bool ProcessGlobalTasks(std::atomic<bool>& interrupt);
+    bool ProcessGlobalTasks(std::atomic<bool>& interrupt) override;

Contributor Author

Good catch. Fixed!

jnewbery added 3 commits June 29, 2020 12:51

  • ProcessGlobalTasks does general work that isn't attached to a specific peer.
  • Orphan reprocessing should not be tied to the peer that provided the parent. Move orphan_work_set to be a global and process it in ProcessGlobalTasks().
  • We reprocess orphan transactions on every MessageHandler thread, so there's no need to call it in the context of an individual peer.
@jnewbery
Contributor Author

If I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of g_orphan_work_set and preventing us from processing any other orphaned transactions?

I think @laanwj's concern may have been in general about us storing unvalidated data from a peer, which is always something that we need to be careful about. It's not so much of a concern here because the size of orphan_work_set is limited by the size of the orphan set, and we'll always clear orphan_work_set very quickly.

I have a branch here: https://github.com/jnewbery/bitcoin/tree/2020-06-global-orphans2 that refactors ProcessOrphanTx() and makes it very obvious that g_orphan_work_set is emptied very quickly. I'm not sure whether it's worth opening that as a follow-up after this is merged.

@jnewbery jnewbery force-pushed the 2020-06-global-orphans branch from b957b0d to 9487f08 on June 29, 2020 16:58
@hebasto
Member

hebasto commented Jun 29, 2020

Concept ACK.

Member

@jonatack jonatack left a comment

Interesting idea! This LGTM.

I wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.

To keep the commits hygienic, consider removing the call to ProcessOrphanTx from ProcessMessage in the same commit that adds the new call to it from ProcessGlobalTasks, e.g. squashing the last two commits.

If you retouch, might be good to fixup s/ProcessMessages/ProcessMessage/ in the commit message, as both function names exist in the file.

Consider adding here the Doxygen comments commit from your branch.

If this moves forward, big concept ACK on the changes in https://github.com/jnewbery/bitcoin/commits/2020-06-global-orphans2.

@@ -2832,7 +2833,7 @@ void ProcessMessage(
         auto it_by_prev = mapOrphanTransactionsByPrev.find(COutPoint(inv.hash, i));
         if (it_by_prev != mapOrphanTransactionsByPrev.end()) {
             for (const auto& elem : it_by_prev->second) {
-                pfrom.orphan_work_set.insert(elem->first);
+                g_orphan_work_set.insert(elem->first);
Member

9200126 nit: would it be better to use emplace here to construct rather than copy?

Contributor Author

emplace wins when you would otherwise have to construct a temporary which is then copied/moved into the container. When you already have the object in hand (as we do here), it makes no difference - the object is just copied into the container.
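
A toy illustration of that distinction (hypothetical example, not the PR code):

```cpp
#include <set>
#include <string>

int main()
{
    std::set<std::string> s;

    std::string existing = "already constructed";

    // The object already exists: insert and emplace both just copy it into the
    // container, so there's nothing for emplace to win here.
    s.insert(existing);

    // emplace only pays off when the element can be constructed in place from
    // its constructor arguments, avoiding a separate temporary:
    s.insert(std::string(5, 'a'));  // temporary "aaaaa" built, then moved in
    s.emplace(5, 'b');              // "bbbbb" constructed directly in the node

    return 0;
}
```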

@jnewbery
Contributor Author

jnewbery commented Jul 1, 2020

I wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.

See #15644 (comment)

To keep the commits hygienic, consider removing the call to ProcessOrphanTx from ProcessMessage in the same commit that adds the new call to it from ProcessGlobalTasks, e.g. squashing the last two commits.

I think that'd be doing too much in the same commit, but I'm happy to squash if other reviewers agree.

If you retouch, might be good to fixup s/ProcessMessages/ProcessMessage/ in the commit message, as both function names exist in the file.

I did intend to put ProcessMessages. ProcessMessage is called from ProcessMessages, and ProcessMessages is the entry point into net processing for the message handler thread, so from a high level, it makes sense to say that this change is moving orphan reconsideration from ProcessMessages into ProcessGlobalTasks.

Consider adding here the Doxygen comments commit from your branch.

I think that can stay in the follow-up PR, but again, I'm happy to add it here if other reviewers think it makes more sense to add it here.

@jonatack
Member

jonatack commented Jul 1, 2020

Thanks @jnewbery.

I wanted to encourage the Doxygen commit if you weren't sure about doing a follow-up.

ACK 9487f08

@troygiorshev
Contributor

reACK 9487f08

Looking forward to the follow-up!

@sipa
Member

sipa commented Jul 2, 2020

It took me a while to remember the context here, but this PR causes a subtle but significant change in P2P behavior.

The reason that there is a per-peer set of to-be-processed potential orphans is that this lets us pause processing for that peer until they're all processed - as long as the orphan work set for a peer is non-empty, all that happens for that peer is steps towards emptying that set. This makes the behavior observationally equivalent (from the perspective of any single peer) to recursively processing all orphans immediately as soon as the "missing link" transaction is received.

With this change that is no longer the case - P2P messages from the peer that provides the missing link can and will be processed and responded to before processing the dependent orphans. I'm not sure that anything depends on that behavior, but I also can't guarantee that there isn't.

I don't think code cleanup is a sufficient justification for such a fundamental P2P change.
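
A hedged sketch of the pre-existing per-peer gating being described (simplified names and bodies, not the actual net_processing code):

```cpp
#include <cstdint>
#include <set>

using TxId = uint64_t;

struct Peer {
    std::set<TxId> orphan_work_set;
};

void ProcessOrphanWorkSet(Peer& peer)  // reconsider one queued orphan for this peer
{
    if (!peer.orphan_work_set.empty()) peer.orphan_work_set.erase(peer.orphan_work_set.begin());
}

bool ProcessOneMessage(Peer&) { return true; }  // placeholder for normal message handling

bool ProcessMessages(Peer& peer)
{
    if (!peer.orphan_work_set.empty()) {
        // While this peer still has queued orphans, do nothing else for it:
        // observationally the same as having processed all dependent orphans
        // the moment the missing parent arrived.
        ProcessOrphanWorkSet(peer);
        return !peer.orphan_work_set.empty();
    }
    return ProcessOneMessage(peer);
}
```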

@jnewbery
Contributor Author

jnewbery commented Jul 2, 2020

I'm not sure that anything depends on that behavior, but I also can't guarantee that there isn't.

I don't think code cleanup is a sufficient justification for such a fundamental P2P change.

For context, this is part of a wider project to move all application-layer data and processing into net processing.

The PR that introduced orphan_work_set and moved orphan data into the net layer also changed behaviour, as you noted in the PR description: "Messages from other peers arriving in the meantime are processed normally, but other messages from the peer that gave us the parent have to wait until all orphan processing is done." i.e. processing the orphan_work_set is interleaved with processing messages from other peers. How were you able to satisfy yourself that that change was safe, but this one isn't? Is there a specific scenario you're thinking of here which you're worried about?

I can't think of any problems this might cause. If a peer announces a tx that conflicts with an orphan and also provides the parent, then in master, the orphan will be accepted into the mempool, but in this branch the conflicting tx may be accepted first. I don't think that's an issue since in the case of two conflicting txs we never guarantee which one will make it into the mempool. Are there other scenarios we should be considering?

Generally, if we're worried about behaviour changes we try to reason about them and satisfy ourselves that they're safe, and include tests if we have any doubts. I don't see why this PR should be different.

@jnewbery
Contributor Author

jnewbery commented Jul 2, 2020

Lots of discussion about this on IRC today: http://www.erisian.com.au/bitcoin-core-dev/log-2020-07-02.html L325 onwards.

@sipa
Member

sipa commented Jul 7, 2020

FWIW here is a commit that moves orphan_work_set to net_processing (inside NodeState) without changing semantics: https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_np

It also makes orphan_work_set protected by g_cs_orphans, as it apparently was not explicitly protected by any lock in the existing code (which was ok as only the message handler thread accessed it, but still scary).
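
Roughly the locking discipline being described, in minimal form (plain std::mutex here; the real lock is Core's g_cs_orphans RecursiveMutex, and the helper name is made up):

```cpp
#include <cstdint>
#include <mutex>
#include <set>

using TxId = uint64_t;           // stand-in for the real txid type

std::mutex g_cs_orphans;         // simplified; Core's lock is an annotated RecursiveMutex
std::set<TxId> orphan_work_set;  // would carry a GUARDED_BY(g_cs_orphans) annotation in Core

void QueueOrphanForReconsideration(TxId txid)  // hypothetical helper
{
    std::lock_guard<std::mutex> lock(g_cs_orphans);
    orphan_work_set.insert(txid);
}
```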

@jnewbery
Contributor Author

jnewbery commented Jul 7, 2020

This PR intended to do two things:

  1. Don't do orphan reconsideration in the context of an individual peer
  2. Move orphan_work_set data from net into net_processing.

I think both changes are good. If everyone agreed, I don't think it'd be a problem to do both in this PR, but since this is controversial and @sipa has concerns about whether (1) is correct, it makes sense to split them out and consider them separately.

The branch at https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_np seems fine, but I hope we'll soon have a PeerState struct in net_processing that isn't guarded by cs_main. That'd be a more natural place for this. See https://github.com/jnewbery/bitcoin/tree/2020-06-cs-main-split for a demonstration. No harm in moving it to CNodeState first, but it might be duplicate work if we're going to move it again shortly afterwards.
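
For illustration, the kind of per-peer structure being hinted at might look something like this (names and layout hypothetical; see the linked branch for the actual demonstration):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <set>

using TxId = uint64_t;
using NodeId = int64_t;

// Peer-specific data that doesn't need cs_main: each Peer has its own lock,
// and the map of peers has its own lock too.
struct Peer {
    std::mutex m_mutex;                // per-peer lock, independent of cs_main
    std::set<TxId> m_orphan_work_set;  // guarded by m_mutex
};

std::mutex g_peer_mutex;
std::map<NodeId, std::shared_ptr<Peer>> g_peer_map;  // guarded by g_peer_mutex
```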

@jnewbery jnewbery closed this Jul 7, 2020
maflcko pushed a commit that referenced this pull request Sep 30, 2020
001343f ProcessOrphanTx: Move AddToCompactExtraTransactions call into ProcessOrphanTx (John Newbery)
4fce726 ProcessOrphanTx: Remove aliases (John Newbery)
e07c5d9 ProcessOrphanTx: Remove outdated commented (John Newbery)
4763b51 ProcessOrphanTx: remove useless setMisbehaving set (John Newbery)
55c79a9 ProcessOrphanTx: remove useless done variable (John Newbery)
6e8dd99 [net processing] Add doxygen comments for orphan data and function (John Newbery)

Pull request description:

  Originally a follow-up to #19364, this simplifies the logic in ProcessOrphanTx() and removes unused variables.

ACKs for top commit:
  troygiorshev:
    ACK 001343f
  sipa:
    utACK 001343f
  MarcoFalke:
    ACK 001343f 🌮

Tree-SHA512: be558457f2e08ebb6bddcd49bdd75bd410c3650da44a76c688fc9f07822f94d5a1af93fa1342678052b2c8163cdb9745c352c7884325ab0a41fa593c3eb89116
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 30, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022