Skip to content

Conversation

ajtowns
Copy link
Contributor

@ajtowns ajtowns commented Oct 11, 2022

Moves extra transactions to be under the m_msgproc_mutex lock rather than g_cs_orphans and refactors orphan handling so that the lock can be internal to the TxOrphange class.

@ajtowns
Copy link
Contributor Author

ajtowns commented Oct 11, 2022

Resurrects #21527

@dergoegge
Copy link
Member

Concept ACK

-BEGIN VERIFY SCRIPT-
sed -i -e 's/static RecursiveMutex/mutable Mutex/' src/txorphanage.h
sed -i -e '/RecursiveMutex/d' src/txorphanage.cpp
sed -i -e 's/g_cs_orphans/m_mutex/g' $(git grep -l g_cs_orphans src/)
-END VERIFY SCRIPT-
@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 11, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK dergoegge, glozow
Concept ACK hebasto, chinggg

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26151 (refactor: Guard TxRequestTracker by its own lock instead of cs_main by dergoegge)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@hebasto
Copy link
Member

hebasto commented Oct 12, 2022

Concept ACK.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

89e2e0d

... protect them by g_msgproc_mutex instead, as they
are only used during message processing.

It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex. It could be locked only within PeerManagerImpl::AddToCompactExtraTransactions and PeerManagerImpl::ProcessMessage functions. The fact that "they
are only used during message processing" will guarantee no performance loss.

6f8e442 "net_processing: Localise orphan_work_set handling to ProcessOrphanTx" looks behavior changing. Do we have tests which cover it?

Tbh, I'd be happy if 4 or 5 first commits could be split into a separated PR.

@@ -924,14 +930,14 @@ class PeerManagerImpl final : public PeerManager
/** Storage for orphan information */
TxOrphanage m_orphanage;

void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

89e2e0d

Add AssertLockHeld(g_msgproc_mutex); into implementation code?

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big Concept ACK

@ajtowns
Copy link
Contributor Author

ajtowns commented Oct 17, 2022

It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex

No, it shouldn't have its own mutex: it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic. The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.

@hebasto
Copy link
Member

hebasto commented Oct 17, 2022

It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex

No, it shouldn't have its own mutex... The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.

This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except for the message processing one. Do we really want such a restriction for an ideal networking code?

it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic.

I cannot agree with "it's always better".

Making it use "fewer resources" is a kind of optimization, and "premature optimization is the root of all evil”.

it's always better to have fewer locks than more, as ... there are fewer things to keep in your head when following the logic.

Perhaps, we are talking about personal biases, but, when following the logic, it's better to keep things structured, to have small classes which implement data structures with internal locking and thread-safe interfaces.

@ajtowns
Copy link
Contributor Author

ajtowns commented Oct 18, 2022

This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except for the message processing one.

What are you talking about? If we can put extra transactions under a different lock now, we certainly can in the future. This isn't an API that we have to support indefinitely.

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 7082ce3

All comments are non-blocking (could also be done in a follow up)

*/
std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
std::optional<TxToReconsider> GetTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);

With TxToReconsider being:

struct TxToReconsider {
    const CTransactionRef tx;
    const NodeId originator;
    const bool more;
};

That would seem a little cleaner to me because originator and more only have meaning if the returned transaction is not nullptr, so bundling the three would make sense imo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to comment the same thing, this seems like a better interface given that we only want originator/more if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)

Copy link
Contributor Author

@ajtowns ajtowns Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some other followup ideas in the previous PR, particularly #21527 (comment) -- the idea there is that if Alice sends orphan tx Y which has a missing parent X, instead of immediately processing Y when X arrives (which might have arrived from Bob), just put it in Alice's workset, and delay processing of the orphan until it's Alice's turn. Then you'd have peer == originator and not need it to be returned as a result. At least last time I looked at this (the 202104-whohandlesorphans branch), it ended up simplifying the function to just CTransactionRef GetTxToReconsider(NodeId peer) (using nullptr as the "there's nothing to consider" indicator).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened this as a draft at #26551

AssertLockHeld(g_cs_orphans);
LOCK(m_mutex);

// Get this peer's work set, emplacing an empty set it didn't exist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get this peer's work set, emplacing an empty set it didn't exist
// Get this peer's work set, emplacing an empty set if it didn't exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated in #26551

* orphan will be reconsidered on each call of this function. This set
* may be added to if accepting an orphan causes its children to be
* reconsidered.
*/
void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
void PeerManagerImpl::ProcessOrphanTx(Peer& peer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the work set is moved to the orphanage, ProcessOrphanTx really only needs the id of the peer not the entire Peer.

Maybe pass NodeId to ProcessOrphanTx instead of Peer in 9910ed7 and have it look up the needed Peer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you already have a Peer, I don't think there's really any reason not to pass in.

@chinggg
Copy link
Contributor

chinggg commented Oct 30, 2022

Concept ACK

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 7082ce3 via code review and some basic testing. I think putting txorphanage in charge of handling peer work sets is the right direction.

* Reconsider orphan transactions after a parent has been accepted to the mempool.
*
* @peer[in] peer The peer whose orphan transactions we will reconsider. Generally only one
* orphan will be reconsidered on each call of this function. This set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in 0027174:

It's a bit confusing what "This set" refers to.

Suggested change
* orphan will be reconsidered on each call of this function. This set
* orphan will be reconsidered on each call of this function. The peer's orphan work set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated in #26551

@@ -2885,7 +2885,6 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
{
AssertLockHeld(g_msgproc_mutex);
AssertLockHeld(cs_main);
AssertLockHeld(g_cs_orphans);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 733d85f Note to self, my understanding here is:

  • The two threads that may access orphanage are (1) msg proc and (2) scheduler for calling m_orphanage.EraseForBlock() in BlockConnected().
  • Previously, we held g_cs_orphans through the entirety of ProcessOrphanTx, and now we are grabbing the lock for each call here (AddChildrenToWorkSet(*porphanTx, peer.m_id), EraseTx(orphanHash)).

And this is fine. The "problem" would be if we called BlockConnected and e.g. evicted an orphan that was just confirmed. It's fine since EraseTx() handles if a tx is already gone, and AddChildrenToworkSet() will wait on g_cs_orphans for EraseForBlock() to be done updating m_outpoint_to_orphan_it.

*/
std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to comment the same thing, this seems like a better interface given that we only want originator/more if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)


while (CTransactionRef porphanTx = m_orphanage.GetTxToReconsider(peer.m_id, from_peer, more)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching this, I think it's quite strange that we do this in a while(!work_set.empty()) loop when we're trying to make sure that we only ever process 1 orphan at a time. Perhaps a remnant of pre-#15644 code? My suggestion is to take this out of the while loop and replace the breaks with returns... but non-blocking of course, can do in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's three "loops" I guess:

  • ThreadMessageHandler repeatedly calls ProcessMessages calls ProcessMessage calls ProcessOrphanTx; if ProcessOrphanTx returns true, ProcessMessage exits early so the next run of the ThreadMessageHandler loop for this peer immediately goes back to ProcessOrphanTx
  • while (GetTxToReconsider) here loops until it actually does some work (ProcessTransaction returns VALID or something other than TX_MISSING_INPUTS, returning true if the while loop could have continued
  • GetTxToReconsider loops over the peer's workset, returning the first entry in the workset that still exists as an orphan

Removing the while (GetTxToReconsider) loop would mean that we'd immediately switch to the next peer anytime we came across an orphan that needed two parents, rather than just one, despite that not taking much time to determine (it's part of MemPoolAccept::PreChecks). I don't think that would be wrong, but the way it is seems better to me.

Removing the internal loop for GetTxToReconsider would mean you'd need to return a new result: "there's more work to do, but the first bit of work was a no-op", and you'd be re-querying m_peer_work_set.find(peer) repeatedly.

@glozow glozow merged commit a79b720 into bitcoin:master Nov 28, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 1, 2022
glozow added a commit that referenced this pull request Jan 26, 2023
c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns)
ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns)
c583775 net_processing: only process orphans before messages (Anthony Towns)
be23046 txorphange: Drop redundant originator arg from GetTxToReconsider (Anthony Towns)
a4fe099 txorphanage: index workset by originating peer (Anthony Towns)

Pull request description:

  We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`.

  Based on #26295

ACKs for top commit:
  naumenkogs:
    utACK c58c249
  glozow:
    ACK c58c249
  mzumsande:
    Code Review ACK c58c249

Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 26, 2023
c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns)
ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns)
c583775 net_processing: only process orphans before messages (Anthony Towns)
be23046 txorphange: Drop redundant originator arg from GetTxToReconsider (Anthony Towns)
a4fe099 txorphanage: index workset by originating peer (Anthony Towns)

Pull request description:

  We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`.

  Based on bitcoin#26295

ACKs for top commit:
  naumenkogs:
    utACK c58c249
  glozow:
    ACK c58c249
  mzumsande:
    Code Review ACK c58c249

Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de
@bitcoin bitcoin locked and limited conversation to collaborators Nov 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants