-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Replace global g_cs_orphans lock with local #26295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously vExtraTxnForCompact and vExtraTxnForCompactIt were protected by g_cs_orphans; protect them by g_msgproc_mutex instead, as they are only used during message processing.
Resurrects #21527 |
Concept ACK |
-BEGIN VERIFY SCRIPT- sed -i -e 's/static RecursiveMutex/mutable Mutex/' src/txorphanage.h sed -i -e '/RecursiveMutex/d' src/txorphanage.cpp sed -i -e 's/g_cs_orphans/m_mutex/g' $(git grep -l g_cs_orphans src/) -END VERIFY SCRIPT-
5d59755
to
7082ce3
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... protect them by
g_msgproc_mutex
instead, as they
are only used during message processing.
It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex. It could be locked only within PeerManagerImpl::AddToCompactExtraTransactions
and PeerManagerImpl::ProcessMessage
functions. The fact that "they
are only used during message processing" will guarantee no performance loss.
6f8e442 "net_processing: Localise orphan_work_set handling to ProcessOrphanTx" looks behavior changing. Do we have tests which cover it?
Tbh, I'd be happy if 4 or 5 first commits could be split into a separated PR.
@@ -924,14 +930,14 @@ class PeerManagerImpl final : public PeerManager | |||
/** Storage for orphan information */ | |||
TxOrphanage m_orphanage; | |||
|
|||
void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans); | |||
void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add AssertLockHeld(g_msgproc_mutex);
into implementation code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big Concept ACK
No, it shouldn't have its own mutex: it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic. The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks. |
This implies that in future design changes the
I cannot agree with "it's always better". Making it use "fewer resources" is a kind of optimization, and "premature optimization is the root of all evil”.
Perhaps, we are talking about personal biases, but, when following the logic, it's better to keep things structured, to have small classes which implement data structures with internal locking and thread-safe interfaces. |
What are you talking about? If we can put extra transactions under a different lock now, we certainly can in the future. This isn't an API that we have to support indefinitely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 7082ce3
All comments are non-blocking (could also be done in a follow up)
*/ | ||
std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans); | ||
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex); | |
std::optional<TxToReconsider> GetTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex); |
With TxToReconsider
being:
struct TxToReconsider {
const CTransactionRef tx;
const NodeId originator;
const bool more;
};
That would seem a little cleaner to me because originator
and more
only have meaning if the returned transaction is not nullptr
, so bundling the three would make sense imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to comment the same thing, this seems like a better interface given that we only want originator
/more
if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was some other followup ideas in the previous PR, particularly #21527 (comment) -- the idea there is that if Alice sends orphan tx Y which has a missing parent X, instead of immediately processing Y when X arrives (which might have arrived from Bob), just put it in Alice's workset, and delay processing of the orphan until it's Alice's turn. Then you'd have peer == originator
and not need it to be returned as a result. At least last time I looked at this (the 202104-whohandlesorphans branch), it ended up simplifying the function to just CTransactionRef GetTxToReconsider(NodeId peer)
(using nullptr
as the "there's nothing to consider" indicator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened this as a draft at #26551
AssertLockHeld(g_cs_orphans); | ||
LOCK(m_mutex); | ||
|
||
// Get this peer's work set, emplacing an empty set it didn't exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Get this peer's work set, emplacing an empty set it didn't exist | |
// Get this peer's work set, emplacing an empty set if it didn't exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated in #26551
* orphan will be reconsidered on each call of this function. This set | ||
* may be added to if accepting an orphan causes its children to be | ||
* reconsidered. | ||
*/ | ||
void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set) | ||
void PeerManagerImpl::ProcessOrphanTx(Peer& peer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the work set is moved to the orphanage, ProcessOrphanTx
really only needs the id of the peer not the entire Peer
.
Maybe pass NodeId
to ProcessOrphanTx
instead of Peer
in 9910ed7 and have it look up the needed Peer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you already have a Peer
, I don't think there's really any reason not to pass in.
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 7082ce3 via code review and some basic testing. I think putting txorphanage in charge of handling peer work sets is the right direction.
* Reconsider orphan transactions after a parent has been accepted to the mempool. | ||
* | ||
* @peer[in] peer The peer whose orphan transactions we will reconsider. Generally only one | ||
* orphan will be reconsidered on each call of this function. This set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit in 0027174:
It's a bit confusing what "This set" refers to.
* orphan will be reconsidered on each call of this function. This set | |
* orphan will be reconsidered on each call of this function. The peer's orphan work set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated in #26551
@@ -2885,7 +2885,6 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer) | |||
{ | |||
AssertLockHeld(g_msgproc_mutex); | |||
AssertLockHeld(cs_main); | |||
AssertLockHeld(g_cs_orphans); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in 733d85f Note to self, my understanding here is:
- The two threads that may access orphanage are (1) msg proc and (2) scheduler for calling
m_orphanage.EraseForBlock()
inBlockConnected()
. - Previously, we held
g_cs_orphans
through the entirety ofProcessOrphanTx
, and now we are grabbing the lock for each call here (AddChildrenToWorkSet(*porphanTx, peer.m_id)
,EraseTx(orphanHash)
).
And this is fine. The "problem" would be if we called BlockConnected
and e.g. evicted an orphan that was just confirmed. It's fine since EraseTx()
handles if a tx is already gone, and AddChildrenToworkSet()
will wait on g_cs_orphans
for EraseForBlock()
to be done updating m_outpoint_to_orphan_it
.
*/ | ||
std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans); | ||
CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to comment the same thing, this seems like a better interface given that we only want originator
/more
if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)
|
||
while (CTransactionRef porphanTx = m_orphanage.GetTxToReconsider(peer.m_id, from_peer, more)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're touching this, I think it's quite strange that we do this in a while(!work_set.empty())
loop when we're trying to make sure that we only ever process 1 orphan at a time. Perhaps a remnant of pre-#15644 code? My suggestion is to take this out of the while loop and replace the break
s with returns... but non-blocking of course, can do in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's three "loops" I guess:
- ThreadMessageHandler repeatedly calls ProcessMessages calls ProcessMessage calls ProcessOrphanTx; if ProcessOrphanTx returns true, ProcessMessage exits early so the next run of the ThreadMessageHandler loop for this peer immediately goes back to ProcessOrphanTx
while (GetTxToReconsider)
here loops until it actually does some work (ProcessTransaction returns VALID or something other than TX_MISSING_INPUTS, returningtrue
if thewhile
loop could have continuedGetTxToReconsider
loops over the peer's workset, returning the first entry in the workset that still exists as an orphan
Removing the while (GetTxToReconsider)
loop would mean that we'd immediately switch to the next peer anytime we came across an orphan that needed two parents, rather than just one, despite that not taking much time to determine (it's part of MemPoolAccept::PreChecks). I don't think that would be wrong, but the way it is seems better to me.
Removing the internal loop for GetTxToReconsider
would mean you'd need to return a new result: "there's more work to do, but the first bit of work was a no-op", and you'd be re-querying m_peer_work_set.find(peer)
repeatedly.
c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns) ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns) c583775 net_processing: only process orphans before messages (Anthony Towns) be23046 txorphange: Drop redundant originator arg from GetTxToReconsider (Anthony Towns) a4fe099 txorphanage: index workset by originating peer (Anthony Towns) Pull request description: We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`. Based on #26295 ACKs for top commit: naumenkogs: utACK c58c249 glozow: ACK c58c249 mzumsande: Code Review ACK c58c249 Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de
c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns) ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns) c583775 net_processing: only process orphans before messages (Anthony Towns) be23046 txorphange: Drop redundant originator arg from GetTxToReconsider (Anthony Towns) a4fe099 txorphanage: index workset by originating peer (Anthony Towns) Pull request description: We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`. Based on bitcoin#26295 ACKs for top commit: naumenkogs: utACK c58c249 glozow: ACK c58c249 mzumsande: Code Review ACK c58c249 Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de
Moves extra transactions to be under the
m_msgproc_mutex
lock rather thang_cs_orphans
and refactors orphan handling so that the lock can be internal to theTxOrphange
class.