-
Notifications
You must be signed in to change notification settings - Fork 37.7k
refactor: Guard TxRequestTracker by its own lock instead of cs_main #26151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Guard TxRequestTracker by its own lock instead of cs_main #26151
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK. Originally suggested in #19988 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concept ack - restarted the *SAN job as that was an unrelated apt failure.
7f9a893
to
f2f498a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's to be a separate lock, I'm not convinced it makes sense to push the lock all the way into the Impl
class -- having m_impl GUARDED_BY(m_impl_mutex)
in TxRequestTracker
, or having m_txrequest GUARDED_BY(m_txrequest_mutex)
seem more sensible to me, and the latter allows you to take the lock outside of the various for loops, rather than repeatedly taking it and dropping it on each iteration.
I think the arguments from the original PR still apply though -- this doesn't meaningfully allow any more parallelism / reduce blocking as far as I can see.
Thanks @jnewbery for linking the previous discussion! I was looking through the old PRs yesterday and didn't come across that comment.
@ajtowns I'm not sure if I am on the same page about this but I am willing to be convinced. I used
I have been looking at reducing |
f2f498a
to
2ac77c4
Compare
In general, I think it's better to leave things alone if we're not making things substantially more reliable/simple/efficient. For locking, I think the best approach on all of those axes is "having a reference to an object means you can manipulate it; if you can't manipulate it, you don't have a reference to in the first place"; but that's often hard to achieve. Maybe compare with what @vasild's proposing in #25390 -- with that you'd just say
Yeah... Kind of feel like it's probably better spending coding/review time on the hard steps though? For |
Concept ACK. |
It makes me think that such an issue should be resolved at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 2ac77c4
This PR adds an internal lock and acquires it inside the methods that touch the relevant variables. This would allow finer grained control - e.g. to lock the mutex only for some part of a method, to improve concurrency. In this PR, however, we don't do that - we lock the mutex for the entire duration of the methods, which is the same as having an external mutex outside, like @ajtowns mentioned above:
... or having
m_txrequest GUARDED_BY(m_txrequest_mutex)
... allows you to take the lock outside of the various for loops, rather than repeatedly taking it and dropping it on each iteration.
Which way is better would depend on how this is used by multiple threads. Because it is not, both approaches look equally good (or equally bad) now.
Thanks, @ajtowns, for mentioning #25390. This PR can be achieved by just:
- TxRequestTracker m_txrequest;
+ Synced<TxRequestTracker> m_txrequest;
and mechanically replacing .
with ->
(we don't need to hold the lock across multiple method calls). That is virtually one line of change. See this comment: #25390 (comment) looks like this PR is doing the "Lots of repetitions" case.
Could also move it out of the cs_main scope in |
m_txrequest is now guarded by TxRequestTracker's internal mutex.
2ac77c4
to
4ec6b41
Compare
Rebased, added a method to remove multiple transactions from the tracker at once (to avoid locking the internal lock over and over again), and renamed |
4ec6b41
to
1d72d42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
} | ||
|
||
m_txrequest.ForgetTxs(Span{pblock->vtx}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
m_txrequest.ForgetTxs(Span{pblock->vtx}); | |
m_txrequest.ForgetTxs(pblock->vtx); |
I am not sure about it. In LOCK(cs_main);
...
m_txrequest.DisconnectedPeer(nodeid);
...
assert(m_txrequest.Size() == 0); hmm, wait! is that a bug in this PR? If somebody modifies (adds to) Maybe return the size from |
I don't think it is currently but it almost is! The I will still change this as that seems a little brittle wrt future changes. |
How? It seems that the assumption in Yes, the current code is fine, but then it is fine even without this PR. The aim is to make it future proof. Returning the size from Maybe just delete the assert? Or expose the mutex of |
@@ -1395,7 +1395,8 @@ void PeerManagerImpl::PushNodeVersion(CNode& pnode, const Peer& peer) | |||
|
|||
void PeerManagerImpl::AddTxAnnouncement(const CNode& node, const GenTxid& gtxid, std::chrono::microseconds current_time) | |||
{ | |||
AssertLockHeld(::cs_main); // For m_txrequest | |||
AssertLockHeld(::cs_main); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is correct. While m_txrequest
has an internal mutex to guard against UB, the internal mutex does nothing to guard the processing logic. A mutex different from the internal one is still needed here to guard against several threads calling into this function at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think you're right. Also seems similar to what Vasil mentioned, when we call methods on m_txrequest
but expect the internal state between those calls not to change, then the internal mutex won't help us. afaict the changes here don't break anything because cs_main
is still held in these places but I am trying to seperate m_txrequest
from cs_main
so cs_main
should not be needed for any of this after the PR. Will mark as draft until I figure out how to address this. (Seems like the only way to address this is to change the TxRequestTracker
interface)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an alternative would be to replace cs_main with g_msgproc_mutex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or expose the mutex outside of TxRequestTracker
and lock for longer duration in the net processing code. I.e. have TxRequestTracker m_txrequest GUARDED_BY(m_its_own_mutex);
as suggested by @ajtowns in #26151 (review)
🐙 This pull request conflicts with the target branch and needs rebase. |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closing this for now, can be marked up for grabs. #26151 (comment) needs to be addressed for this to move forward. Imo, the interface of |
I don't see the need to have the
TxRequestTracker
guarded bycs_main
which would also be more in line with our developer docs.From
developer-notes.md
:This PR gives
TxRequestTracker
its own mutex, thereby removing the need to guardPeerManagerImpl::m_txrequest
usingcs_main
.