Make mininode_lock non-reentrant #19178
Conversation
There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.
Concept ACK, I just learned a lot about locks haha. I have a comment about timeouts because opening up the potential for deadlock scares me 🥶
```diff
@@ -492,7 +492,7 @@ def test_function():
 # P2PConnection acquires this lock whenever delivering a message to a P2PInterface.
 # This lock should be acquired in the thread running the test logic to synchronize
 # access to any data shared with the P2PInterface or P2PConnection.
-mininode_lock = threading.RLock()
+mininode_lock = threading.Lock()
```
Seems like `Lock` will try to acquire indefinitely by default (I'm looking at these docs). I tried out some re-acquiring behavior with this code and it hangs forever as expected... Just in case someone like me does some dumb deadlocky stuff, would it be a good idea to make `mininode_lock` use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`?
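For illustration, here's a minimal standalone sketch (not framework code) of why the reentrancy matters: an `RLock` lets the owning thread re-acquire it, while re-acquiring a plain `Lock` self-deadlocks, so the demo uses a timeout instead of blocking forever.

```python
import threading

rlock = threading.RLock()
with rlock:
    with rlock:  # fine: RLock tracks the owning thread and a recursion count
        print("RLock re-acquired by the same thread")

lock = threading.Lock()
with lock:
    # A bare lock.acquire() here would block forever (self-deadlock),
    # so acquire with a timeout to observe the failure instead.
    acquired = lock.acquire(timeout=1)
    print("plain Lock re-acquired?", acquired)  # prints False
```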
Actually, I have a question because I'm not 100% sure how the test_runner works: does it allot a certain amount of time to each test? It looks like it would run indefinitely. How would it affect Travis if one test is in deadlock?
Travis is running our ci system, which calls the test_runner helper:

```
DOCKER_EXEC LD_LIBRARY_PATH=$DEPENDS_DIR/$HOST/lib ${TEST_RUNNER_ENV} test/functional/test_runner.py --ci $MAKEJOBS --tmpdirprefix "${BASE_SCRATCH_DIR}/test_runner/" --ansi --combinedlogslen=4000 ${TEST_RUNNER_EXTRA} --quiet --failfast
```

If a test takes a long time, it will show up in the log as `Remaining jobs: [foo_bar.py]`.

If a test never finishes, the virtual machine that the ci system runs on will eventually reboot and mark the run as failed.
> would it be a good idea to make mininode_lock use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`
We use the `with <lock>:` syntax everywhere, which doesn't allow a `blocking` or `timeout` argument. I don't think there's a way to set a default timeout on a lock.
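To illustrate what it would take, a timeout could only be emulated by replacing every `with mininode_lock:` with something like this hypothetical wrapper (not part of the test framework):

```python
import threading
from contextlib import contextmanager

@contextmanager
def acquire_with_timeout(lock, timeout=60):
    # Hypothetical helper: fail loudly instead of hanging forever.
    if not lock.acquire(timeout=timeout):
        raise AssertionError("lock not acquired within %ds; possible deadlock" % timeout)
    try:
        yield
    finally:
        lock.release()

mininode_lock = threading.Lock()
with acquire_with_timeout(mininode_lock):
    pass  # critical section
```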
I think if you did accidentally introduce a deadlock, you'd figure it out pretty easily. If your script was hanging, I think the best way would be to implement some way to attach pdb to the running process (e.g. https://stackoverflow.com/a/25329467/933705 or https://stackoverflow.com/a/147114/933705) and then inspect the stack trace.
(in fact, I think this would be generally useful, so perhaps someone should implement it 🙂 )
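A sketch along the lines of those answers (assuming a POSIX system; the signal choice and handler name are illustrative):

```python
import pdb
import signal

def attach_debugger(sig, frame):
    # Drop into pdb at whatever frame the process was interrupted in,
    # so the stack of a hung test can be inspected.
    pdb.Pdb().set_trace(frame)

signal.signal(signal.SIGUSR1, attach_debugger)

# Then, from another terminal: kill -USR1 <pid of the hanging test>
```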
Ok this makes sense, thanks! I was worried I could accidentally stall Travis hahaha. Having some debugging help sounds useful, let me go look for someone...
```diff
@@ -658,8 +658,6 @@ def on_inv(self, message):
                 # save txid
                 self.tx_invs_received[i.hash] += 1
-
-        super().on_inv(message)
```
Good catch, I think this was from Amiti + me both adding it in separate PRs.
I wonder if we should actually change the Lock to be a condition variable: https://docs.python.org/3/library/threading.html#condition-objects. A lot of what we do with the lock is polling for changes on objects, so having a notify() function that wakes the blocked thread could be much nicer.
Concept ACK. Seems like a good first step for future cleanups (e.g. making the lock a private member and then replacing the polling loops with cvs).
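A rough sketch of what that could look like (the names and structure here are illustrative, not the framework's actual API): the network thread notifies under the condition instead of test code polling under `mininode_lock`.

```python
import threading

mininode_cv = threading.Condition()  # wraps its own internal lock

# Network thread: update shared state, then wake any waiting test thread.
def deliver_message(state, message):
    with mininode_cv:
        state["last_message"] = message
        mininode_cv.notify_all()

# Test thread: replaces a sleep-and-poll loop.
def wait_until(predicate, timeout=60):
    with mininode_cv:
        # wait_for() re-checks the predicate each time it is woken.
        if not mininode_cv.wait_for(predicate, timeout=timeout):
            raise AssertionError("predicate not met within timeout")
```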
Concept ACK, I learned more about how locks are built into the Bitcoin testing framework too. I'm not familiar with whether polling is a resource-consuming activity/busy waiting in the current implementation, but in general use cases I know CVs are a good replacement.
```diff
@@ -658,8 +658,6 @@ def on_inv(self, message):
                 # save txid
                 self.tx_invs_received[i.hash] += 1
-
-        super().on_inv(message)

     def get_invs(self):
         with mininode_lock:
```
I'm a bit confused on why `get_invs` needs to grab the lock, but `on_inv` doesn't? I understand that in the case of `wait_for_broadcast`, `self.wait_until` already acquires the lock, but I just wanted to confirm that `on_inv` already expects to be holding the lock when called.
Yes. All of the `on_<msgtype>()` methods already hold the lock, since they're called by the `on_message()` method (specifically, the `getattr(self, 'on_' + msgtype)(message)` line).
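For context, the dispatch looks roughly like this (a simplified paraphrase of `P2PInterface.on_message`, not the verbatim framework code):

```python
def on_message(self, message):
    # The lock is taken once here, so every on_<msgtype>() handler
    # invoked below runs with mininode_lock already held.
    with mininode_lock:
        msgtype = message.msgtype.decode('ascii')
        getattr(self, 'on_' + msgtype)(message)
```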
Shower thought: the lock could even be removed completely, without a cv replacement, if all code that previously required the lock was executed in the network thread. Not sure if this makes sense at all, so consider it a "shower thought".
ACK 6206838

ACK 6206838 😃
6206838 [tests] Make mininode_lock non-reentrant (John Newbery)
c67c1f2 [tests] Don't call super twice in P2PTxInvStore.on_inv() (John Newbery)
9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast() (John Newbery)
edae607 [tests] Only acquire lock once in p2p_compactblocks.py (John Newbery)

Pull request description:

There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.

ACKs for top commit:
MarcoFalke: ACK 6206838 😃
jonatack: ACK 6206838

Tree-SHA512: dcbc19e6c986970051705789be0ff7bec70c69cf76d5b468c2ba4cb732883ad512b1de5c3206c2eca41fa3f1c4806999df4cabbf67fc3c463bb817458e59a19c
Summary: This is a backport of Core [[bitcoin/bitcoin#18807 | PR18807]] [1/9] bitcoin/bitcoin@dab298d

Note: the 3rd commit of this PR will not be backported, as it added a duplicate line later removed in [[bitcoin/bitcoin#19178 | PR19178]]

Test Plan: proof-reading
Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Differential Revision: https://reviews.bitcoinabc.org/D9028
Summary:
bitcoin/bitcoin@edae607 [tests] Only acquire lock once in p2p_compactblocks.py
bitcoin/bitcoin@9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast()
bitcoin/bitcoin@6206838 [tests] Make mininode_lock non-reentrant

Commit c67c1f2c032a8efa141d776a7e5be58f052159ea, removing a double call to the parent method in `P2PTxInvStore.on_inv`, is not relevant for Bitcoin ABC, as it fixes a bug that was never introduced in our codebase. We skipped that commit while backporting core#18807.

To make it work, the following additional change is needed for Bitcoin ABC: don't acquire mininode_lock twice in abc_p2p_avalanche.py. `on_avahello`, `on_avapoll` and `on_avaresponse` are called indirectly via `mininode.P2PInterface.on_message`, with the lock already acquired.

This is a backport of [[bitcoin/bitcoin#19178 | core#19178]]

Test Plan: `ninja check-functional`
Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Subscribers: deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D9430