
Conversation

@jnewbery (Contributor) commented Jun 5, 2020

There's no need for mininode_lock to be reentrant.
Use a simpler non-recursive lock.

@glozow (Member) left a comment

Concept ACK, I just learned a lot about locks haha. I have a comment about timeouts because opening up the potential for deadlock scares me 🥶

@@ -492,7 +492,7 @@ def test_function():
# P2PConnection acquires this lock whenever delivering a message to a P2PInterface.
# This lock should be acquired in the thread running the test logic to synchronize
# access to any data shared with the P2PInterface or P2PConnection.
-mininode_lock = threading.RLock()
+mininode_lock = threading.Lock()

Member

Seems like Lock will try to acquire indefinitely by default (I'm looking at these docs).
I tried out some re-acquiring behavior with this code and it hangs forever as expected... Just in case someone like me does some dumb deadlocky stuff, would it be a good idea to make mininode_lock use something like lock.acquire(timeout=60) and/or if lock.locked(): raise_assertion_error?
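
(For reference, a minimal standalone sketch of the behavior described above, not part of the framework: re-acquiring a plain Lock from the same thread blocks, while acquire() with a timeout gives up instead of hanging.)

```python
import threading

lock = threading.Lock()

with lock:
    # With an RLock the same thread could re-enter here; with a plain Lock it
    # blocks. acquire(timeout=...) returns False instead of hanging forever.
    acquired = lock.acquire(timeout=1)
    print("re-acquired:", acquired)  # prints: re-acquired: False
```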

Member

Actually I have a question because I'm not 100% sure how the test_runner works - does it allot a certain amount of time for each test? It looks like it would run indefinitely? How would it affect Travis if one test deadlocks?

Member

Travis runs our CI system, which calls the test_runner helper:

  DOCKER_EXEC LD_LIBRARY_PATH=$DEPENDS_DIR/$HOST/lib ${TEST_RUNNER_ENV} test/functional/test_runner.py --ci $MAKEJOBS --tmpdirprefix "${BASE_SCRATCH_DIR}/test_runner/" --ansi --combinedlogslen=4000 ${TEST_RUNNER_EXTRA} --quiet --failfast

If a test takes a long time, it will show up in the log as Remaining jobs: [foo_bar.py].

If a test never finishes, the virtual machine that the CI system runs on will eventually reboot and mark the run as failed.

Contributor Author

> would it be a good idea to make mininode_lock use something like lock.acquire(timeout=60) and/or if lock.locked(): raise_assertion_error

We use the with <lock>: syntax everywhere, which doesn't allow a blocking or timeout argument. I don't think there's a way to set a default timeout on a lock.
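
(For illustration only: a wrapper context manager could bolt a timeout onto the with syntax, but it would mean changing every with mininode_lock: call site. A hypothetical sketch; the helper name is made up and this is not what the framework does.)

```python
import threading
from contextlib import contextmanager

@contextmanager
def acquire_or_die(lock, timeout=60):
    # Hypothetical helper, not part of the test framework.
    if not lock.acquire(timeout=timeout):
        raise AssertionError("could not acquire lock within %ds (deadlock?)" % timeout)
    try:
        yield
    finally:
        lock.release()

# Usage would be: with acquire_or_die(mininode_lock): ...
```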

I think if you did accidentally introduce a deadlock, you'd figure it out pretty easily. If your script was hanging, I think the best way would be to implement some way to attach pdb to the running process (eg https://stackoverflow.com/a/25329467/933705 or https://stackoverflow.com/a/147114/933705) and then inspect the stack trace.

(in fact, I think this would be generally useful, so perhaps someone should implement it 🙂 )
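
(A rough sketch of the signal-handler approach from the first link, assuming a POSIX platform; the handler name is illustrative.)

```python
import pdb
import signal

def attach_debugger(signum, frame):
    # Drop into pdb at the point where the signal was received.
    pdb.Pdb().set_trace(frame)

# Install once at test startup; then `kill -USR1 <pid>` from another terminal
# attaches pdb to the hanging process so the stack can be inspected.
signal.signal(signal.SIGUSR1, attach_debugger)
```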

Member

Ok this makes sense, thanks! I was worried I could accidentally stall Travis hahaha. Having some debugging help sounds useful, let me go look for a someone...

@@ -658,8 +658,6 @@ def on_inv(self, message):
# save txid
self.tx_invs_received[i.hash] += 1

- super().on_inv(message)

Member

Good catch, I think this was from Amiti + me both adding it in separate PRs.

@jnewbery (Contributor Author)

I wonder if we should actually change the Lock to be a condition variable: https://docs.python.org/3/library/threading.html#condition-objects. A lot of what we do with the lock is polling for changes on objects, so having a notify() function that wakes the blocked thread could be much nicer.
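
(Roughly what that could look like; a hypothetical sketch with made-up names, since today the framework polls in a sleep loop.)

```python
import threading

p2p_cv = threading.Condition()   # hypothetical replacement for mininode_lock
messages = []                    # shared state guarded by the condition's lock

def network_thread_delivers(msg):
    # Producer side: update shared state, then wake any waiting test thread.
    with p2p_cv:
        messages.append(msg)
        p2p_cv.notify_all()

def wait_for_message(timeout=60):
    # Consumer side: no polling loop; wait_for() re-checks the predicate
    # each time it is notified (or when the timeout expires).
    with p2p_cv:
        assert p2p_cv.wait_for(lambda: len(messages) > 0, timeout=timeout)
```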

@maflcko (Member) commented Jun 11, 2020

Concept ACK. Seems like a good first step for future cleanups. (e.g. making the lock a private member and then replacing the polling loops with cvs)

@gillichu left a comment

Concept ACK, I learned more about how locks are built into the Bitcoin testing framework too. I'm not familiar enough with the current implementation to say whether the polling amounts to resource-consuming busy waiting, but in general use cases I know CVs are a good replacement.

@@ -658,8 +658,6 @@ def on_inv(self, message):
# save txid
self.tx_invs_received[i.hash] += 1

- super().on_inv(message)

def get_invs(self):
with mininode_lock:

I'm a bit confused about why get_invs needs to grab the lock, but on_inv doesn't. I understand that in the case of wait_for_broadcast, self.wait_until already acquires the lock, but I just wanted to confirm that on_inv already expects to be holding the lock when it's called.

Contributor Author

Yes. All of the on_<msgtype>() methods already hold the lock, since they're called by the on_message() method (specifically, the getattr(self, 'on_' + msgtype)(message) line).
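
(A condensed illustration of that dispatch path; this is a sketch, not the verbatim mininode.py code.)

```python
import threading

mininode_lock = threading.Lock()

class P2PInterfaceSketch:
    """Sketch of how message callbacks end up holding mininode_lock."""

    def on_message(self, message):
        # Called from the network thread. The lock is taken once here, so every
        # on_<msgtype>() callback dispatched below runs with it already held.
        with mininode_lock:
            msgtype = message.msgtype.decode('ascii')
            getattr(self, 'on_' + msgtype)(message)

    def on_inv(self, message):
        # No explicit `with mininode_lock:` needed; we're inside on_message().
        pass
```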

@maflcko (Member) commented Jun 15, 2020

Shower thought: The lock could even be removed completely without a cv replacement if all code that previously required the lock was executed in the network thread. Not sure if this makes sense at all, so consider it a "shower thought".
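
(One way that could look, assuming the network thread keeps running its asyncio event loop as it does today. This is a hypothetical sketch with made-up names, not framework code.)

```python
import asyncio

async def _call(func, *args):
    return func(*args)

def run_on_network_thread(loop, func, *args, timeout=60):
    # Hypothetical helper: schedule `func` on the network thread's event loop and
    # block the test thread until it has run. If all shared P2P state were only
    # ever touched from the network thread this way, no lock would be needed.
    future = asyncio.run_coroutine_threadsafe(_call(func, *args), loop)
    return future.result(timeout)

# e.g. the test thread could read shared state with something like:
#   invs = run_on_network_thread(network_event_loop, lambda: dict(peer.tx_invs_received))
```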

@jonatack (Member) left a comment

ACK 6206838

@maflcko (Member) commented Jun 16, 2020

ACK 6206838 😃


Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 62068381a3b9c065d81300be79abba7aecfdb41b 😃
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUg1zgv/Yt3dwdeZi7IRInaHKprMrQbGGBB/MvgXliI7NbdZZmVGTcONWCPMXWDA
NRPTCD2oP4scNp2uRwFYr3WGXo8XrHQImByajBJWrgMxwiPkzHGYN0VHh9WUJiQt
4XQyQhtr6nNg3T9dPvTyM0aUblItMkh17dcnQoynH7FGSh7njbhD9m+Nr0yJFMaX
pMeiIyMTbOIao2VNX1htUF5XN0twFjHNvFKHlHTyJxSyK/85vd2UPYo1eCzgBQwa
Pq9Ax5/MUtHnD5lucuthH4IHC7HfGxoJ7FpQGTZWPTDH7LA9HPn5z1D+JUXf7b3t
zOCvVNm1DSvBG47EcPEHDIBLvhT4AhNJOd+APWN4NTV7y2AiGZwufipvWbPuBjpS
J2YTw57Y7CIbos3O8fGDuA0bNAR1mYNwBrOvyqXtxQaS6KV0WKecCtRbFrxoAxu5
565+kQoGp4sjp3KAEalyPVu+td9orvKCejNRI5MloMcG55EKQ+3zLtQgGPoVBVx/
5vm8xtfy
=YJKb
-----END PGP SIGNATURE-----

Timestamp of file with hash 236b7f082f677ae84f62c6b2ec461c0d23883d147cfecbaf72127c0e22631b33 -

@maflcko maflcko merged commit 46bdd4b into bitcoin:master Jun 16, 2020
@jnewbery jnewbery deleted the 2020-05-mininode-lock-reentrancy branch June 16, 2020 14:44
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 7, 2020
6206838 [tests] Make mininode_lock non-reentrant (John Newbery)
c67c1f2 [tests] Don't call super twice in P2PTxInvStore.on_inv() (John Newbery)
9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast() (John Newbery)
edae607 [tests] Only acquire lock once in p2p_compactblocks.py (John Newbery)

Pull request description:

  There's no need for mininode_lock to be reentrant.
  Use a simpler non-recursive lock.

ACKs for top commit:
  MarcoFalke:
    ACK 6206838 😃
  jonatack:
    ACK 6206838

Tree-SHA512: dcbc19e6c986970051705789be0ff7bec70c69cf76d5b468c2ba4cb732883ad512b1de5c3206c2eca41fa3f1c4806999df4cabbf67fc3c463bb817458e59a19c
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 22, 2021
Summary:
This is a backport of Core [[bitcoin/bitcoin#18807 | PR18807]] [1/9]
bitcoin/bitcoin@dab298d

Note: the 3rd commit of this PR will not be backported, as it added a duplicate line later removed in [[bitcoin/bitcoin#19178 | PR19178]]

Test Plan: proof-reading

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D9028
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 23, 2021
Summary:
bitcoin/bitcoin@edae607
[tests] Only acquire lock once in p2p_compactblocks.py

bitcoin/bitcoin@9d80762
[tests] Don't acquire mininode_lock twice in wait_for_broadcast()

bitcoin/bitcoin@c67c1f2
[tests] Make mininode_lock non-reentrant

Commit c67c1f2c032a8efa141d776a7e5be58f052159ea removing a double call to the parent method in `P2PTxInvStore.on_inv` is not relevant for Bitcoin ABC, as it fixes a bug that was never introduced in our codebase. We skipped that commit while backporting core#18807.

To make it work, the following additional change is needed for Bitcoin ABC:
Don't acquire mininode_lock twice in abc_p2p_avalanche.py. `on_avahello`, `on_avapoll` and `on_avaresponse` are called indirectly via `mininode.P2PInterface.on_message`, with the lock already acquired.

This is a backport of [[bitcoin/bitcoin#19178 | core#19178]]

Test Plan: `ninja check-functional`

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Subscribers: deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D9430
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022