Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Sep 20, 2020

This PR:

The example of the improved CheckLastCritical()/LEAVE_CRITICAL_SECTION() log (could be got when compiled without the last commit):

2020-09-20T08:34:28.429485Z [test] INCONSISTENT LOCK ORDER DETECTED
2020-09-20T08:34:28.429493Z [test] Current lock order (least recent first) is:
2020-09-20T08:34:28.429501Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
2020-09-20T08:34:28.429508Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')

Currently, there are other "naked" LEAVE_CRITICAL_SECTION() in the code base:

LEAVE_CRITICAL_SECTION(cs_main);

LEAVE_CRITICAL_SECTION(pqueue->ControlMutex);

@hebasto hebasto force-pushed the 200920-leave-cs branch 2 times, most recently from 26fa3f6 to 5dc811f Compare September 20, 2020 09:58
@hebasto hebasto marked this pull request as ready for review September 20, 2020 10:32
@hebasto hebasto force-pushed the 200920-leave-cs branch 2 times, most recently from 186221d to 283ddf1 Compare September 20, 2020 12:37
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 20, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@promag
Copy link
Contributor

promag commented Sep 21, 2020

This change reveals a bug in the wallet_tests/CreateWalletFromFile test,
that will be fixed in the following commit.

@hebasto I'm not following, how is it revealed?

@hebasto
Copy link
Member Author

hebasto commented Sep 21, 2020

@promag

This change reveals a bug in the wallet_tests/CreateWalletFromFile test,
that will be fixed in the following commit.

@hebasto I'm not following, how is it revealed?

$ git checkout 7ac14ec58740226af85401f879f37f09755b990b
$ ./autogen.sh && ./configure --enable-debug && make clean && make
$ src/test/test_bitcoin --run_test=wallet_tests/CreateWalletFromFile
Running 1 test case...
wallet/test/wallet_tests.cpp:786 wallet->wallet()->cs_wallet was not most recent critical section locked, details in debug log.
unknown location(0): fatal error: in "wallet_tests/CreateWalletFromFile": signal: SIGABRT (application abort requested)
wallet/test/wallet_tests.cpp(785): last checkpoint

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
^C
$ tail -4 '/tmp/test_common_Bitcoin Core/03a41ec8724b4ea8adc603b0dc139290a6f9c765607df0bc277823d7b0165c20/regtest/debug.log'
2020-09-21T09:16:36.739551Z [test] INCONSISTENT LOCK ORDER DETECTED
2020-09-21T09:16:36.739557Z [test] Current lock order (least recent first) is:
2020-09-21T09:16:36.739562Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
2020-09-21T09:16:36.739568Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')

@promag
Copy link
Contributor

promag commented Sep 21, 2020

Thanks @hebasto, I tried make check instead. I'd suggest to add [skip ci] to that commit body

@hebasto
Copy link
Member Author

hebasto commented Sep 21, 2020

@promag

Thanks @hebasto, I tried make check instead.

$ make check
...
2020-09-21T09:41:56.850457Z [test] INCONSISTENT LOCK ORDER DETECTED
2020-09-21T09:41:56.850463Z [test] Current lock order (least recent first) is:
2020-09-21T09:41:56.850470Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
2020-09-21T09:41:56.850475Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')
wallet/test/wallet_tests.cpp:786 wallet->wallet()->cs_wallet was not most recent critical section locked, details in debug log.
make[3]: *** [Makefile:20887: wallet/test/wallet_tests.cpp.test] Error 1
make[3]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
make[2]: *** [Makefile:19327: check-am] Error 2
make[2]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
make[1]: *** [Makefile:19012: check-recursive] Error 1
make[1]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
make: *** [Makefile:798: check-recursive] Error 1

@hebasto
Copy link
Member Author

hebasto commented Sep 21, 2020

Updated 283ddf1 -> d89c332 (pr19982.03 -> pr19982.04, diff):

I'd suggest to add [skip ci] to that commit body

@promag
Copy link
Contributor

promag commented Sep 21, 2020

Oh I did't have --enable-debug.

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK d89c332, feel free to take comments, thanks for fixing this.

AFAICT this deadlock wasn't catch by CI at #16426 merge because LEAVE_CRITICAL didn't call neither potential_deadlock_detected or CheckLastCritical. Otherwise if a new lock would have been taken in the rest of CreateWalletFromFile it should have been detected.

So this PR does improve deadlock detection by flagging any lock order inconsistency even if it didn't materialize in an effective one as this case sounds to suggest ?

LEAVE_CRITICAL_SECTION(wallet->wallet()->cs_wallet);
SyncWithValidationInterfaceQueue();
ENTER_CRITICAL_SECTION(wallet->wallet()->cs_wallet);
ENTER_CRITICAL_SECTION(cs_wallets);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I looked on, before this commit, behavior is :

  • TestLoadWallet() in src/wallet/test/wallet_tests.cpp L794.
  • Calling down CWallet::Create() which takes the cs_wallet lock in src/wallet/wallet.cpp L4007 then cs_wallets lock in src/wallet/wallet.cpp L4089
  • Calling down this handler, which attempt to leave the former first triggering a lock order inconsistency

For tracking the faultive commit is 6a72f26.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK d89c332. Impressive work by hebasto not only debugging and fixing the buggy wallet test, but also improving deadlock detection, and improving deadlock detection output, and writing new deadlock detection tests!

@hebasto if you feel like adding more information to #19049, I'd be curious to know more you debugged this, since I gave up there struggling to figure out which locks were involved

@ryanofsky
Copy link
Contributor

Note to anyone else interested in reviewing this that the change is simpler than it probably looks.

The actual bugfix in the third commit is very straightforward and just two lines. The first two commits are just improvements to error handling and debug output in lock order code, ignoring a big whitespace diff in the first commit.

@hebasto
Copy link
Member Author

hebasto commented Oct 27, 2020

@ryanofsky

@hebasto if you feel like adding more information to #19049, I'd be curious to know more you debugged this, since I gave up there struggling to figure out which locks were involved

Our own lock stack being printed at some points appears very useful for debugging.

@hebasto
Copy link
Member Author

hebasto commented Nov 14, 2020

Rebased d89c332 -> e2dc75f (pr19982.04 -> pr19982.05) due to the conflict with #20285.

Also updated comments as requested by @ariard.

@hebasto hebasto changed the title test: Fix inconsistent lock order in wallet_tests/CreateWalletFromFile test: Fix inconsistent lock order in wallet_tests/CreateWallet Nov 28, 2020
@hebasto
Copy link
Member Author

hebasto commented Nov 28, 2020

Rebased d15646b -> f2933e7 (pr19982.06 -> pr19982.07) due to the conflict with #19337.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK f2933e7. Since last review just rebase updating test name and tweaking comments

@ryanofsky
Copy link
Contributor

Notes that might be helpful to other reviewers: #19982 (comment).

The actual fix is very simple and contained in the third commit. The other two commits are error checking improvements.

@fanquake fanquake requested review from ajtowns, ariard and vasild December 3, 2020 02:06
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f2933e7

lock m1
lock m2
unlock m1
unlock m2

the above was allowed before this PR and is forbidden after the PR. Such code cannot cause deadlock or any other issue and is legit. However it looks strange, may be a signal of a problem elsewhere and better be avoided. In our code in particular it would also brick the debugging mechanisms because LeaveCritical() from unlock m1 would remove the wrong lock from the lock stack (m2).

I think it is ok to enforce "locks should be released in reverse order in which they were acquired"

@hebasto
Copy link
Member Author

hebasto commented Dec 3, 2020

Updated f2933e7 -> 608e14a (pr19982.07 -> pr19982.08, diff):

@practicalswift
Copy link
Contributor

Concept ACK

Nice work!

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK 608e14a.

As expected make check fails on 79bfd2e with ./configure --enable-debug.

@vasild
Copy link
Contributor

vasild commented Dec 3, 2020

ACK 608e14a

This commit adds actual lock stack logging if check fails.
This change reveals a bug in the wallet_tests/CreateWalletFromFile test,
that will be fixed in the following commit.
@hebasto
Copy link
Member Author

hebasto commented Dec 10, 2020

Rebased 608e14a -> e1e68b6 (pr19982.08 -> pr19982.09) due to the conflict with #19425.

@maflcko
Copy link
Member

maflcko commented Dec 10, 2020

review ACK e1e68b6 💂

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK e1e68b6305beb47ebf7ee48f14e12fdebdfea1ef 💂
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjlrAv/QIaj8D90XXyl/5Sw/WeG+cGB2diICRF1Pbv4qvEvG7sr+ETsScHeJvK2
TVlXBWujZcYo50jpPMl9j8XcnLR9MfnT0GYTwOi+z+fcXmi4NULQRqx6JrNQS0Cb
cZQXp/2BNo1cMBEKmUmfnG221SzZ+rKuhiJXBte3JprpIhQyJTMvSAwW1yoVwwfJ
P4OQp0gvW+kOt1+Gy3DK7u9DLN848blddAvwqKaH276oemN92iClv7O9YUJAs4qS
gWLnyFGOMhUICJVOKYzneSAWBM+xRC/Ujx1pE1V18uv+F9lqjZIS95V2OLEHu8CQ
OBRHx1LqruQOHvpS/SC3317KCv68QJtCuiz254G5zxGFsMBAl9wGY2XZjBYdiazD
7GTUVSXzVwa474uVJRPVnuLvKLF0IgqDGrNQo2t/x1kW+8qJ23UpuKOhJ+1nOReG
6la8BU2ng9UKcvlwJ6PqFN/0A/GlLeNzfGPGxT1OzRtyu/4wrFeDvPxvLd5jXyJx
Ea6eO6Vm
=A3Z6
-----END PGP SIGNATURE-----

Timestamp of file with hash b9199fb93572c3373d95d9880b18d20bf1295e74f013c4bf476dda6306523525 -

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK e1e68b6. Just trivial rebase and suggested switch to BOOST_CHECK_EXCEPTION since last review

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK e1e68b6

@fanquake fanquake merged commit 736eb4d into bitcoin:master Dec 11, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 11, 2020
…s/CreateWallet

e1e68b6 test: Fix inconsistent lock order in wallet_tests/CreateWallet (Hennadii Stepanov)
cb23fe0 [skip ci] sync: Check precondition in LEAVE_CRITICAL_SECTION() macro (Hennadii Stepanov)
c5e3e74 sync: Improve CheckLastCritical() (Hennadii Stepanov)

Pull request description:

  This PR:
  - fixes bitcoin#19049 that was caused by bitcoin#16426
  - removes `wallet_tests::CreateWallet` suppression from the `test/sanitizer_suppressions/tsan`

  The example of the improved `CheckLastCritical()`/`LEAVE_CRITICAL_SECTION()` log (could be got when compiled without the last commit):
  ```
  2020-09-20T08:34:28.429485Z [test] INCONSISTENT LOCK ORDER DETECTED
  2020-09-20T08:34:28.429493Z [test] Current lock order (least recent first) is:
  2020-09-20T08:34:28.429501Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
  2020-09-20T08:34:28.429508Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')
  ```

  Currently, there are other "naked" `LEAVE_CRITICAL_SECTION()` in the code base:
  https://github.com/bitcoin/bitcoin/blob/b99a1633b270e0e89479b2bb2ae19a8a8dc0fa05/src/rpc/mining.cpp#L698

  https://github.com/bitcoin/bitcoin/blob/b99a1633b270e0e89479b2bb2ae19a8a8dc0fa05/src/checkqueue.h#L208

ACKs for top commit:
  MarcoFalke:
    review ACK e1e68b6 💂
  ryanofsky:
    Code review ACK e1e68b6. Just trivial rebase and suggested switch to BOOST_CHECK_EXCEPTION since last review
  vasild:
    ACK e1e68b6

Tree-SHA512: a627680eac3af4b4c02772473d68322ce8d3811bf6b035d3485ccc97d35755bef933cffabd3f20b126f89e3301eccecec3f769df34415fb7c426c967b6ce36e6
@hebasto hebasto deleted the 200920-leave-cs branch December 11, 2020 07:31
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test: Potential deadlock in wallet_tests/CreateWalletFromFile
9 participants