-
Notifications
You must be signed in to change notification settings - Fork 37.7k
sync, log: inline lock contention logging macro to fix duration, improve BCLog::LogMsg() #22904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync, log: inline lock contention logging macro to fix duration, improve BCLog::LogMsg() #22904
Conversation
Co-authored-by: Martin Ankerl <martin.ankerl@gmail.com>
- make timer code more homogeneous - replace division with multiplication - log if the time type is unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 498b323, code review & tested. Found two one potential improvements.
e9af7c2
to
f530202
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f530202
Thanks for catching the timing flaw @martinus (and also sorry to all for not noticing it in my review of #22736)! Indeed the numbers look way more plausible now -- on a signet non-debug build (src/bitcoind -signet -debug=lock
), lock contention times are now in the range of hundreds to thousands micro-seconds on my machine.
src/test/logging_tests.cpp
Outdated
SetMockTime(1); | ||
auto minute_timer = BCLog::Timer<std::chrono::minutes>("tests", "end_msg"); | ||
SetMockTime(2); | ||
BOOST_CHECK_EQUAL(minute_timer.LogMsg("test minutes"), "Error: unexpected time type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a bit odd to add a new feature in one commit and then remove it in the next commit.
If you squash the two commits and keep the ordering, the diff will be a lot smaller, if not empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kept the last three commits separate as they do three different things and were contributed by three different people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK f530202. I ran a fully synced node for about a day. My node was mostly idle though so not much was going on. I wrote a little script to parse the
Almost all of the contention in |
I expect lock contention to be more pronounced in the GUI (or when there is an active RPC poll) |
…to fix duration, improve BCLog::LogMsg() f530202 Make unexpected time type in BCLog::LogMsg() a compile-time error (Martin Ankerl) bddae7e Add util/types.h with ALWAYS_FALSE template (MarcoFalke) 498b323 log, timer: improve BCLog::LogMsg() (Jon Atack) 8d2f847 sync: inline lock contention logging macro to fix time duration (Jon Atack) Pull request description: Follow-up to bitcoin#22736. The first commit addresses the issue identified and reported by Martin Ankerl in bitcoin#22736 (comment) to fix the lock contention duration reporting. The next three commits make improvements to the timer code in `BCLog::LogMsg()` and add `util/types.h` with an `ALWAYS_FALSE` template, that springboard from bitcoin#22736 (comment) by Marco Falke. ACKs for top commit: martinus: re-ACK f530202. I ran a fully synced node for about a day. My node was mostly idle though so not much was going on. I [wrote a little script](https://github.com/martinus/bitcoin-stuff/blob/main/scripts/parse-debuglog-contention-single.rb) to parse the `debug.log` and summarize the output to see if anything interesting was going on, here is the result: theStack: ACK f530202 Tree-SHA512: 37d093eac5590e1b5846ab5994d0950d71e131177d1afe4a5f7fcd614270f977e0ea117e7af788e9a74ddcccab35b42ec8fa4db3a3378940d4988df7d21cdaaa
This seems to introduce a busy loop when running the checkqueue tests for me:
It does finish eventually with a >1GB log file:
I was thinking that the fact that logging takes its own lock might have been a problem (leading to contention on that lock? or just taking more time leading to more contention on other locks?) but it seems like there really is that many instances of contention for the checkqueue mutex -- reducing the logging by a factor of 1000 only reduces the logging proportionally. But gigabyte log files and many minutes to run a simple test doesn't seem great... |
I suspect you have a bunch of spurious failures. Regardless of whether that's the case or not, relying on unspecified behavior (spurious mutex failures) from libc/stdlib is a bad idea, especially if it leads to possible remote triggers (3 of the 4 most contended locks are in net_processing). IMO this is far too dangerous to have turned on by default. |
Following up: if this were to be reverted, here is an alternative solution to the header macro problem for lock contention and lock ordering. As a caveat, I haven't benchmarked to see the impact of the lost inlining. |
I think the contentions are real and not spurious. For me the file ./src/test/test_bitcoin --run_test=checkqueue_tests/test_CheckQueue_Correct_Random -- DEBUG_LOG_OUT >out.txt I get a 1.1GB log file. Parsing this with with this script I get this output:
the first 4 threads have almost the same number of contentions, so it seems to me that they are blocking each other. Especially the 3 |
Clearly this needs following up on. I've opened an issue, #23167, for further discussion / so this doesn't get lost. |
Good call @martinus. Testing this further, it was not only the case on every run but also pre-existing with Opened #23223 to resolve. |
…TION preprocessor directive 4394733 Add DEBUG_LOCKCONTENTION documentation to the developer notes (Jon Atack) 39a34b6 Put lock logging behind DEBUG_LOCKCONTENTION preprocessor directive (Jon Atack) Pull request description: This is a more minimal, no-frills version of #24734 for backport. The other fixes and improvements in that pull can be done after. *Copy of the PR 24734 description:* PRs #22736, #22904 and #23223 changed lock contention logging from a `DEBUG_LOCKCONTENTION` compile-time preprocessor directive to a runtime `lock` log category and improved the logging output. This changed the locking from using `lock()` to `try_lock()`: - `void Mutex::UniqueLock::lock()` acquires the mutex and blocks until it gains access to it - `bool Mutex::UniqueLock::try_lock()` doesn't block but instead immediately returns whether it acquired the mutex; it may be used by `lock()` internally as part of the deadlock-avoidance algorithm In theory the cost of `try_lock` might be essentially the [same](https://www.erisian.com.au/bitcoin-core-dev/log-2022-03-31.html#l-697) relative to `lock`. The test-and-set logic of these calls is purported to be ~ constant time, optimised and light/quick if used carefully (i.e. no mutex convoying), compared to system calls, memory/cache coherency and fences, wait queues, and (particularly) lock contentions. See the discussion around bitcoin/bitcoin#22736 (comment) and after with respect to performance/cost aspects. However, there are reasonable concerns (see [here](bitcoin/bitcoin#22736 (comment)) and [here](https://www.erisian.com.au/bitcoin-core-dev/log-2022-03-31.html#l-620)) that `Base::try_lock()` may be potentially [costly](https://www.erisian.com.au/bitcoin-core-dev/log-2022-03-31.html#l-700) or [risky](bitcoin/bitcoin#22904 (comment)) compared to `Base::lock()` in this very frequently called code. One alternative to keep the run-time lock logging would be to gate the `try_lock` call behind the logging conditional, for example as proposed in bitcoin/bitcoin@ccd73de and ACKed [here](bitcoin/bitcoin#22736 (comment)). However, this would add the [cost](bitcoin/bitcoin#22736 (comment)) of `if (LogAcceptCategory(BCLog::LOCK))` to the hotspot, instead of replacing `lock` with `try_lock`, for the most frequent happy path (non-contention). It turns out we can keep the advantages of the runtime lock contention logging (the ability to turn it on/off at runtime) while out of prudence putting the `try_lock()` call and `lock` logging category behind a `DEBUG_LOCKCONTENTION` compile-time preprocessor directive, and also still retain the lock logging enhancements of the mentioned PRs, as suggested in bitcoin/bitcoin#24734 (comment) by W. J. van der Laan, in bitcoin/bitcoin#22736 (comment), and in the linked IRC discussion. Proposed here and for backport to v23. ACKs for top commit: laanwj: Code review ACK 4394733 Tree-SHA512: 89b1271cae1dca0eb251914b1a60fc5b68320aab4a3939c57eec3a33a3c8f01688f05d95dfc31f91d71a6ed80cfe2d67b77ff14742611cc206175e47b2e5d3b1
Summary: This diff: - adds a LOCK logging category if DEBUG_LOCKCONTENTION is defined at compilation. - adds a timing macro in microseconds, `LOG_TIME_MICROS_WITH_CATEGORY` - updates `BCLog::LogMsg()` to omit irrelevant decimals for microseconds and skip unneeded code and math. - improves the lock contention logging, drops the all-caps, and displays the duration in microseconds - makes unexpected time type in `BCLog::LogMsg()` a compile-time error - add a paragraph of documentation to developer-notes.md Lock contention logging in checkqueue_tests is disabled, as some of these tests are designed to be heavily contested to trigger race conditions or other issues. This created very large log files when run with DEBUG_LOCKCONTENTION defined. This is a backport of [[bitcoin/bitcoin#22736 | core#22736]], [[bitcoin/bitcoin#22904 | core#22904]], [[bitcoin/bitcoin#23223 | core#23223]] and [[bitcoin/bitcoin#24770 | core#24770]] The first pull request implements the lock category, the other pull requests are important follow-ups (mixed with minor refactorings) fixing various regressions (bugs and suspected performance issues). Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com> Co-authored-by: practicalswift <practicalswift@users.noreply.github.com> Co-authored-by: Martin Ankerl <martin.ankerl@gmail.com> Test Plan: `ninja all check-all` Follow the instructions in the new documentation and check that the logging works: ``` cmake .. -GNinja -DCMAKE_CXX_FLAGS="-DDEBUG_LOCKCONTENTION" ninja src/bitcoind -h | grep -A8 "debug=<category>" src/bitcoind -debug=lock ``` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Subscribers: sdulfari Differential Revision: https://reviews.bitcoinabc.org/D12049
Follow-up to #22736.
The first commit addresses the issue identified and reported by Martin Ankerl in #22736 (comment) to fix the lock contention duration reporting.
The next three commits make improvements to the timer code in
BCLog::LogMsg()
and addutil/types.h
with anALWAYS_FALSE
template, that springboard from #22736 (comment) by Marco Falke.