-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Thread names in logs and deadlock debug tools (take 2) #13168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4e2e4f6
to
7f3781f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits/comments on the first commit. Also:
- Why the rename from threadval to threadutil? Ideally introduce the file with the preferred name in the first commit, or at least do the rename in a separate commit from other changes.
- Naming convention (functions should be CamelCase).
- get_internal_id seems entirely unused.
src/threadval.cpp
Outdated
std::string m_name{"unnamed"}; | ||
}; | ||
|
||
#if defined(HAVE_THREAD_LOCAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use thread local even on mingw (where you had problems earlier?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/threadval.cpp
Outdated
|
||
std::string thread_data::get_internal_name() | ||
{ | ||
void* data = pthread_getspecific(g_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in this function and the next) g_key
is uninitialized if called before set_internal_name
is invoked. Perhaps you need to add the pthread_once
call here as well?
If not, document that it's illegal to invoke these functions without knowing if set_internal_name
has been called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch.
src/threadval.cpp
Outdated
static_cast<thread_data_type*>(data)->m_name = std::move(name); | ||
return true; | ||
} | ||
data = new thread_data_type{internal_id++, std::move(name)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal_id
has different semantics here than in the thread_local case (here it's only incremented the first time a thread is given a name; in the above case it's incremented every time the name changes). Is that an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. The thread_local impl is incorrect. We only want to increment if unset, so thread id's don't bounce around across renames.
I figure "threadutil" might be a better name going forward because I anticipate as we continue to split stuff out of |
@jamesob I have no opinion about the name. I'm only complaining about a rename in the middle of a PR in a commit that also introduces other changes. |
src/threadutil.cpp
Outdated
{ | ||
auto name = g_thread_data.m_name; | ||
|
||
if (g_thread_data.m_name.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were lots of "<unnamed>"
occurrences in the logs and so I figured it made sense to fall back to the process name if we hadn't explicitly named the thread (e.g. bitcoind
, bitcoin-qt
). Do you think it makes more sense to just explicitly call rename
on those main threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see concerns with get_internal_name
being a misnomer. Maybe we should introduce a GetName
that first tries the internal name and then falls back to the process name?
@jamesob You can just take the threadval files, don't worry about the commit attribution. Agree with @sipa about the name change, it complicates the review a pretty good bit. Also no preference on naming. I originally named it "threadval" with the intention of making it a generic low-level interface for thread_local, but that's not worth the trouble. |
7f3781f
to
e64f526
Compare
@sipa @theuni Thanks for the quick look and comments. I've pushed a reworked version of this branch which
|
7afc8ad
to
7e61b8a
Compare
Pushed a small fix for a mysterious segfault which was happening on 2 x86_64 linux Travis jobs. Core dump and brief writeup in gist below. Build is now passing. |
What platforms still don't have thread_local? Can we just ifdef everything and upgrade our release build OSX SDK and get it in every release platform?
…On May 4, 2018 9:01:28 PM UTC, jamesob ***@***.***> wrote:
Replaces #13099.
The goal here is the same as the previous PR, but this approach uses
@theuni's gracefully degrading thread-local caching strategy to avoid
(when possible) map lookup and mutex overhead during thread name
lookup.
Instead of using a mutex-protected map to generate numeric suffixes
(e.g. `httpworker.2`) we explicitly pass in suffixes during thread
creation.
The logging format is also slightly different:
```
[bitcoind ] 2018-05-04T20:58:13Z nBestHeight = 1296342
[torcontrol ] 2018-05-04T20:58:13Z torcontrol thread start
[bitcoind ] 2018-05-04T20:58:13Z Bound to [::]:18333
[bitcoind ] 2018-05-04T20:58:13Z Bound to 0.0.0.0:18333
[bitcoind ] 2018-05-04T20:58:13Z init message: Loading P2P
addresses...
[loadblk ] 2018-05-04T20:58:13Z Leaving InitialBlockDownload
(latching to false)
[loadblk ] 2018-05-04T20:58:13Z Imported mempool transactions from
disk: 3 succeeded, 0 failed, 0 expired, 0 already there
[bitcoind ] 2018-05-04T20:58:13Z Loaded 19762 addresses from
peers.dat 102ms
[bitcoind ] 2018-05-04T20:58:13Z init message: Loading banlist...
[bitcoind ] 2018-05-04T20:58:13Z init message: Starting network
threads...
[net ] 2018-05-04T20:58:13Z net thread start
[addcon ] 2018-05-04T20:58:13Z addcon thread start
[dnsseed ] 2018-05-04T20:58:13Z dnsseed thread start
[msghand ] 2018-05-04T20:58:13Z msghand thread start
[bitcoind ] 2018-05-04T20:58:13Z init message: Done loading
```
You can view, comment on, or merge this pull request online at:
#13168
-- Commit Summary --
* threads: add threadval files
* threads: hook up threadval for logging
* threads: add numeric suffixes to threads with the same name
* tests: add threadutil naming tests
* threads: add thread names to deadlock debugging tools
-- File Changes --
M src/Makefile.am (2)
M src/Makefile.test.include (1)
M src/httpserver.cpp (11)
M src/init.cpp (7)
M src/logging.cpp (11)
M src/sync.cpp (14)
M src/test/test_bitcoin.cpp (2)
A src/test/threadutil_tests.cpp (90)
A src/threadutil.cpp (237)
A src/threadutil.h (62)
M src/util.cpp (21)
M src/util.h (6)
M src/validation.cpp (5)
M src/validation.h (8)
-- Patch Links --
https://github.com/bitcoin/bitcoin/pull/13168.patch
https://github.com/bitcoin/bitcoin/pull/13168.diff
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#13168
|
General comment: never @mention someone directly in a commit (just attribute them by name). They'll be spammed by every fork on github when those commits are merged by them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments.
Overall: why do we need LogPrint*Continued
? It should be implied by a lack of newline on the previous log statement (which is already used to determine whether to log the time).
src/threadutil.cpp
Outdated
void thread_util::Rename(std::string name) | ||
{ | ||
SetProcessName(name.c_str()); | ||
SetInternalName(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be SetInternalName(std::move(name))
.
src/threadutil.cpp
Outdated
bool thread_util::SetInternalName(std::string name) | ||
{ | ||
static std::atomic<long> id_accumulator{0}; | ||
static thread_local thread_id{id_accumulator++}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This increments id every time the name is set (regardless of whether it had a name already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the static thread_local
attribute prevent that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes indeed. Ignore me
src/threadutil.cpp
Outdated
static bool EnsureKeyCreated() | ||
{ | ||
static pthread_once_t key_once = PTHREAD_ONCE_INIT; | ||
return pthread_once(&key_once, make_key) ? false : true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be written as !pthread_once(&key_once, make_key);
.
src/sync.cpp
Outdated
{ | ||
mutexName = pszName; | ||
sourceFile = pszFile; | ||
sourceLine = nLine; | ||
fTry = fTryIn; | ||
thread_name = thread_name_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be written as thread_name = std::move(thread_name_)
to avoid a copy.
We need |
7e61b8a
to
b891a44
Compare
@sipa ah, sorry: I missed your point about the LogPrint macros. I see now that we can just move thread name attachment to reuse the existing |
@TheBlueMatt honestly I'm not sure about the thread_local stuff; all I know is that mingw32 wasn't behaving well with its use (as noted in the comment) - @theuni is the authority there. |
src/threadutil.cpp
Outdated
* | ||
* https://gist.github.com/jamesob/fe9a872051a88b2025b1aa37bfa98605 | ||
*/ | ||
#undef HAVE_THREAD_LOCAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line may be unintentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - see the comment above; sounds like @theuni wants to do more testing with dynamic linking before we use thread_local
outside the context of debug code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure if we should have the code in in that case, as it's untestable (it also doesn't compile currently).
src/threadutil.cpp
Outdated
#endif | ||
|
||
char threadname_buff[16]; | ||
char* pthreadname_buff = (char*)(&threadname_buff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this variable; using threadname_buff
will automatically decay to (char *)
.
b891a44
to
2d36119
Compare
Changed with more feedback from @sipa. |
@TheBlueMatt unfortunately it's not the SDK, it's Apple's clang itself that had thread_local patched out until xcode 9.0, released in June 2017. And to make it worse, they held off on supporting it because they "didn't want to commit to an inefficient abi". It's that reasoning that makes me believe that a build with an upstream clang may have compatibility issues with end-users' libc/libstdc++. A few tests might shed some light, but even then, I'd worry about one-off version issues. So, options are:
Those options ignore mingw, with which @jamesob has also reported some thread_local issues, but I haven't investigated those. @sipa I left the thread_local code but undef'd the variable in case we want to go for #3. Otherwise, I begrudgingly agree that it probably shouldn't be present until it's usable. |
src/threadutil.cpp
Outdated
bool thread_util::SetInternalName(std::string name) | ||
{ | ||
static std::atomic<long> id_accumulator{0}; | ||
static thread_local thread_id{id_accumulator++}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This declaration misses a type (when you uncomment the #undef HAVE_THREAD_LOCAL
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More evidence that we should remove this code until we need it. I'll strip it out.
2d36119
to
fe14a65
Compare
I've pushed changes removing the unused |
Needs rebase due to mege of #13236 |
@TheBlueMatt and @ryanofsky gave feedback offline that we should only include a (very simple) thread_local implementation for this change, undef |
See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
I benchmarked this branch (as rebased on top of master) with bitcoinperf and it's worth noting that we see a roughly 1.4% slowdown over master (details here) during IBD up to height 200,000. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we see a roughly 1.4% slowdown over master
What's the implication of this? Do you want to to try to optimize the change, or abandon it, or turn it off by default? Does the slowdown even happen in the default configuration, or only when extra debugging is enabled?
tACK 91aef70. I just ran bitcoind -regtest
and looked at ps thread names and log output. Only changes since my previous review were a rebase and tweaked thread names.
@ryanofsky thanks for the look. I think given the observed slowdown, maybe disabling by default is the way to go. I don't see any obvious means of improving performance, though I haven't taken to profiling tools. |
Needs rebase |
There hasn't been much activity lately and the patch still needs rebase, so I am closing this for now. Please let me know when you want to continue working on this, so the pull request can be re-opened. |
See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
Note that this doesn't affect anything unless DEBUG_LOCKCONTENTION is defined. See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
8722e54 threads: add thread names to deadlock debugging message (James O'Beirne) 383b186 threads: prefix log messages with thread names (James O'Beirne) ddd95cc tests: add threadutil tests (James O'Beirne) ae5f2b6 threads: introduce util/threadnames, refactor thread naming (James O'Beirne) 188ca75 disable HAVE_THREAD_LOCAL on unreliable platforms (James O'Beirne) Pull request description: I'm resurrecting this one (from bitcoin#13168) because I need it to make progress on bitcoin#15735. It's now off by default and can be turned on with `-logthreadnames=1`. Ran some benchmarks (IBD from local peer from 500_000 -> 504_000) and it's within spitting distance either on or off: ### threadnames off (default) #### 2018-05-threadnames.3 vs. master (absolute) | name | iterations | 2018-05-threadnames.3 | master | |------------------------------------------------|-----------:|----------------------------|----------------------------| | ibd.local.500000.504000.dbcache=2048 | 3 | 376.1584 (± 9.2944) | 392.3414 (± 13.4238) | | ibd.local.500000.504000.dbcache=2048.mem-usage | 3 | 2236117.3333 (± 1845.9623) | 2238690.6667 (± 2669.3487) | #### 2018-05-threadnames.3 vs. master (relative) | name | iterations | 2018-05-threadnames.3 | master | |------------------------------------------------|-----------:|----------------------:|-------:| | ibd.local.500000.504000.dbcache=2048 | 3 | 1 | 1.043 | | ibd.local.500000.504000.dbcache=2048.mem-usage | 3 | 1 | 1.001 | ### threadnames on #### 2018-05-threadnames-take-2 vs. master (absolute) | name | iterations | 2018-05-threadnames-take-2 | master | |------------------------------------------------|-----------:|----------------------------|----------------------------| | ibd.local.500000.504000.dbcache=2048 | 3 | 367.6861 (± 0.3941) | 364.1667 (± 0.9776) | | ibd.local.500000.504000.dbcache=2048.mem-usage | 3 | 2238461.3333 (± 3697.8730) | 2237014.6667 (± 3307.6966) | #### 2018-05-threadnames-take-2 vs. master (relative) | name | iterations | 2018-05-threadnames-take-2 | master | |------------------------------------------------|-----------:|---------------------------:|-------:| | ibd.local.500000.504000.dbcache=2048 | 3 | 1.010 | 1.00 | | ibd.local.500000.504000.dbcache=2048.mem-usage | 3 | 1.001 | 1.00 | ``` ACKs for commit 8722e5: Empact: utACK bitcoin@8722e54 jnewbery: utACK 8722e54 MarcoFalke: re-utACK 8722e54 (Only change since my previous review is DEFAULT_LOGTHREADNAMES=false and stylistic updates Tree-SHA512: 50af992708295b8d680cf10025262dd964e599a356bdfc1dfc84fb18c00afabcb34d3d12d551b0677ff81f8fccad0e17c1d5b24dfecb953a913bc77fdd1a4577
Note that this doesn't affect anything unless DEBUG_LOCKCONTENTION is defined. See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
Note that this doesn't affect anything unless DEBUG_LOCKCONTENTION is defined. See discussions here: - bitcoin#11722 (review) - bitcoin#13168 (comment)
Note that this doesn't affect anything unless DEBUG_LOCKCONTENTION is defined. See discussions here: - bitcoin/bitcoin#11722 (review) - bitcoin/bitcoin#13168 (comment)
Removing "Up for grabs" given #15849. |
Replaces #13099.
The goal here is the same as the previous PR, but this approach only includes threadnames in logs when we can make use of
thread_local
. We specifically disableHAVE_THREAD_LOCAL
for platforms where its presence or implementation is questionable (Darwin, mingw32).Instead of using a mutex-protected map to generate numeric suffixes (e.g.
httpworker.2
) we explicitly pass in suffixes during thread creation.Under this changeset, logs look like this: