Skip to content

Conversation

jamesob
Copy link
Contributor

@jamesob jamesob commented Apr 26, 2018

While trying to debug the apparent deadlock revealed in #12873, I realized it'd be nice to have the POTENTIAL DEADLOCK DETECTED message display which thread acquired each lock. I also think it'd be generally useful to have log lines display the name of the originating thread.

This changeset does both of those things by introducing a class which manages thread naming, ThreadNameRegistry. The class abstracts process-control calls responsible for thread naming and provides automatic number suffixing to threads which use the same name (e.g. httpworker.0, httpworker.1, ...).

With this patch, thread names look like this

 $ pstree -p `pgrep bitcoind`

bitcoind(3415)─┬─{addcon}(3465)
               ├─{httpworker.0}(3454)
               ├─{httpworker.1}(3455)
               ├─{httpworker.2}(3456)
               ├─{httpworker.3}(3457)
               ├─{http}(3453)
               ├─{msghand}(3467)
               ├─{net}(3463)
               ├─{opencon}(3466)
               ├─{scheduler}(3452)
               ├─{scriptch.0}(3445)
               ├─{scriptch.1}(3446)
               ├─{scriptch.2}(3447)
               ├─{scriptch.3}(3448)
               ├─{scriptch.4}(3449)
               ├─{scriptch.5}(3450)
               ├─{scriptch.6}(3451)
               └─{torcontrol}(3462)

and the debug log looks like this

2018-04-26T21:54:23Z [bitcoind] init message: Loading wallet...
2018-04-26T21:54:24Z [bitcoind] mapBlockIndex.size() = 1
2018-04-26T21:54:24Z [bitcoind] nBestHeight = 0
2018-04-26T21:54:24Z [loadblk] Imported mempool transactions from disk: 0 succeeded, 0 failed, 0 expired, 0 already there
2018-04-26T21:54:24Z [torcontrol] torcontrol thread start
2018-04-26T21:54:24Z [bitcoind] Bound to [::]:18444
2018-04-26T21:54:24Z [torcontrol] tor: Error connecting to Tor control socket
2018-04-26T21:54:24Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect
2018-04-26T21:54:24Z [bitcoind] Bound to 0.0.0.0:18444
2018-04-26T21:54:24Z [bitcoind] init message: Loading P2P addresses...
2018-04-26T21:54:24Z [bitcoind] Loaded 0 addresses from peers.dat  0ms
2018-04-26T21:54:24Z [dnsseed] dnsseed thread start
2018-04-26T21:54:24Z [bitcoind] init message: Done loading
2018-04-26T21:54:24Z [addcon] addcon thread start
2018-04-26T21:54:24Z [dnsseed] Loading addresses from DNS seeds (could take a while)
2018-04-26T21:54:24Z [net] net thread start
2018-04-26T21:54:24Z [dnsseed] 0 addresses found from DNS seeds
2018-04-26T21:54:24Z [dnsseed] dnsseed thread exit
2018-04-26T21:54:24Z [msghand] msghand thread start
2018-04-26T21:54:24Z [opencon] opencon thread start
2018-04-26T21:54:25Z [torcontrol] tor: Error connecting to Tor control socket
2018-04-26T21:54:25Z [torcontrol] tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect

Note that child thread names have changed; I'm no longer using the bitcoin- prefix. Because we're limited to 16 characters for thread name (on Linux, anyway), that prefix was causing the numeric suffix some names have to be hidden. If it's desirable to keep the prefix, I can revert this change.

Implementation considerations

A basic version of this change could be done pretty trivially with something like

thread_local std::string threadname;

but per @theuni (#11722 (review)), we can't rely on having thread_local. Also note that with a basic implementation like that, we wouldn't be able to do the automatic numbering scheme to differentiate between threads with the same basename (e.g. httpworker, scriptch).

We could also rely solely on thread-related system calls. I don't like this much either because of how poorly defined or unavailable getname is on some platforms, e.g. OS X, FreeBSD; not to mention the 16 character limit.

Tests

If this gets a Concept ACK or two, I'll implement some. Unittests attached.


std::unique_ptr<ThreadNameRegistry> g_thread_name_registry(new ThreadNameRegistry());

namespace {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere more general or useful I should put the two functions in this namespace?

std::string process_name(name);

/*
* Uncomment if we want to retain the `bitcoin-` system thread name prefix.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or uncomment this block before merge.

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch 4 times, most recently from 957d3aa to fcfba12 Compare April 27, 2018 01:10
Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.


const std::string& ThreadNameRegistry::GetThreadName()
{
auto thread_id = boost::this_thread::get_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should lock mutex.

}
}

void ThreadNameRegistry::RenameProcessThread(const char* name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a static function or a function in a anonymous namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument could be const std::string& name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to unittest the *Process* functions eventually, so maybe I should just make them public.

#endif
}

std::string ThreadNameRegistry::GetProcessThreadName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a static function or a function in a anonymous namespace.

return found->second;
} else {
auto insert_ret = m_id_to_name.emplace(thread_id, GetProcessThreadName());
return insert_ret.first->second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoud also add to m_name_to_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

class ThreadNameRegistry
{
public:
const std::string& GetThreadName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return reference?

@skeees
Copy link
Contributor

skeees commented Apr 27, 2018

Thoughts on pulling the TraceThread class out of util into the new files you're creating here? Seems like a good home for tracethread and would reduce the clutter in util

@practicalswift
Copy link
Contributor

Concept ACK

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch from fcfba12 to 8379557 Compare April 27, 2018 16:30
@jamesob
Copy link
Contributor Author

jamesob commented Apr 27, 2018

@promag thanks for the review; I've pushed changes incorporating your feedback.

@skeees I like that idea but maybe we can save it for a future PR? That entails changing a bunch of call sites and I'm worried about introducing a circular dependency between util and threadnames (by way of logging).

@TheBlueMatt
Copy link
Contributor

How does the use of boost::thread::id interact with eventual boost::thread removal? IIRC there was something about no guarantees being made that you can interact with boost::thread if the thread was started by std, though it usually works in practice. We do have a HAVE_THREAD_LOCAL defined, which should be set everywhere but OSX, so maybe we could just turn it on there if boost::thread::id is an issue?

private:
std::mutex m_map_lock;
std::map<boost::thread::id, std::string> m_id_to_name;
std::map<std::string, boost::thread::id> m_name_to_id;
Copy link
Contributor

@conscott conscott Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit, but maybe m_name_to_id could instead be a map<string, int> - where it just stores the occurrences of that name/prefix, then you don't have to call CountMapPrefix, but can just update the count each time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the look and great suggestion. I think maintaining a map of counts will also allow me to remove all of the utility functions in the anonymous namespace - nice catch!

@conscott
Copy link
Contributor

Concept ACK- just a comment on how the maps are used.

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch 2 times, most recently from 1d1ce9a to bddf05a Compare May 1, 2018 18:34
@jamesob
Copy link
Contributor Author

jamesob commented May 1, 2018

@TheBlueMatt good points, thanks for looking. I've pushed changes which

  • only make use of boost::thread when thread_local is not available. When it is available, we assign each thread a thread_local-cached ID based on a static counter.
  • incorporate @conscott's suggestion of using m_name_to_count which nicely cuts out a bunch of code.
  • add tests.

(jamesob/threadnames.1 -> jamesob/threadnames.3)

This changeset is ready for review-for-merge IMO.

@sipa
Copy link
Member

sipa commented May 1, 2018

@theuni What's the status of thread_local on OSX? I remember seeing https://stackoverflow.com/questions/28094794/why-does-apple-clang-disallow-c11-thread-local-when-official-clang-supports, but which Xcode version's patches does our clang run with, or...?

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch from bddf05a to c065728 Compare May 1, 2018 18:56
@jamesob
Copy link
Contributor Author

jamesob commented May 1, 2018

I've pushed a small change removing GetProcessName support for *BSD systems. After double-checking pthread_getname support for OpenBSD, I found that it doesn't seem to exist. As a result, any threads which are not explicitly named with Rename() will be labeled "<unknown>" on BSD systems.

 $ diff -uw <(git diff master..threadnames.3) <(git diff master..threadnames.4)

--- /proc/self/fd/12    2018-05-01 14:56:59.561313752 -0400
+++ /proc/self/fd/13    2018-05-01 14:56:59.549313568 -0400
@@ -2215,10 +2215,10 @@
      void reset();
 diff --git a/src/threadnames.cpp b/src/threadnames.cpp
 new file mode 100644
-index 0000000..305c3b4
+index 0000000..92e3f75
 --- /dev/null
 +++ b/src/threadnames.cpp
-@@ -0,0 +1,135 @@
+@@ -0,0 +1,133 @@
 +// Copyright (c) 2018 The Bitcoin Core developers
 +// Distributed under the MIT software license, see the accompanying
 +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
@@ -2345,8 +2345,6 @@
 +
 +#if defined(PR_GET_NAME)
 +    ::prctl(PR_GET_NAME, pthreadname_buff);
-+#elif (defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__DragonFly__))
-+    pthread_get_name_np(pthread_self(), pthreadname_buff);
 +#elif defined(MAC_OSX)
 +    pthread_getname_np(pthread_self(), pthreadname_buff, sizeof(threadname_buff));
 +#else

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch 7 times, most recently from 75e9902 to 66fcbdc Compare May 2, 2018 21:26
@jamesob
Copy link
Contributor Author

jamesob commented May 2, 2018

After spending the day figuring out that thread_local is apparently broken on mingw32 compilers, I'm now falling back to boost thread IDs for WIN32. Tests are passing and this is once again ready for review.

@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch from 66fcbdc to 911f54e Compare May 3, 2018 13:11
jamesob added 3 commits May 3, 2018 09:13
Abstracts system thread name setting and tracks thread names in memory. Allows
automatic thread numbering for reused names (e.g. http.0, http.1, ...).

NB: as-written, this removes the `bitcoin-` prefix from child threads.

Incorporates feedback from @conscott @TheBlueMatt @sipa.
@jamesob jamesob force-pushed the 2018-04-26-use-threadnames branch from 911f54e to b09e25b Compare May 3, 2018 13:14
@jamesob
Copy link
Contributor Author

jamesob commented May 3, 2018

Per @sipa's advice, I'm now using std::thread to determine thread IDs for all platforms instead of either boost or the ham-fisted thread_local code.

jamesob/threadnames.11 -> jamesob/threadnames.13

@theuni
Copy link
Member

theuni commented May 3, 2018

If platform-specific functions are being used anyway, why not just use pthread_getspecific/pthread_setspecific, which are meant for exactly this? That also makes it a trivial transition when we're finally ready to use thread_local.

@jamesob
Copy link
Contributor Author

jamesob commented May 3, 2018

@theuni what are you talking about replacing at this point? We're now platform agnostic with std::thread, and it seems to work fine with boost threads. Any reason you can see that it won't work?

@theuni
Copy link
Member

theuni commented May 3, 2018

@jamesob See the top two commits here for a quick attempt: https://github.com/theuni/bitcoin/commits/thread_names

That addresses a few things that make me hesitate about this PR:

  • system thread id/name are conflated with internal use. This drags in system requirements (name length, for ex) and overhead for internal functions.
  • Heavy map lookup with locking. logging: bypass timestamp formatting when not logging #12970 demonstrated how painful overhead in logging can be.
  • Most can go away once we can use thread_local unconditionally.

I realize that my changes are oversimplified, it's just a quick POC for discussion.

@jamesob
Copy link
Contributor Author

jamesob commented May 3, 2018

@theuni I think your simple implementation looks great. Very much agree with your concern re: map lookup and locking overhead, and I think there's a way to combine our approaches to avoid it. Let me know if this sounds right:

IMO lock acquisition during the single Rename() call per thread is acceptable, but we want to avoid any subsequent locking during GetName() (since it is called per logline). So: during Rename() we can use Registry's count map for numeric suffix generation and then use your method for caching the name of the thread for use during GetName().

If that sounds good to you, I'll cherry-pick your two commits onto this branch and reorganize accordingly. Otherwise if you think the numeric suffix isn't worthwhile, I'm happy to close this out and proceed on your commits only.

@theuni
Copy link
Member

theuni commented May 3, 2018

@jamesob Looking again, I think handling the suffix at this layer is unnecessary and adds a significant amount of complication. Why not just add a suffix before passing the name into TraceThread() ?

@jamesob
Copy link
Contributor Author

jamesob commented May 3, 2018

@theuni ugh, you're right, that's dumb. Let's go off of your commits.

@jamesob jamesob closed this May 3, 2018
@theuni
Copy link
Member

theuni commented May 3, 2018

From IRC:

<cfields> jamesob: noo!
<cfields> jamesob: I hope you don't think I'm being NIH about that PR, that wasn't my intention at all. Coding up an alternative myself helps me to understand the challenges better, it wasn't meant to replace your work.

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants