-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Add two interruption points in dnsseed thread to make shutdown in early ... #4215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ly stage quickly Signed-off-by: Huang Le <4tarhl@gmail.com>
untested ACK, code changes OK |
untested ACK |
I did the testing locally, any thing I need to do for pushing the commit? |
… responsiveness of net threads, typically the dnsseed thread Signed-off-by: Huang Le <4tarhl@gmail.com>
By using async getaddrinfo_a() instead of getaddrinfo() in netbase.cpp::LookupIntern(), the responsiveness issue of dnsseed thread is finally resolved. And to note: With the 2nd commit in this pull request, the 1st one is not necessary now, although leaving it there would cause no problem. |
…ript, and modify corresponding conditional directive in netbase.cpp to ensure we can use the lib properly. Signed-off-by: Huang Le <4tarhl@gmail.com>
Sorry, the previous building error was caused by missing the libanl, added it in last commit. Have done a clean rebuilding and testing locally, please kindly have a review. Thanks. |
@@ -367,6 +367,7 @@ if test x$TARGET_OS = xdarwin; then | |||
fi | |||
|
|||
AC_CHECK_HEADERS([stdio.h stdlib.h unistd.h strings.h sys/types.h sys/stat.h]) | |||
AC_CHECK_LIB(anl, getaddrinfo_a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tie this check to the function and not glibc's usage, in case other libc's (uclibc, musl, etc) implement it as part of libc itself:
AC_SEARCH_LIBS([getaddrinfo_a], [anl], [AC_DEFINE(HAVE_GETADDRINFO_A, 1,[Define this symbol if you have getaddrinfo_a])])
Then in code, use: #if defined(HAVE_GETADDRINFO_A)
…anl, for better compatibility with other libc implementations different from glibc Signed-off-by: Huang Le <4tarhl@gmail.com>
@theuni Thanks, done as you suggested in last commit. |
struct addrinfo *aiRes = NULL; | ||
#if defined(_GNU_SOURCE) && defined(HAVE_GETADDRINFO_A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the defined(_GNU_SOURCE)
necessary here? (i.e. isn't HAVE_GETADDRINFO_A in itself enough to know that the function exists?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it's unnecessary in my dev environment, but I just added it for if someone was building with an older toolchain, although such a toolchain might fail to get HAVE_GETADDRINFO_A defined by configure script at the first place, but just in case.
Sure it should be pretty safe to remove the macro definition check, I'd like to do that if you think that is desired, but it could be even safer (ok, quite little) to leave it there:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laanwj Yes, I consider the situation for a while and come to agree on your point: we should remove the _GNU_SOURCE macro definition checking because someone might build it with another libc implementation, and in such a case, the _GNU_SOURCE was unlikely defined by the compiler but the getaddrinfo_a() could have been implemented in the c library itself.
Will submit a commit immediately. Thanks for your kind review!
1. The _GNU_SOURCE macro definition checking in netbase.cpp, the HAVE_GETADDRINFO_A is enough for using getaddrinfo_a(); 2. The interruption points added in net.cpp::ThreadDNSAddressSeed(), using getaddrinfo_a() instead of getaddrinfo() is enough to make the dnsseed thread responsive. Signed-off-by: Huang Le <4tarhl@gmail.com>
@laanwj Done. Hope it is the last commit for this pull request. :) |
Does this launch a new thread for each resolve? If so, I think it would be necessary to do a quick probe at random times to see what the thread count looks like before/after this change. If it's a drastic change, this will need some sort of rate-limiting. Also (not necessary for this PR), it might be beneficial to maintain a global resolve list in some class so that a single tight loop can be checking the status of all resolves simultaneously, rather than each resolve getting its own loop. From the examples of getaddrinfo_a in the man pages, it seems that's how it is meant to be used. |
@theuni Yes, the getaddrinfo_a() call does kick off a new thread for resolving the name and poll on it in its own thread. So when there is one resolving job, there would be one more thread than before this change. However, the real name resolving job happens mostly, if not only, in dnsseed thread. To all other calls to the LookupIntern() function, they are generally kicked off by the constructor of CNetAddr which actually just want to convert a string address into a network address structure, and this kind of job, which is not a real name resolving one, can be done without getaddrinfo(), but a lighter inet_aton(). By detecting this case, we can dramatically reduce the overhead of getaddrinfo(), and in general there should be only one more thread and only when dnsseed thread is running. Will submit a patch for this PR according to the above soon. For the global resolver idea, one thing need to pointed out is that it doesn't reduce the thread count but add one (the global resolver itself) instead, while it is extremely helpful for the throughput if there were lots of real async name resolving jobs kicked off simultaneously and the kickers don't need the result immediately. But as mentioned above, the dnsseed seems the only one needs the result of a real name resolving, and it works in a sync manner, so the global resolver doesn't help here. Actually why we switch to a async getaddrinfo_a() is just for responsiveness purpose but not throughput. We can absolutely rewrite the dnsseed thread to let it work in an async manner and improve its throughput, but given it's only running for several minutes, or tens of minutes if the network situation is really bad, before exiting, it doesn't worth the effort I guess. Or maybe later if we figure out the effort is really needed. |
In LookupIntern(), the getaddrinfo_a() call would be issued if it's avaiable, which in turn would kick off a new thread for async name resolving job. In our code, most calls go to LookupIntern() are just trying to convert a text ip address into a struct but not a real name resolving request, so we should do that first to avoid extra thread overhead. Signed-off-by: Huang Le <4tarhl@gmail.com>
Signed-off-by: Huang Le <4tarhl@gmail.com>
@theuni, @4tar This is starting to be too complicated. Don't get too deep in this rabbit hole. I would accept this simple patch, but I don't want to merge a whole thread/job management system just so that interruption can happen sooner. We already have boost::asio for that, if really necessary. No need to replicate its functionality. But is the number of threads created by the async resolve really that bad that we need some complex system to work around it? Are we using the resolver this intensively at all? Does it create DoS risk? |
Interesting, looks like the i586-mingw32msvc-g++ building env doesn't have inet_pton() defined. Will submit a patch to AC_SEARCH_LIB for it. |
Looks like mingw env doesn't define this function yet, so we have to search it first to avoid breaking building in mingw env. Signed-off-by: Huang Le <4tarhl@gmail.com>
@laanwj I agree with you that we don't really need an async resolver now, so I just add a inet_pton()/inet_addr() call in the LookupIntern() before issuing the getaddrinfo_a(), so that we can avoid kicking off a new resolving thread inside the latter, when the caller just want to convert an ip addr text to an ip addr structure, which, as we have discussed above, is the most cases that the LookupIntern() get called. By doing this, we have already removed the possible thread count problems. Hope I have cleared mingw env building error in last commit... |
@4tar: I'm not very familiar with the networking code, so I'll give your analysis the benefit of the doubt. Thanks for addressing my thread-count concern. As for inet_pton, please add "nsl resolv" to the list of libs where it may be found (the empty 2nd param). |
@theuni Thanks for your review! Setting up a correct lib list is always a mystery to me... |
Signed-off-by: Huang Le <4tarhl@gmail.com>
Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/978d2881b00fbc98789b64e8ed2ec4bd2e8fadc5 for binaries and test log. |
@laanwj Anything I can do to make this PR be merged a little bit soon? It passed the test and I believe it is ready. The reason I'm asking is that I made a mistake to commit directly on my master branch and send the PR, so the pending causes big issue on my branch management... |
OK, I'm closing this RP and will re-submit it from another branch... |
...stage quickly
The dnsseed thread is starting by default and takes quick some time, if the user wants to stop the bitcoind by Ctrl-C in command line or a kill in daemon mode when the dnsseed is still working, s/he would have to wait for it to finish all the job, in such situation, an extra 'kill -9' would be expected.
By adding two interruption points just before the time-consuming operations we give the dnsseed thread a chance to realize the user's intention and quit its job much quickly, usually in seconds instead of several minutes or more depends on network situation.
Signed-off-by: Huang Le 4tarhl@gmail.com