-
Notifications
You must be signed in to change notification settings - Fork 37.7k
init: Signal-safe instant shutdown #20605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
67fd692
to
f8585d5
Compare
That's an amazing hack. I didn't know it was allowed to call system library functions from a signal handler, but apparently POSIX mandates that some functions are callable - including read() and write(). |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Nice! Is there a documentation what system function one can call in signal handler (to confirm with POSIX)? |
This is a good question, I wasn't able to find any definitive reference. The "self-pipe trick" is an incredibly common pattern in UNIX daemons though. It certainly works across Linux, Macos, modern BSDs… Fairly sure everything that can run Bitcoin Core. If it would be broken in principle then a lot of software would break. For example Python uses this too, calls it a 'wakeup fd'. See https://github.com/python/cpython/blob/master/Modules/signalmodule.c#L294 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
// Write an arbitrary byte to the write end of the shutdown pipe. | ||
const char token = 'x'; | ||
while (true) { | ||
int result = write(g_shutdown_pipe[1], &token, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried locally to just close()
and it also works (tested on macos only). Note that WaitForShutdown
is called twice so just one read
succeeds, see promag@c44a33c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with "WaitForShutdown is called twice"? It isn't. It's called at the end of AppInit
which itself is called once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if you mean the call in AbortShutdown
, sure, yes but that's on purpose to reset the shutdown condition so it can be used again.
I'd really prefer to use read
/write
and not close a socket under someone waiting on it !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, the second call is In AbortShutdown
. But that doesn't invalidate the point of just close()
instead of write()
, which looks enough to make read()
return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"looks enough"
Yes, but we're dealing with a lot of possibilities in different OSes here, just because it works for you doesn't mean it's safe. This whole thing (try googling about signals and pipe tricks) is a controversial subject, so I prefer to mimic existing code instead of inventing our own.
Also, close()
makes it more work to re-arm the condition in AbortShutdown
.
Edit: To be fair I do think it's a clever trick to use close
, and it might be good in some circumstances, but probably not in this case.
Tested ACK eae384f on Debian 5.9.11-1 (2020-11-27) x86_64 GNU/Linux |
Tested ACK eae384f on macos 10.15.6, needs squash I guess. |
eae384f
to
f628398
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested re-ACK f628398 on Debian 5.9.11-1 (2020-11-27) x86_64 GNU/Linux
Timed runs of a short rpc tests compared to master, they appear faster, and manual bitcoind shutdowns seem snappier.
Concept ACK ❤️ |
Concept ACK: swift shutdown is more practical than non-swift shutdown |
But Windows does have signals...? |
But not asynchronous ones like in POSIX. Windows always uses event loops. Edit: okay this is apparently not true, some notifications launch in a separate thread. It seems they are not reentrant in the same sense as POSIX signals though. |
Edit: we only set a |
I suggest reopening, the approach here looks correct for Windows: https://docs.microsoft.com/en-us/windows/console/console-control-handlers (the control handlers are run in a separate thread). |
re: bitcoin#20605 (comment) > This only affects bitcoind. The GUI is unaffected by this change, and keeps polling as before in `BitcoinGUI::detectShutdown()`. It might be possible to listen to a pipe there, too, but I'm not sure, and it's complicated by the GUI-node abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK f628398. Seems like a good approach. Could be cleaned up various ways (could be wrapped in a util class, could close file descriptors before shutdown, could handle more runtime errors and avoid asserts) but none of this seems important. I have a question below about AbortShutdown that would be good to clear up, but at worst I think may be a minor bug there, and wouldn't want to hold up the PR.
re: #20605 (comment)
This only affects bitcoind. The GUI is unaffected by this change, and keeps polling as before in
BitcoinGUI::detectShutdown()
. It might be possible to listen to a pipe there, too, but I'm not sure, and it's complicated by the GUI-node abstraction.
The GUI is already asynchronous so shouldn't need any polling/piping/condition variable stuff. Just some signals boilerplate like 73be44d (branch)
src/shutdown.cpp
Outdated
/** On UNIX-like operating systems use the self-pipe trick. | ||
* Index 0 will be the read end of the pipe, index 1 the write end. | ||
*/ | ||
static int g_shutdown_pipe[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "init: Signal-safe instant shutdown" (f628398)
Could set some initial values:
static int g_shutdown_pipe[2] = {-1, -1};
To avoid possibility of code reading and writing to random pipes if someone forgets to call InitShutdownState.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
(0
is STDIN_FILENO so it should be more or less safe, but still, it'd be surprising behavior, it's better to raise an error)
if (fRequestShutdown) { | ||
// Cancel existing shutdown by waiting for it, this will reset condition flags and remove | ||
// the shutdown token from the pipe. | ||
WaitForShutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "init: Signal-safe instant shutdown" (f628398)
I don't understand what AbortShutdown is supposed to do. Why would a caller request and abort a shutdown instead of just not requesting a shutdown until it wanted the shutdown? What prevents AppInit from exiting between the StartShutdown and AbortShutdown calls and causing AbortShutdown to hang forever? I could be misunderstanding, but it seems like AbortShutdown functionality was racy before this PR, and now racy and deadlocky after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to see it go but it's a) used in the tests b) before WaitForShutdown() is called in init. Both uses are safe, and I mention this in the comment (in the header file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #20605 (comment)
I'm happy to see it go but it's a) used in the tests b) before WaitForShutdown() is called in init. Both uses are safe, and I mention this in the comment (in the header file).
Thanks for pointing out the comment. I missed that this was meant to be called before WaitForShutdown, so this makes much more sense now!
Wouldn't raising a Qt signal also, likely, be unsafe in a signal handler?
How do you suggest handling them? I used asserts because the program is in an unknown state if any of the conditions happen, it should only happen due to a coding bug or memory corruption, and there's not a sane way to continue, e.g. either we spend forever waiting for shutdown, or the shutdown sequence runs prematurely (thinking of it, the latter is safe? then again it would be unexpected, at least with an assert it's easier to see where something went wrong). |
Replace the 200ms polling loop with a faster and more efficient waiting operation. This was tried a few times before, but given up every time because solutions use a condition variable which is not safe for use in signals as they need to be reentrant. On UNIX-ish OSes, use a safe way: a pipe. When shutdown is requested write a dummy byte to the pipe. Waiting for shutdown is a matter of a blocking read from the pipe. On Windows, there are no signals so using a condition variable is safe.
f628398
to
cd03513
Compare
ACK cd03513 tested on Debian 5.9.11-1 (2020-11-27) x86_64 GNU/Linux When I read |
Heh, it could definitely be done without that, but I tend to write retry loops (especially those with complicated exit conditions, though that's not the case here) as infinite loops usually. Here it avoids having to define |
cd03513 init: Signal-safe instant shutdown (Wladimir J. van der Laan) Pull request description: Replace the 200ms polling loop with a faster and more efficient waiting operation. This should speed up short RPC tests. This change has been tried a few times before, but abandoned every time because solutions used a condition variable which is not safe for use in signals, as they need to be reentrant. On UNIX-ish OSes, use a safe way: a pipe. When shutdown is requested write a dummy byte to the pipe. Waiting for shutdown is a matter of a blocking read from the pipe. On Windows, there are no signals so using a condition variable is safe. This only affects bitcoind. The GUI is unaffected by this change, and keeps polling as before in `BitcoinGUI::detectShutdown()`. It might be possible to listen to a pipe there, too, but I'm not sure, and it's complicated by the GUI-node abstraction. ACKs for top commit: jonatack: ACK cd03513 tested on Debian 5.9.11-1 (2020-11-27) x86_64 GNU/Linux Tree-SHA512: ed2f532f69fec4855c17bf7b8f3d0eb96e78ee2a3c13d374dd2c6add06e3ad6a190da8ed8f9d7a76532cf998222d67f57e35b206aec29675e96437448ae7e13c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge code review ACK cd03513. Only change since last review initializing descriptor values.
re: #20605 (comment)
Wouldn't raising a Qt signal also, likely, be unsafe in a signal handler?
Oh! You're completely right according to https://doc.qt.io/qt-5/unix-signals.html. So do need a thread. This can be added to previous implementation with small tweak: b6b5674 (pr/guishut.2
)
I think potentional good followups for this PR would be:
- Removing polling from GUI. Sketched above.
- Moving platform specific pipe / condition variable code into an RAII util class that closes file descriptors and can be unit tested
if (fRequestShutdown) { | ||
// Cancel existing shutdown by waiting for it, this will reset condition flags and remove | ||
// the shutdown token from the pipe. | ||
WaitForShutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #20605 (comment)
I'm happy to see it go but it's a) used in the tests b) before WaitForShutdown() is called in init. Both uses are safe, and I mention this in the comment (in the header file).
Thanks for pointing out the comment. I missed that this was meant to be called before WaitForShutdown, so this makes much more sense now!
Replace the 200ms polling loop with a faster and more efficient waiting operation. This should speed up short RPC tests.
This change has been tried a few times before, but abandoned every time because solutions used a condition variable which is not safe for use in signals, as they need to be reentrant.
On UNIX-ish OSes, use a safe way: a pipe. When shutdown is requested write a dummy byte to the pipe. Waiting for shutdown is a matter of a blocking read from the pipe.
On Windows, there are no signals so using a condition variable is safe.
This only affects bitcoind. The GUI is unaffected by this change, and keeps polling as before in
BitcoinGUI::detectShutdown()
. It might be possible to listen to a pipe there, too, but I'm not sure, and it's complicated by the GUI-node abstraction.