-
-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Only waitpid() for processes that we care about #8737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
proc->internal_exit_cb(proc); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if three or more children are terminated? Would the first signal be handled here, the second signal be marked pending and every further (of the same) signal be dropped (we miss a child termination)? Would that be fixed if we remove the break
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect we get a SIGCHLD for each terminated child, and therefore enter this function once for each.
At any rate my intention was to preserve the existing behaviour here - if there is a bug of the sort that you describe, then it ought to be handled separately I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although in both the original code and this PR we only process one job, I can understand oni-link's concern that we may be handling the wrong job because we end up reaping a different child than the one that delivered the signal.
In order to truly be deterministic, though, we would likely need to use sigaction()
so we can be directly told which child delivered the signal. Since we're using libuv's abstraction, that's not really an option here.
I wonder if a simpler fix is to change the original code to use waitpid(0, &stat, WNOHANG);
. It looks like that should still fix the problem. If that does work, we probably want to actually loop until waitpid()
tells us none of our children are waitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly the concern is not that we handle jobs in the 'wrong' order - that really shouldn't matter. (A dies, B dies, we handle B, we handle A - so what? B might just as well have died first).
Rather the worry was that we might somehow miss terminations. I don't think that'll happen, though I'm not 100% sure and I can't see either how removing the break
would be unsafe. (But I still say, if you want to do that, let's do it in a new issue).
Either way, I'd favour sticking with being completely explicit about what PIDs we're waiting for as the most foolproof approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some more googling and I now think that @oni-link is right all along - SIGCHLD
s are not queued, so we might see fewer of them than terminated processes.
I'll raise a new issue and propose that the fix is simply to remove the break
@dimbleby This is a huge help, thanks for investigating. Could you please include the full explanation from AppImage/AppImageKit#812 (comment) in the commit message, including the link to https://patchwork.kernel.org/patch/9949491/ ? |
It seems as though in an AppImage there's an extra child process that dies at some early point, before we have set up a SIGCHLD handler. So when we later get a SIGCHLD from a child that we do care about, waitpid(-1, ...) tells us about the extra child - and we don't notice that the interesting child has exited. Or something like that! See also: * https://patchwork.kernel.org/patch/9949491/ in which perf hit something similar * discussion at the AppImage repository: AppImage/AppImageKit#812 (comment). Fix is to be explicit about which process we are waitpid()'ing for, so we never need be distracted by children that we don't know about. Fixes #8104
re-pushed with a bigger comment |
FEATURES: 07499a8 #8709 man.vim: C highlighting for EXAMPLES section 07f82ad #8699 TUI: urxvt: also send xterm focus-reporting seqs 40911e4 #8616 API: emit nvim_buf_lines_event from :terminal c46997a #8546 fillchars: Add "eob" flag FIXES: 74d19f6 #8576 startup: avoid blank stdin buffer if other files were opened 4874214 #8737 Only waitpid() for processes that we care about cd6e7e8 #8743 Check all child processes for exit in SIGCHLD handler c230ef2 #8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff 0ed8b12 #8681 transstr_buf: fix length comparison d241f27 #8708 TUI: Fix standout mode 9afed40 #8698 man.vim: fix for mandoc e889640 #8682 provider/node: npm --loglevel silent 1cbc830 #8613 API: nvim_win_set_cursor: set curswant bf6048e #8628 checkhealth: Python: fix VIRTUAL_ENV check 3cc3506 #8528 checkhealth: node.js: also search yarn CHANGES: b751449 #8619 defaults: shortmess+=F 1248178 #8578 highlight: high-priority CursorLine if fg is set. 01570f1 #8726 terminal: handle &confirm and :confirm on unloading 56065bb #8721 screen: truncate showmode messages bf2460e #7551 buffer: fix copying :setlocal options c1c14fa #8520 Ex mode: always "improved" (gQ) 050f397 #7992 options: remove 'maxcombine` option (always 6) INTERNAL: 463da84 #7992 screen: use UTF-8 representation
FEATURES: 07499a8 neovim#8709 man.vim: C highlighting for EXAMPLES section 07f82ad neovim#8699 TUI: urxvt: also send xterm focus-reporting seqs 40911e4 neovim#8616 API: emit nvim_buf_lines_event from :terminal c46997a neovim#8546 fillchars: Add "eob" flag FIXES: 74d19f6 neovim#8576 startup: avoid blank stdin buffer if other files were opened 4874214 neovim#8737 Only waitpid() for processes that we care about cd6e7e8 neovim#8743 Check all child processes for exit in SIGCHLD handler c230ef2 neovim#8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff 0ed8b12 neovim#8681 transstr_buf: fix length comparison d241f27 neovim#8708 TUI: Fix standout mode 9afed40 neovim#8698 man.vim: fix for mandoc e889640 neovim#8682 provider/node: npm --loglevel silent 1cbc830 neovim#8613 API: nvim_win_set_cursor: set curswant bf6048e neovim#8628 checkhealth: Python: fix VIRTUAL_ENV check 3cc3506 neovim#8528 checkhealth: node.js: also search yarn CHANGES: b751449 neovim#8619 defaults: shortmess+=F 1248178 neovim#8578 highlight: high-priority CursorLine if fg is set. 01570f1 neovim#8726 terminal: handle &confirm and :confirm on unloading 56065bb neovim#8721 screen: truncate showmode messages bf2460e neovim#7551 buffer: fix copying :setlocal options c1c14fa neovim#8520 Ex mode: always "improved" (gQ) 050f397 neovim#7992 options: remove 'maxcombine` option (always 6) INTERNAL: 463da84 neovim#7992 screen: use UTF-8 representation
Fixes #8104.
It seems that when running in an appimage there are bonus
SIGCHLD
s flying around. I haven't figured out the exact steps that lead to hangs - but it seems likely that if we're catchingSIGCHLD
that we don't really need, then someone else is not catching one that they do...Anyway, fix is tested manually and the neovim appimage doesn't hang any more.