Only waitpid() for processes that we care about #8737

dimbleby · 2018-07-12T21:42:34Z

It seems that when running in an appimage there are bonus SIGCHLDs flying around. I haven't figured out the exact steps that lead to hangs - but it seems likely that if we're catching SIGCHLD that we don't really need, then someone else is not catching one that they do...

Anyway, fix is tested manually and the neovim appimage doesn't hang any more.

dimbleby · 2018-07-12T21:47:39Z

PS cf https://patchwork.kernel.org/patch/9949491/

oni-link · 2018-07-13T09:05:02Z

src/nvim/os/pty_process_unix.c

    }
+    proc->internal_exit_cb(proc);
+    break;


What if three or more children are terminated? Would the first signal be handled here, the second signal be marked pending and every further (of the same) signal be dropped (we miss a child termination)? Would that be fixed if we remove the break here?

I expect we get a SIGCHLD for each terminated child, and therefore enter this function once for each.

At any rate my intention was to preserve the existing behaviour here - if there is a bug of the sort that you describe, then it ought to be handled separately I think.

Although in both the original code and this PR we only process one job, I can understand oni-link's concern that we may be handling the wrong job because we end up reaping a different child than the one that delivered the signal.

In order to truly be deterministic, though, we would likely need to use sigaction() so we can be directly told which child delivered the signal. Since we're using libuv's abstraction, that's not really an option here.

I wonder if a simpler fix is to change the original code to use waitpid(0, &stat, WNOHANG);. It looks like that should still fix the problem. If that does work, we probably want to actually loop until waitpid() tells us none of our children are waitable.

If I understand correctly the concern is not that we handle jobs in the 'wrong' order - that really shouldn't matter. (A dies, B dies, we handle B, we handle A - so what? B might just as well have died first).

Rather the worry was that we might somehow miss terminations. I don't think that'll happen, though I'm not 100% sure and I can't see either how removing the break would be unsafe. (But I still say, if you want to do that, let's do it in a new issue).

Either way, I'd favour sticking with being completely explicit about what PIDs we're waiting for as the most foolproof approach.

I've done some more googling and I now think that @oni-link is right all along - SIGCHLDs are not queued, so we might see fewer of them than terminated processes.

I'll raise a new issue and propose that the fix is simply to remove the break

justinmk · 2018-07-13T18:59:12Z

@dimbleby This is a huge help, thanks for investigating. Could you please include the full explanation from AppImage/AppImageKit#812 (comment) in the commit message, including the link to https://patchwork.kernel.org/patch/9949491/ ?

It seems as though in an AppImage there's an extra child process that dies at some early point, before we have set up a SIGCHLD handler. So when we later get a SIGCHLD from a child that we do care about, waitpid(-1, ...) tells us about the extra child - and we don't notice that the interesting child has exited. Or something like that! See also: * https://patchwork.kernel.org/patch/9949491/ in which perf hit something similar * discussion at the AppImage repository: AppImage/AppImageKit#812 (comment). Fix is to be explicit about which process we are waitpid()'ing for, so we never need be distracted by children that we don't know about. Fixes #8104

dimbleby · 2018-07-13T20:15:35Z

re-pushed with a bigger comment

FEATURES: 07499a8 #8709 man.vim: C highlighting for EXAMPLES section 07f82ad #8699 TUI: urxvt: also send xterm focus-reporting seqs 40911e4 #8616 API: emit nvim_buf_lines_event from :terminal c46997a #8546 fillchars: Add "eob" flag FIXES: 74d19f6 #8576 startup: avoid blank stdin buffer if other files were opened 4874214 #8737 Only waitpid() for processes that we care about cd6e7e8 #8743 Check all child processes for exit in SIGCHLD handler c230ef2 #8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff 0ed8b12 #8681 transstr_buf: fix length comparison d241f27 #8708 TUI: Fix standout mode 9afed40 #8698 man.vim: fix for mandoc e889640 #8682 provider/node: npm --loglevel silent 1cbc830 #8613 API: nvim_win_set_cursor: set curswant bf6048e #8628 checkhealth: Python: fix VIRTUAL_ENV check 3cc3506 #8528 checkhealth: node.js: also search yarn CHANGES: b751449 #8619 defaults: shortmess+=F 1248178 #8578 highlight: high-priority CursorLine if fg is set. 01570f1 #8726 terminal: handle &confirm and :confirm on unloading 56065bb #8721 screen: truncate showmode messages bf2460e #7551 buffer: fix copying :setlocal options c1c14fa #8520 Ex mode: always "improved" (gQ) 050f397 #7992 options: remove 'maxcombine` option (always 6) INTERNAL: 463da84 #7992 screen: use UTF-8 representation

FEATURES: 07499a8 neovim#8709 man.vim: C highlighting for EXAMPLES section 07f82ad neovim#8699 TUI: urxvt: also send xterm focus-reporting seqs 40911e4 neovim#8616 API: emit nvim_buf_lines_event from :terminal c46997a neovim#8546 fillchars: Add "eob" flag FIXES: 74d19f6 neovim#8576 startup: avoid blank stdin buffer if other files were opened 4874214 neovim#8737 Only waitpid() for processes that we care about cd6e7e8 neovim#8743 Check all child processes for exit in SIGCHLD handler c230ef2 neovim#8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff 0ed8b12 neovim#8681 transstr_buf: fix length comparison d241f27 neovim#8708 TUI: Fix standout mode 9afed40 neovim#8698 man.vim: fix for mandoc e889640 neovim#8682 provider/node: npm --loglevel silent 1cbc830 neovim#8613 API: nvim_win_set_cursor: set curswant bf6048e neovim#8628 checkhealth: Python: fix VIRTUAL_ENV check 3cc3506 neovim#8528 checkhealth: node.js: also search yarn CHANGES: b751449 neovim#8619 defaults: shortmess+=F 1248178 neovim#8578 highlight: high-priority CursorLine if fg is set. 01570f1 neovim#8726 terminal: handle &confirm and :confirm on unloading 56065bb neovim#8721 screen: truncate showmode messages bf2460e neovim#7551 buffer: fix copying :setlocal options c1c14fa neovim#8520 Ex mode: always "improved" (gQ) 050f397 neovim#7992 options: remove 'maxcombine` option (always 6) INTERNAL: 463da84 neovim#7992 screen: use UTF-8 representation

dimbleby mentioned this pull request Jul 12, 2018

Payload software may hang when not intercepting SIGCHLD properly AppImage/AppImageKit#812

Closed

jamessan approved these changes Jul 13, 2018

View reviewed changes

oni-link reviewed Jul 13, 2018

View reviewed changes

dimbleby mentioned this pull request Jul 13, 2018

Possible hangs if multiple child processes die in quick succession #8740

Closed

jamessan merged commit 4874214 into neovim:master Jul 13, 2018

dimbleby deleted the overly-general-waitpid branch July 14, 2018 13:03

This was referenced Jul 17, 2018

AppImage: fzf window will not close if pressing esc #8566

Closed

nvim.appimage weird behavior with fzf.vim #7620

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Only waitpid() for processes that we care about #8737

Only waitpid() for processes that we care about #8737

Uh oh!

dimbleby commented Jul 12, 2018

Uh oh!

dimbleby commented Jul 12, 2018

Uh oh!

oni-link Jul 13, 2018

Uh oh!

dimbleby Jul 13, 2018

Uh oh!

jamessan Jul 13, 2018

Uh oh!

dimbleby Jul 13, 2018

Uh oh!

dimbleby Jul 13, 2018

Uh oh!

justinmk commented Jul 13, 2018

Uh oh!

dimbleby commented Jul 13, 2018

Uh oh!

Uh oh!

Uh oh!

Only waitpid() for processes that we care about #8737

Only waitpid() for processes that we care about #8737

Uh oh!

Conversation

dimbleby commented Jul 12, 2018

Uh oh!

dimbleby commented Jul 12, 2018

Uh oh!

oni-link Jul 13, 2018

Choose a reason for hiding this comment

Uh oh!

dimbleby Jul 13, 2018

Choose a reason for hiding this comment

Uh oh!

jamessan Jul 13, 2018

Choose a reason for hiding this comment

Uh oh!

dimbleby Jul 13, 2018

Choose a reason for hiding this comment

Uh oh!

dimbleby Jul 13, 2018

Choose a reason for hiding this comment

Uh oh!

justinmk commented Jul 13, 2018

Uh oh!

dimbleby commented Jul 13, 2018

Uh oh!

Uh oh!