justinmk
Member

@justinmk justinmk commented Mar 6, 2018

closes #6530

  • are PTY jobs being killed correctly?
  • is process_teardown() hang #6891 related?
  • win: iterate through children (os_proc_tree_kill())
  • implement new API function: nvim_get_proc_children()
  • implement new API function: nvim_get_proc()
  • tests

Didn't fix this common test failure (maybe #7376 ?):

TermClose event reports the correct <abuf>: -- Output to stderr:
Vim: Caught deadly signal 'SIGHUP'
Vim: Finished.
CMake Error at /home/travis/build/neovim/neovim/cmake/RunTests.cmake:53 (message):
  Running functional tests failed with error: 1.
...
===============================================================================
NVIM_LOG_FILE: /home/travis/build/neovim/neovim/build/.nvimlog
2018/03/06 08:57:28 ERROR 11479 loop_close:133: uv_loop_close() hang?
[--I] signal   0xa310e8
[-AI] async    0xa30f30
[R--] signal   0xa2a620

quickbuild failure is unrelated, spell_spec.lua #8102

@mhinz
Member

mhinz commented Mar 6, 2018

:let id = jobstart('sleep 30 | sleep 30 | sleep 30')
:call jobstop(id)

..finally works as expected after this patch. 👍

{
// New session and process-group. #6530
setsid();
Contributor

Why a new session instead of a new process group? Would setpgid(0,0); also do the job?

Member

Isn't a new session customary in a terminal emulator?

ILOG("Sending %s to pid %d", sig == SIGTERM ? "SIGTERM" : "SIGKILL",
proc->pid);
uv_kill(proc->pid, sig);
ILOG("sending %s to pid %d", sig == SIGTERM ? "SIGTERM" : "SIGKILL", pid);
Contributor

@oni-link oni-link Mar 11, 2018

pid -> pgid and -pid?
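
The suggestion above relies on the POSIX rule that kill() with a negative pid signals every process in the group whose ID is the absolute value of that pid. A minimal C sketch of that behavior follows. `spawn_group`, `kill_group` and `wait_for_group_exit` are hypothetical helper names used only for illustration; they are not functions from Nvim or libuv. The pipe is a test device: its read end reaches EOF only after every process holding the write end (leader and grandchild) has exited.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a session/group leader that forks one grandchild.
// Both keep the write end of a pipe open. Returns the leader pid (which is
// also the pgid) and stores the pipe's read end in *read_fd.
static pid_t spawn_group(int *read_fd)
{
  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }
  pid_t leader = fork();
  if (leader < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (leader == 0) {
    close(fds[0]);
    setsid();  // New session and process group; pgid == our pid.
    if (fork() == 0) {
      for (;;) {
        pause();  // Grandchild: same group as the leader.
      }
    }
    if (write(fds[1], "r", 1) != 1) {  // Tell the parent the group is set up.
      _exit(1);
    }
    for (;;) {
      pause();
    }
  }
  close(fds[1]);
  char ready;
  if (read(fds[0], &ready, 1) != 1) {
    close(fds[0]);
    return -1;
  }
  *read_fd = fds[0];
  return leader;
}

// Signals the whole process group: note the negated id.
static int kill_group(pid_t pgid, int sig)
{
  return kill(-pgid, sig);
}

// Blocks until every writer of the pipe has exited (read() returns EOF).
static int wait_for_group_exit(int read_fd)
{
  char c;
  ssize_t n;
  while ((n = read(read_fd, &c, 1)) > 0) {
  }
  close(read_fd);
  return n == 0 ? 0 : -1;
}
```

Calling `kill(leader, sig)` instead of `kill_group(leader, sig)` would leave the grandchild alive, so `wait_for_group_exit()` would block; that is the symptom reported in #6530.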

@janlazo
Contributor

janlazo commented Mar 13, 2018

Will this fix the invalid channel id or the exit code for job tests for cat.exe on Windows?
I'm using jobclose to get exit code 0 because jobstop can cause exit code 1.

@justinmk
Member Author

> Will this fix the invalid channel id or the exit code for job tests for cat.exe on Windows?

I wouldn't expect this to fix "invalid channel id". And, I don't think it will change behavior of cat.exe unless cat.exe was running in a shell.

> I'm using jobclose to get exit code 0 because jobstop can cause exit code 1.

We should probably keep that test "pending" then :)

bool exists = false;
size_t p_count = len / sizeof(*p_list);
for (size_t i = 0; i < p_count; i++) {
exists |= (p_list[i].kp_proc.p_pid == ppid);
Member

Shouldn't this be

exists |= (p_list[i].ki_pid == ppid);
if (p_list[i].ki_ppid == ppid) {
  temp = xrealloc(temp, (*proc_count + 1) * sizeof(*temp));
  temp[*proc_count] = p_list[i].ki_pid;

snprintf(proc_p, sizeof(proc_p), "/proc/%d/task/%d/children", ppid, ppid);
FILE *fp = fopen(proc_p, "r");
if (fp == NULL) {
return 1; // Process not found.
Contributor

@oni-link oni-link Mar 14, 2018

I'm missing the children entry under proc, because of an unset kernel option. Could one fall back to pgrep -P ppid or ps --ppid ppid in this case?

Member Author

@oni-link How does ps get this info without the kernel providing an API?

Contributor

Perhaps iterating through the entries (processes) under /proc and evaluating each entry status for ppid?

Member Author

@justinmk justinmk Mar 14, 2018

@oni-link Added a fallback for pgrep since it seems to be available on linux/macOS/BSD.

UV_PROCESS_DETACHED compels libuv:uv__process_child_init() to call
setsid() in the child just after fork().  That ensures the process and
its descendants are grouped in a separate session (and process group).

The following jobstart() call correctly groups `sh` and `sleep` in a new
session (and process-group), where `sh` is the "session leader" (and
process-group leader):

    :call jobstart(['sh','-c','sleep 60'])

     SESN  PGRP   PID  PPID  Command
    30383 30383 30383  3620  │  ├─ -bash
    30383 31432 31432 30383  │  │  └─ nvim -u NORC
    30383 31432 31433 30383  │  │     ├─ nvim -u NORC
     8105  8105  8105 31432  │  │     └─ sh -c sleep 60
     8105  8105  8106  8105  │  │        └─ sleep 60

closes neovim#6530
ref: https://stackoverflow.com/q/1046933
ref: https://unix.stackexchange.com/a/404065

Helped-by: Marco Hinz <mh.codebro+github@gmail.com>

Discussion
------------------------------------------------------------------------

On my linux box before this patch, the termclose_spec.lua:'kills job
trapping SIGTERM' test indirectly causes cmake/busted to wait for 60s.
That's because the test spawns a `sleep 60` descendant process which
hangs around even after nvim exits: nvim killed the parent PID, but not
PGID (process-group), so the grandchild "reparented" to init (PID 1).

A session contains processes (and process-groups) which are logically
part of the same "login session". A process-group is a set of
logically/informally-related processes within a session; for example,
shells assign a process group to each "job". Session IDs and PGIDs both
have type pid_t (like PIDs).

These OS-level mechanisms are, as usual, legacy accidents whose purpose
is upheld by convention and folklore.  We can use session-level grouping
(setsid), or we could use process-group-level grouping (setpgid).

Vim uses setsid() if available, otherwise setpgid(0,0).
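
To make the session/process-group relationship concrete, here is a small C sketch of the "setsid(), otherwise setpgid(0,0)" pattern. It is an illustration under assumptions, not Nvim's, libuv's, or Vim's code: `spawn_detached` and `struct ids` are hypothetical names, and the fallback is taken at runtime when setsid() fails rather than at build time. The child reports its own session ID and process-group ID back through a pipe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical record of what the detached child observed about itself.
struct ids {
  pid_t pid;   // Child pid as seen by the parent.
  pid_t sid;   // Session ID reported by the child.
  pid_t pgid;  // Process-group ID reported by the child.
};

// Hypothetical helper: forks a child that detaches itself, reports its
// session and process-group IDs, then exits. Returns 0 on success.
static int spawn_detached(struct ids *out)
{
  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    close(fds[0]);
    // Session-level grouping first; process-group-level grouping otherwise.
    if (setsid() == (pid_t)-1) {
      setpgid(0, 0);
    }
    pid_t report[2] = { getsid(0), getpgrp() };
    ssize_t n = write(fds[1], report, sizeof(report));
    _exit(n == (ssize_t)sizeof(report) ? 0 : 1);
  }
  close(fds[1]);
  pid_t report[2];
  ssize_t n = read(fds[0], report, sizeof(report));
  close(fds[0]);
  waitpid(pid, NULL, 0);
  if (n != (ssize_t)sizeof(report)) {
    return -1;
  }
  out->pid = pid;
  out->sid = report[0];
  out->pgid = report[1];
  return 0;
}
```

After a successful setsid() the child is both session leader and process-group leader, so `sid`, `pgid` and `pid` are all equal, matching the `sh -c sleep 60` row in the process table above.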

Windows
------------------------------------------------------------------------

UV_PROCESS_DETACHED on win32 sets CREATE_NEW_PROCESS_GROUP flag.
But uv_kill() does not kill the process-group:
nodejs/node#3617

Ideas:
- Set UV_PROCESS_DETACHED (CREATE_NEW_PROCESS_GROUP), then call
  GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, pid)
   - Maybe won't work because MSDN says "Only processes that share the
     same console as the calling process receive the signal."
     https://docs.microsoft.com/en-us/windows/console/generateconsolectrlevent
     But CREATE_NEW_PROCESS_GROUP creates a new console ...
     ref https://stackoverflow.com/q/1453520
- Group processes within a "job". libuv does that *globally* for
  non-detached processes: uv__init_global_job_handle.
- Iterate through CreateToolhelp32Snapshot().
   - https://stackoverflow.com/q/1173342
   - Vim does this, see terminate_all()
XXX: comment at https://stackoverflow.com/q/1173342 :
> Windows recycles PIDs quite fast, you have to be extra careful not
> to kill unrelated processes. These APIs will report PPIDs for long
> dead processes whose PIDs may have been recycled. Check the parent
> start date to make sure it is related to the processes you spawned.
@janlazo
Contributor

janlazo commented Mar 17, 2018

Which ps is os_proc_info using on Windows?
I'm using tasklist to check for pid in ca7d284

@justinmk
Member Author

justinmk commented Mar 17, 2018

> Which ps is os_proc_info using on Windows?

On WIN32 it uses the WIN32 API. No ps, no tasklist.

@justinmk justinmk force-pushed the job-setsid branch 2 times, most recently from 489f0f1 to 4c690e5 on March 17, 2018 23:10
TODO: "exepath" field (win32: QueryFullProcessImageName())

On unix-likes `ps` is used because the platform-specific APIs are
a nightmare.  For reference, below is an (incomplete) attempt:

diff --git a/src/nvim/os/process.c b/src/nvim/os/process.c
index 0976992..99afbbf290c1 100644
--- a/src/nvim/os/process.c
+++ b/src/nvim/os/process.c
@@ -208,3 +210,60 @@ int os_proc_children(int ppid, int **proc_list, size_t *proc_count)
   return 0;
 }

+/// Gets various properties of the process identified by `pid`.
+///
+/// @param pid Process to inspect.
+/// @return Map of process properties, empty on error.
+Dictionary os_proc_info(int pid)
+{
+  Dictionary pinfo = ARRAY_DICT_INIT;
+#ifdef WIN32
+
+#elif defined(__APPLE__)
+  char buf[PROC_PIDPATHINFO_MAXSIZE];
+  if (proc_pidpath(pid, buf, sizeof(buf))) {
+    name = getName(buf);
+    PUT(pinfo, "exepath", STRING_OBJ(cstr_to_string(buf)));
+    return name;
+  } else {
+    ILOG("proc_pidpath() failed for pid: %d", pid);
+  }
+#elif defined(BSD)
+# if defined(__FreeBSD__)
+#  define KP_COMM(o) o.ki_comm
+# else
+#  define KP_COMM(o) o.p_comm
+# endif
+  struct kinfo_proc *proc = kinfo_getproc(pid);
+  if (proc) {
+    PUT(pinfo, "name", cstr_to_string(KP_COMM(proc)));
+    xfree(proc);
+  } else {
+    ILOG("kinfo_getproc() failed for pid: %d", pid);
+  }
+
+#elif defined(__linux__)
+  char fname[256] = { 0 };
+  char buf[MAXPATHL];
+  snprintf(fname, sizeof(fname), "/proc/%d/comm", pid);
+  FILE *fp = fopen(fname, "r");
+  // FileDescriptor *f = file_open_new(&error, fname, kFileReadOnly, 0);
+  // ptrdiff_t file_read(FileDescriptor *const fp, char *const ret_buf,
+  //                     const size_t size)
+  if (fp == NULL) {
+    ILOG("fopen() of /proc/%d/comm failed", pid);
+  } else {
+    size_t n = fread(buf, sizeof(char), sizeof(buf) - 1, fp);
+    if (n == 0) {
+      WLOG("fread() of /proc/%d/comm failed", pid);
+    } else {
+      size_t end = MIN(sizeof(buf) - 1, n);
+      end = (end > 0 && buf[end - 1] == '\n') ? end - 1 : end;
+      buf[end] = '\0';
+      PUT(pinfo, "name", STRING_OBJ(cstr_to_string(buf)));
+    }
+  }
+  fclose(fp);
+#endif
+  return pinfo;
+}
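
The Linux branch of the attempt above can also be read in isolation. The following standalone C sketch shows one way to read /proc/<pid>/comm; `proc_comm` is a hypothetical name and the sketch uses plain stdio rather than Nvim's file or logging helpers. Unlike the draft, it closes the stream only when fopen() succeeded, since calling fclose() on a NULL pointer is undefined behavior.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (Linux-only): copies the command name of `pid` into
// `name`, without the trailing newline. Returns 0 on success, -1 on failure.
static int proc_comm(int pid, char *name, size_t size)
{
  char path[64];
  if (name == NULL || size == 0) {
    return -1;
  }
  snprintf(path, sizeof(path), "/proc/%d/comm", pid);
  FILE *fp = fopen(path, "r");
  if (fp == NULL) {
    return -1;  // No such process (or no procfs); nothing to close.
  }
  char *ok = fgets(name, (int)size, fp);
  fclose(fp);
  if (ok == NULL) {
    return -1;
  }
  name[strcspn(name, "\n")] = '\0';  // Strip the newline the kernel appends.
  return 0;
}
```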
@justinmk justinmk added the api libnvim, Nvim RPC API label Mar 17, 2018
@justinmk justinmk merged commit 4e02f1a into neovim:master Mar 18, 2018
@justinmk justinmk deleted the job-setsid branch March 18, 2018 17:36
@janlazo
Contributor

janlazo commented Mar 18, 2018

@justinmk ping -n 1 -w 30000 exits within a second.

@justinmk
Member Author

@janlazo Gah, I changed that at the last minute to match

feed_command([[terminal for /L \\%I in (1,0,2) do ( echo foo & ping -w 100 -n 1 127.0.0.1 > nul )]])

I will fix after #8120 , when ae409b5 can be reverted.

@janlazo
Contributor

janlazo commented Mar 18, 2018

My bad on that commit. I used ping for <= 1 sec. timeout but any program that exits quickly would have sufficed because of how for loops work on cmd.exe. Powershell has a long startup (2-3 sec. to run exit 4 for jobwait() test) so I couldn't use Start-Sleep -Milliseconds 100.

justinmk added a commit that referenced this pull request Jun 11, 2018
FEATURES:
3cc7ebf #7234 built-in VimL expression parser
6a7c904 #4419 implement <Cmd> key to invoke command in any mode
b836328 #7679 'startup: treat stdin as text instead of commands'
58b210e :digraphs : highlight with hl-SpecialKey #2690
7a13611 #8276 'startup: Let `-s -` read from stdin'
1e71978 events: VimSuspend, VimResume #8280
1e7d5e8 #6272 'stdpath()'
f96d99a #8247 server: introduce --listen
e8c39f7 #8226 insert-mode: interpret unmapped META as ESC
98e7112 msg: do not scroll entire screen (#8088)
f72630b #8055 let negative 'writedelay' show all redraws
5d2dd2e win: has("wsl") on Windows Subsystem for Linux #7330
a4f6cec cmdline: CmdlineEnter and CmdlineLeave autocommands (#7422)
207b7ca #6844 channels: support buffered output and bytes sockets/stdio

API:
f85cbea #7917 API: buffer updates
418abfc #6743 API: list information about all channels/jobs.
36b2e3f #8375 API: nvim_get_commands
273d2cd #8329 API: Make nvim_set_option() update `:verbose set …`
8d40b36 #8371 API: more reliable/descriptive VimL errors
ebb1acb #8353 API: nvim_call_dict_function
9f994bb #8004 API: nvim_list_uis
3405704 #7520 API/UI: forward option updates to UIs
911b1e4 #7821 API: improve nvim_command_output

WINDOWS OS:
9cefd83 #8084, #8516 build/win: support MSVC
ee4e1fd win: Fix reading content from stdin (#8267)

TUI:
ffb8904 #8309 TUI: add support for mouse release events in urxvt
8d5a46e #8081 TUI: implement "standout" attribute
6071637 TUI: support TERM=konsole-256color
67848c0 #7653 TUI: report TUI info with -V3 ('verbose' >= 3)
3d0ee17 TUI/rxvt: enable focus-reporting
d109f56 #7640 TUI: 'term' option: reflect effective terminal behavior

FIXES:
ed6a113 #8273 'job-control: avoid kill-timer race'
4e02f1a #8107 'jobs: separate process-group'
451c48a terminal: flush vterm output buffer on pty output #8486
5d6732f :checkhealth fixes #8335
53f11dc #8218 'Fix errors reported by PVS'
d05712f inccommand: pause :terminal redraws (#8307)
51af911 inccommand: do not execute trailing commands #8256
84359a4 terminal: resize to the max dimensions (#8249)
d49c1dd #8228 Make vim_fgets() return the same values as in Vim
60e96a4 screen: winhl=Normal:Background should not override syntax (#8093)
0c59ac1 #5908 'shada: Also save numbered marks'
ba87a2c cscope: ignore EINTR while reading the prompt (#8079)
b1412dc #7971 ':terminal Enter/Leave should not increment jumplist'
3a5721e TUI: libtermkey: force CSI driver for mouse input #7948
6ff13d7 #7720 TUI: faster startup
1c6e956 #7862 TUI: fix resize-related segfaults
a58c909 #7676 TUI: always hide cursor when flushing, never flush buffers during unibilium output
303e1df #7624 TUI: disable BCE almost always
249bdb0 #7761 mark: Make sure that jumplist item will not have zero lnum
6f41ce0 #7704 macOS: Set $LANG based on the system locale
a043899 #7633 'Retry fgets on EINTR'

CHANGES:
ad60927 #8304 default to 'nofsync'
f3f1970 #8035 defaults: 'fillchars'
a6052c7 #7984 defaults: sidescroll=1
b69fa86 #7888 defaults: enable cscopeverbose
7c4bb23 defaults: do :filetype stuff unless explicitly "off"
2aa308c #5658 'Apply :lmap in macros'
8ce6393 terminal: Leave 'relativenumber' alone (#8360)
e46534b #4486 refactor: Remove maxmem, maxmemtot options
131aad9 win: defaults: 'shellcmdflag', 'shellxquote' #7343
c57d315 #8031 jobwait(): return -2 on interrupt also with timeout
6452831 clipboard: macOS: fallback to tmux if pbcopy is broken #7940
300d365 #7919 Make 'langnoremap' apply directly after a map
ada1956 #7880 'lua/executor: Remove lightuserdata'

INTERNAL:
de0a954 #7806 internal statistics for list impl
dee78a4 #7708 rewrite internal list impl
justinmk added a commit to justinmk/neovim that referenced this pull request Jul 2, 2018
jobstart() does NOT run in the same process-group (neovim#8107):
    :call jobstart('ps', {'on_stdout':{j,d,e->append(0,d)}})
justinmk added a commit to justinmk/neovim that referenced this pull request Jul 3, 2018
closes neovim#8217
closes neovim#8450

system() and :! are expected to run in the same process-group as Nvim.

NB: jobstart() does NOT run in the same process-group (neovim#8107):
    :call jobstart('ps', {'on_stdout':{j,d,e->append(0,d)}})

Background:
8d90171 changed ALL child-spawn utilities to do setsid().

Q: If we don't create a new session/process-group for :! and system(),
   how to avoid zombie descendants (e.g. process_wait() calls
   process_stop(), which only kills the root process)?
A: Send signal to process-group, but ignore the signal in our own
   process (signal_reject_deadly()). Vim does something similar:
   https://github.com/vim/vim/blob/e7499ddc33508d3d341e96f84a0e7b95b2d6927c/src/os_unix.c#L4834-L4841
   https://github.com/vim/vim/blob/e7499ddc33508d3d341e96f84a0e7b95b2d6927c/src/os_unix.c#L5122-L5134

Vim does setsid() in some cases of mch_call_shell_fork() (analogous to
Nvim's os_system()), but check the logic carefully--it's only for some
(irrelevant) GUI scenarios.
justinmk added a commit to justinmk/neovim that referenced this pull request Jul 4, 2018
(This commit is for reference; the functional change will be reverted.)

ref neovim#8217
ref neovim#8450
ref neovim#8678

In terminal-Vim, system() and :! run in Vim's process-group. But
8d90171 changed all of Nvim's process-spawn utilities to do
setsid(), which conflicts with that expected terminal-Vim behavior.

To "fix" that, this commit defines Process.detach as a TriState, then
handles the kNone case such that system() and :! do not do setsid() in
the spawned child.

But this commit REGRESSES 8d90171 (neovim#8107), so for example the
following code causes orphan processes:
    :echo system('sleep 30|sleep 30|sleep 30')

Q: If we don't create a new session/process-group, how to avoid zombie
   descendants (e.g. process_wait() calls process_stop(), which only
   kills the root process)?
A: Vim's approach in mch_call_shell_fork() is:
   1. BLOCK_SIGNALS (ignores deadly)
   2. fork()
   3. unblock signals in the child
   4. On CTRL-C, send SIGINT to the process-group id: kill(-pid, SIGINT)
   5. Parent (vim) ignores the signal. Child (and descendants) do not.
   https://github.com/vim/vim/blob/e7499ddc33508d3d341e96f84a0e7b95b2d6927c/src/os_unix.c#L4834-L4841
   https://github.com/vim/vim/blob/e7499ddc33508d3d341e96f84a0e7b95b2d6927c/src/os_unix.c#L5122-L5134

But we can't do that if we want to use the existing (libuv-based) form
of process_spawn().
@justinmk justinmk mentioned this pull request Dec 12, 2018
Labels
api libnvim, Nvim RPC API job-control OS processes, spawn
Development

Successfully merging this pull request may close these issues.

jobstop does not kill process children/descendants
7 participants