Skip to content

Conversation

azat
Copy link

@azat azat commented Jul 10, 2023

Without this patch with signalfd() backend it works incorrectly, this is
what proc_clear_signals() does:

6674  rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
6674  rt_sigaction(SIGCHLD, {sa_handler=SIG_IGN, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffff7c78ab0}, NULL, 8) = 0
6674  close(6<signalfd:[CHLD]>)         = 0

And this is what proc_set_signals() does:

6674  rt_sigaction(SIGCHLD, NULL, {sa_handler=SIG_IGN, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffff7c78ab0}, 8) = 0
6674  rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
6674  signalfd4(-1, [CHLD], 8, SFD_CLOEXEC|SFD_NONBLOCK) = 6<signalfd:[CHLD]>

Pay attention to the SIG_IGN.

And also one more patch for the client, that calls waitpid in a loop to avoid loosing any childs.

P.S. I'm going to disable signalfd backend in libevent by default, since there are some trickery, which will break some existing programs. Also note, that there was not stable release with signalfd enabled.

azat added 2 commits July 10, 2023 10:23
Without this patch with signalfd() backend it works incorrectly, this is
what proc_clear_signals() does:

    6674  rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
    6674  rt_sigaction(SIGCHLD, {sa_handler=SIG_IGN, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffff7c78ab0}, NULL, 8) = 0
    6674  close(6<signalfd:[CHLD]>)         = 0

And this is what proc_set_signals() does:

    6674  rt_sigaction(SIGCHLD, NULL, {sa_handler=SIG_IGN, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffff7c78ab0}, 8) = 0
    6674  rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
    6674  signalfd4(-1, [CHLD], 8, SFD_CLOEXEC|SFD_NONBLOCK) = 6<signalfd:[CHLD]>

Pay attention to the SIG_IGN.
Sometimes, especially in case of signalfd, you may miss subsequent
SIGCHLD signal, but sometimes it is possible for non-signalfd as well
[1].

  [1]: https://ldpreload.com/blog/signalfd-is-useless

So to fix this, let's simply call waitpid() in a loop, this should cause
any troubles.
azat added a commit to azat/libevent that referenced this pull request Jul 10, 2023
signalfd may behave differently to sigaction/signal, so to avoid
breaking libevent users (like [1], [2]) disable it by default.

  [1]: tmux/tmux#3621
  [1]: tmux/tmux#3626

Also signalfd is not that perfect:
- you need to SIG_BLOCK the signal before
  - blocked signals are not reset on exec
  - blocked signals are allowed to coalesce - so in case of multiple
    signals sent you may get the signal only once (ok for most of the
    signals, but may be a problem for SIGCHLD, though you may call
    waitpid() in a loop or use pidfd)
- and also one implementation problem -
  sigprocmask is unspecified in a multithreaded process

Refs:
- https://lwn.net/Articles/415684/
- https://ldpreload.com/blog/signalfd-is-useless

Refs: libevent#1460
Refs: libevent#1342 (cc @dmantipov)
azat added a commit to azat/libevent that referenced this pull request Jul 10, 2023
signalfd may behave differently to sigaction/signal, so to avoid
breaking libevent users (like [1], [2]) disable it by default.

  [1]: tmux/tmux#3621
  [2]: tmux/tmux#3626

Also signalfd is not that perfect:
- you need to SIG_BLOCK the signal before
  - blocked signals are not reset on exec
  - blocked signals are allowed to coalesce - so in case of multiple
    signals sent you may get the signal only once (ok for most of the
    signals, but may be a problem for SIGCHLD, though you may call
    waitpid() in a loop or use pidfd)
- and also one implementation problem -
  sigprocmask is unspecified in a multithreaded process

Refs:
- https://lwn.net/Articles/415684/
- https://ldpreload.com/blog/signalfd-is-useless

Refs: libevent#1460
Refs: libevent#1342 (cc @dmantipov)
@nicm
Copy link
Member

nicm commented Jul 10, 2023

I'm not sure the client can actually have multiple children (maybe it can now), but even if it doesn't it is better to add a loop around waitpid, so I have applied your client_signal change, although you forgot to also exit the loop on ECHILD, so I added that.

I don't understand your other change. Why can libevent not restore the default signal behaviour for SIGCHLD?

@azat
Copy link
Author

azat commented Jul 10, 2023

although you forgot to also exit the loop on ECHILD, so I added that.

Thanks

Why can libevent not restore the default signal behaviour for SIGCHLD?

It does not touches it, tmux does:

$ gg SIG_IGN
client.c:       signal(SIGCHLD, SIG_IGN);

And so the problem here is if server had SIGCHLD=SIG_IGN before calling proc_set_signals, then for execve, when it calls proc_clear_signals, libevent will reset the SIGCHLD to SIG_IGN, since it remember what the signal handler was before and restore on event deletion.

@nicm
Copy link
Member

nicm commented Jul 10, 2023

Hmm, yes...

That was added in 2015 but after that we blocked the signals around daemon (in 2017), so it seems like this should not happen - the client signal handler should be in place before we get to server_start and the server one should be in place before we unblock SIGCHLD at server.c:199, so it seems like one of them should pick up the extra daemon child.

I think we can probably just remove that, like this:

--- client.c    10 Jul 2023 09:35:46 -0000      1.160
+++ client.c    10 Jul 2023 10:18:01 -0000
@@ -246,9 +246,6 @@ client_main(struct event_base *base, int
        u_int                    ncaps = 0;
        struct args_value       *values;

-       /* Ignore SIGCHLD now or daemon() in the server will leave a zombie. */
-       signal(SIGCHLD, SIG_IGN);
-
        /* Set up the initial command. */
        if (shell_command != NULL) {
                msg = MSG_SHELL;

@azat
Copy link
Author

azat commented Jul 10, 2023

Yes, this will be even better (the comment looked odd to me). I've tested this and can confirm that it works.

@nicm
Copy link
Member

nicm commented Jul 10, 2023

Great, I have applied it now. Thanks!

@nicm nicm closed this Jul 10, 2023
azat added a commit to azat/libevent that referenced this pull request Jul 13, 2023
signalfd may behave differently to sigaction/signal, so to avoid
breaking libevent users (like [1], [2]) disable it by default.

  [1]: tmux/tmux#3621
  [2]: tmux/tmux#3626

Also signalfd is not that perfect:
- you need to SIG_BLOCK the signal before
  - blocked signals are not reset on exec
  - blocked signals are allowed to coalesce - so in case of multiple
    signals sent you may get the signal only once (ok for most of the
    signals, but may be a problem for SIGCHLD, though you may call
    waitpid() in a loop or use pidfd)
- and also one implementation problem -
  sigprocmask is unspecified in a multithreaded process

Refs:
- https://lwn.net/Articles/415684/
- https://ldpreload.com/blog/signalfd-is-useless

Refs: libevent#1460
Refs: libevent#1342 (cc @dmantipov)
@github-actions
Copy link

github-actions bot commented Sep 9, 2023

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants