Skip to content

System calls interrupted by profilers (such as poll) cause EINTR errors that break libssh2 #955

@ivoanjo

Description

@ivoanjo

Describe the bug

Hey 👋! Thanks for libssh2 🎉

I work on Datadog's Ruby profiler and I arrived here while investigating a customer issue where they were using the rugged library which uses libgit2 which uses libssh2, and it broke in combination with our profiler.

Specifically, what happened is that our profiler uses unix signals (specifically SIGPROF) to interrupt the application and then gather information on what it was doing.

This has a downside: some system calls get interrupted by the signal handler (as documented on the man page) and return EINTR to tell the caller this has happened and that it should try again.

Unfortunately, libssh2 does not specifically handle EINTR errors -- it treats them as a real error, and fails connections when it happens, returning an error to the caller.

Note that this approach of using signals is not unique to our profiler:

(Funnily enough, there's an old essay on unix issues written in 1991 that complains exactly about this -- link)

To Reproduce
I have modified ssh2.c to simulate having a profiler running to trigger the issue:

$ git diff
diff --git a/example/ssh2.c b/example/ssh2.c
index a762d70..a96af2e 100644
--- a/example/ssh2.c
+++ b/example/ssh2.c
@@ -35,6 +35,27 @@
 #include <stdlib.h>
 #include <ctype.h>
 
+#include <pthread.h>
+#include <signal.h>
+#include <errno.h>
+
+pthread_t main_thread;
+
+static void *background_thread(void *unused) {
+    fprintf(stderr, "Hello from background thread!\n");
+
+    while (1) {
+        pthread_kill(main_thread, SIGPROF);
+        usleep(1000);
+    }
+
+    return NULL;
+}
+
+static void signal_handler_function(int foo, siginfo_t *bar, void *baz) {
+    //fprintf(stderr, "Hello from signal handler!\n");
+}
+
 #if defined(_MSC_VER) && _MSC_VER < 1900
 #define snprintf _snprintf
 #endif
@@ -73,6 +94,22 @@ int main(int argc, char *argv[])
     char *userauthlist;
     LIBSSH2_SESSION *session;
     LIBSSH2_CHANNEL *channel;
+    int error_code;
+    pthread_t thread;
+
+    struct sigaction signal_handler_config = {
+      .sa_flags = SA_RESTART | SA_SIGINFO,
+      .sa_sigaction = signal_handler_function
+    };
+    sigemptyset(&signal_handler_config.sa_mask);
+
+    if (sigaction(SIGPROF, &signal_handler_config, NULL) != 0) {
+        fprintf(stderr, "Could not install signal handler\n");
+    }
+
+    main_thread = pthread_self();
+
+    pthread_create(&thread, NULL, background_thread, NULL);
 
 #ifdef WIN32
     WSADATA wsadata;
@@ -137,8 +174,8 @@ int main(int argc, char *argv[])
         LIBSSH2_TRACE_SOCKET
     );
 
-    if(libssh2_session_handshake(session, sock)) {
-        fprintf(stderr, "Failure establishing SSH session\n");
+    if(error_code = libssh2_session_handshake(session, sock)) {
+        fprintf(stderr, "Failure establishing SSH session (error code %d)\n", error_code);
         return -1;
     }
 
diff --git a/src/session.c b/src/session.c
index 5d2b023..8a8fc8d 100644
--- a/src/session.c
+++ b/src/session.c
@@ -710,6 +710,7 @@ int _libssh2_wait_socket(LIBSSH2_SESSION *session, time_t start_time)
                               "Timed out waiting on socket");
     }
     if(rc < 0) {
+        perror("Poll failed");
         return _libssh2_error(session, LIBSSH2_ERROR_TIMEOUT,
                               "Error waiting on socket");
     }

And here's what I see:

$ example/example-ssh2 192.168.0.3 user password sleep 10
Hello from background thread!
Connecting to 192.168.0.3:22 as user user
Poll failed: Interrupted system call
Failure establishing SSH session (error code -9)

and here's the relevant excerpt from running strace as well:

sendto(3, "SSH-2.0-libssh2_1.10.1_DEV\r\n", 28, MSG_NOSIGNAL, NULL, 0) = 28
recvfrom(3, 0x7ffe3f6ec627, 1, MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1)    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_TKILL, si_pid=358496, si_uid=1000} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
write(2, "Poll failed: Interrupted system "..., 37Poll failed: Interrupted system call
) = 37
write(2, "Failure establishing SSH session"..., 49Failure establishing SSH session (error code -9)
) = 49
exit_group(-1)                          = ?
+++ exited with 255 +++

Expected behavior

When poll and similar system calls are interrupted with an EINTR error, libssh2 should retry the call, instead of considering it as a real error.

Version (please complete the following information):

  • OS and version: Linux, Ubuntu 22.04.2 LTS
  • libssh2 version: master
  • crypto backend: openssl

Additional context

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions