Collaborator

@cl-ment cl-ment commented Jan 24, 2025

... during no active input (FIX #2997)

There are two primary reasons for this issue:

  1. The main loop assumes that each time srt_epoll_wait returns, there is data available to read from the source. However, this assumption is incorrect when an SRT socket is in listener mode and on the "target" side. In this mode, srt_epoll_wait can return simply to provide an opportunity to accept a new connection, even if no data is ready to be read. As a result, the process becomes stuck while waiting to read from the source.
  2. The ConsoleSource operates in blocking mode by default. Even worse, its Read() method blocks execution until a full packet (1456 bytes) is received.
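The first cause can be pictured with a small sketch. It is not the application's code and does not call the SRT API: the set `ready` stands in for whatever the poller reported, and `target_listener` and `source` are hypothetical ids. The point is only that a read is scheduled when the source itself was reported ready, not on every wake-up.

```cpp
#include <set>
#include <vector>

// What the main loop should do after one wake-up of the poller.
enum class Action { AcceptTarget, ReadSource };

// `ready` stands in for the ids reported ready by the poller.
// `target_listener` and `source` are hypothetical ids, not SRT API values.
std::vector<Action> plan_actions(const std::set<int>& ready,
                                 int target_listener, int source)
{
    std::vector<Action> actions;
    if (ready.count(target_listener) != 0)
        actions.push_back(Action::AcceptTarget); // a caller is waiting to be accepted
    if (ready.count(source) != 0)
        actions.push_back(Action::ReadSource);   // only now is a read justified
    return actions;
}
```

With this shape, a wake-up caused solely by the listener yields an accept and no read, so the loop never waits on a source that has nothing to deliver.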

@maxsharabayko maxsharabayko added the Type: Bug and [apps] labels Jan 24, 2025
@maxsharabayko maxsharabayko added this to the v1.5.5 milestone Jan 24, 2025
cl-ment and others added 5 commits January 24, 2025 15:31
Co-authored-by: Maxim Sharabayko <maxlovic@gmail.com>
Co-authored-by: Maxim Sharabayko <maxlovic@gmail.com>
Co-authored-by: Maxim Sharabayko <maxlovic@gmail.com>
@@ -712,6 +713,9 @@ class ConsoleSource: public Source
// The default stdin mode on windows is text.
// We have to set it to the binary mode
_setmode(_fileno(stdin), _O_BINARY);
#else
const int fd = fileno(stdin);
may_block = fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) < 0;
Collaborator

This doesn't make sense. If this call fails (ideally detected in the constructor), the application should exit. An error from this call is unlikely anyway.

Collaborator Author

Typically, when implementing an event loop for reading, the pattern is to use non-blocking file descriptors and continue reading until errno == EAGAIN is encountered.
If it’s not possible to make the file descriptors non-blocking, an alternative approach is to make a single read() call for each “read” event. While this approach is not optimal due to the increased number of system calls, it still ensures the process functions as expected from the user’s perspective. I prefer this pragmatic approach over halting the process just because it cannot run in an optimal manner.
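A minimal sketch of the non-blocking pattern described above, assuming a POSIX system. The helper names `set_nonblocking` and `drain_fd` are hypothetical and are not part of srt-live-transmit; the 1456-byte default chunk is taken from the packet size mentioned in the PR description.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: switch a descriptor to non-blocking mode.
// Returns false if either fcntl call fails.
bool set_nonblocking(int fd)
{
    const int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

enum class DrainResult { WouldBlock, Eof, Error };

// After one "read" event, keep reading until the descriptor reports
// EAGAIN/EWOULDBLOCK (nothing left for now), end of file, or a real error.
DrainResult drain_fd(int fd, std::vector<char>& out, std::size_t chunk = 1456)
{
    std::vector<char> buf(chunk);
    for (;;)
    {
        const ssize_t n = read(fd, buf.data(), buf.size());
        if (n > 0)
        {
            out.insert(out.end(), buf.begin(), buf.begin() + n);
            continue;
        }
        if (n == 0)
            return DrainResult::Eof;
        if (errno == EINTR)
            continue; // interrupted by a signal, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DrainResult::WouldBlock;
        return DrainResult::Error;
    }
}
```

If `set_nonblocking` fails, the fallback mentioned above would be a single `read()` per event instead of calling `drain_fd`.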

@@ -800,9 +843,10 @@ int main(int argc, char** argv)

dataqueue.push_back(pkt);
receivedBytes += pkt->payload.size();
if (src->MayBlock())
break;
Collaborator

I think reading should work in one of two ways. Either you confirm that the device is read-ready and read once; after that read you no longer know whether it is still ready, so you have to recheck.

Or you read multiple times, relying on the Read call to report an error once the device is no longer ready. It might be a good idea to keep the "blocked" state in a field that a particular Read implementation sets when it finds that the call failed because the device was not ready. That way the loop doesn't need to check whether the source is SRT; the Read function itself obtains the error and can check for an SRT-specific readiness failure.

Collaborator Author

From what I understand, the current implementation already follows this approach. For file descriptors in blocking mode (case 1), it checks if the device is ready before attempting to read, performs the read operation, and rechecks the state afterward since readiness isn’t guaranteed. For file descriptors in non-blocking mode (case 2), it handles multiple read attempts and relies on the error returned by the Read call to detect if the device isn’t ready.
Could you clarify if there’s something specific you’d like me to adjust or add to the current implementation? Perhaps there’s a particular scenario or behavior you’d like to address that isn’t currently handled?

Collaborator

This part is unclear. First, MayBlock() only records whether non-blocking mode could be set, which is useless. If the architecture requires non-blocking mode, it should always be non-blocking, and all devices should be operated as such. Operating the console device in blocking mode should not even be taken into account.

The only thing I'm referring to is the approach to multiple reading calls, which should follow one of two methods:

  • Check if read-ready always before calling Read(), and break the loop if it isn't
  • Check if read-ready once before the loop, then call Read() in loop until Read() informs you that reading is no longer possible

Note that the second approach cannot be used reliably in blocking mode, which is why blocking mode should not be taken into account.
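For illustration, the first method (check readiness before every read) might look like the sketch below on a POSIX system. It uses `poll()` with a zero timeout as the readiness check; the helper name `read_while_ready` is hypothetical, and the application itself relies on SRT's epoll rather than `poll()`.

```cpp
#include <cstddef>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Re-check read-readiness before every read() and leave the loop as soon as
// the descriptor is not ready. Returns the number of bytes appended to `out`.
std::size_t read_while_ready(int fd, std::vector<char>& out,
                             std::size_t chunk = 1456)
{
    std::vector<char> buf(chunk);
    std::size_t total = 0;
    for (;;)
    {
        pollfd p;
        p.fd = fd;
        p.events = POLLIN;
        p.revents = 0;
        const int rc = poll(&p, 1, 0); // zero timeout: query only, never wait
        if (rc <= 0 || (p.revents & POLLIN) == 0)
            break;                     // not ready (or poll failed): stop
        const ssize_t n = read(fd, buf.data(), buf.size());
        if (n <= 0)
            break;                     // end of file or error
        out.insert(out.end(), buf.begin(), buf.begin() + n);
        total += static_cast<std::size_t>(n);
    }
    return total;
}
```

The cost is one extra system call per read, which is the trade-off against the second method.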

Collaborator Author

I understand the benefits of enforcing non-blocking mode for consistency and simplifying the architecture. However, I think it might be preferable to maintain support for both blocking and non-blocking modes. This flexibility ensures compatibility with a broader range of sources and use cases, particularly for systems where blocking mode is either required or more practical.

The current implementation can differentiate the handling of each mode:
• For blocking mode: Always check read-readiness before calling Read().
• For non-blocking mode: Check read-readiness once before the loop and perform multiple Read() calls until it’s no longer possible.

Would you be open to maintaining support for both modes, or do you see specific challenges in doing so?

Collaborator

srt-live-transmit was originally intended as a sample application showing how to work with SRT using epoll and non-blocking mode. I don't think supporting blocking mode was ever a goal.
The problem now is that a user can actually set blocking mode via the URI srt://ip:port?blocking=true, and then the application must either work or report an error saying that blocking mode is not supported.
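The "report an error" option could be sketched as follows. This is a hypothetical check, not code from the application: `params` is assumed to hold the already-parsed query part of the URI, and only the literal `blocking=true` from the example above is recognized.

```cpp
#include <map>
#include <string>

// Hypothetical: true when the parsed URI options ask for blocking mode.
// Only the literal value "true" is checked here.
bool requests_blocking_mode(const std::map<std::string, std::string>& params)
{
    const auto it = params.find("blocking");
    return it != params.end() && it->second == "true";
}

// Returns an empty string when the options are acceptable,
// otherwise a message the application could print before exiting.
std::string validate_mode(const std::map<std::string, std::string>& params)
{
    if (requests_blocking_mode(params))
        return "blocking mode is not supported by this application";
    return std::string();
}
```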

Collaborator

Exactly. Supporting only non-blocking mode is built into the architecture of this application, and it should not even take blocking mode into account. This is unlike the srt-test-live application, which supports both and is prepared to work in either mode.

bool srcReady = false;

if (src.get() && src->IsOpen())
Collaborator

Maybe this way then if you still want to use cin.eof()?

Suggested change
- if (src.get() && src->IsOpen())
+ if (src.get() && src->IsOpen() && !src->EOF())

@maxsharabayko
Collaborator

The review has dragged on. If there are no critical comments, I suggest merging once the CI checks pass.
As I understand it, MayBlock is the one remaining point, though it is not crucial.
cin.eof() can still be used (see my suggestion), but I'm not sure it would work properly for other media. Not critical, I assume?

@maxsharabayko
Collaborator

Not related to these changes, but noting down the QEMU CI failure on test

[ RUN      ] ReuseAddr.DiffAddr
[T/S] serverSocket: creating binder socket
[T/S] Bind @275074266 to: 127.0.0.1:5000 (IPv4)
[T/S] ... result 0 (expected to succeed)
[T/S] serverSocket: creating listener socket
[T/S] Bind @275074265 to: 172.17.0.2:5000 (IPv4)
[T/S] ... result 0 (expected to succeed)
[T/S] Listener/binder sock @275074265 added to server_pollid
[T/C] Setting up client socket
[T/S] Wait 10s on E81 for acceptance on @275074265 ...
[T/C] Connecting to: 172.17.0.2:5000 (IPv4)
[T/S] Accepted from: 172.17.0.2:42683
[T/S] Wait for data reception...
[T/C] Waiting for connection readiness...
[T/C] Client exit
[T/S] closing client socket
[T/S] closing sockets: ACP:@275074263...
/github/workspace/test/test_main.cpp:207: Failure
Expected: (close_error) != (SRT_EINVSOCK), actual: 5004 vs 5004
[T/S]accept CREATED: /github/workspace/test/test_reuseaddr.cpp:347
[T/S] joining client async 
[T/S] waiting for cleanup of @275074266 up to 10s
[T/S] @275074266 dissolved after 1.002s
[T/S] waiting for cleanup of @275074265 up to 10s
[T/S] @275074265 dissolved after 1.001s
[  FAILED  ] ReuseAddr.DiffAddr (2133 ms)

@cl-ment cl-ment merged commit ce0a888 into Haivision:master Feb 13, 2025
12 checks passed
@Frenzie

Frenzie commented Feb 14, 2025

Thanks! I can confirm it now successfully fully destroys the connection and rebinds after the receiver stops.

Media path: 'file://con' --> 'srt://127.0.0.1:5000?mode=listener'
SRT parameters specified:

        mode = 'listener'
Opening SRT target listener on 127.0.0.1:5000
Binding a server on 127.0.0.1:5000 ...
 listen...
 accept... 
 connected.
Accepted SRT target connection
SRT target disconnected
SrtCommon: DESTROYING CONNECTION, closing sockets (rt%646118174 ls%-1)...
SrtCommon: ... done.
SRT parameters specified:

        mode = 'listener'
Opening SRT target listener on 127.0.0.1:5000
Binding a server on 127.0.0.1:5000 ...
 listen...
 accept... 
 connected.
11:07:09.860234/srt-live-transm*E:SRT.ea: remove_usock: @646118173 not found as either socket or group. Removing only from epoll system.
Accepted SRT target connection
11:07:10.860323/SRT:GC*E:SRT.ei: epoll/update: IPE: update struck E1 which is NOT SUBSCRIBED to @646118173
11:07:10.860392/SRT:GC*E:SRT.ei: epoll/update: IPE: update struck E1 which is NOT SUBSCRIBED to @646118173

@cl-ment cl-ment deleted the issue-2997 branch June 11, 2025 14:28
Successfully merging this pull request may close these issues.

[BUG] srt-live-transmit stops listening after SRT disconnect during no active input