Skip to content

Conversation

gou4shi1
Copy link
Contributor

@maxsharabayko maxsharabayko added this to the Next release milestone Sep 27, 2022
@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels Sep 27, 2022
@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 27, 2022

Hmm, if (offset >= avail_bufsize) and offset - avail_bufsize + 1 are still strange, offset is from drop position, avail_bufsize is from ack position.
That is why I created #2465, offset and avail_bufsize should be calculated from the same base.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 27, 2022

What about just change this place to

avail_bufsize = m_pRcvBuffer->getAvailSize(m_iRcvLastSkipAck)

but not change other usages of m_pRcvBuffer->getAvailSize()

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 27, 2022

If m_iRcvLastSkipAck is different from m_iRcvLastAck, then m_iStartSeqNo %>= m_iRcvLastSkipAck %>= m_iRcvLastAck, then avail_bufsize should be capacity(), but

        if (CSeqNo::seqcmp(iRBufSeqNo, iFirstUnackSeqNo) >= 0) // iRBufSeqNo >= iFirstUnackSeqNo 
         { 
             // Full capacity is available, still don't want to encourage extra packets to come. 
             // Note: CSeqNo::seqlen(n, n) returns 1. 
             return capacity() - CSeqNo::seqlen(iFirstUnackSeqNo, iRBufSeqNo) + 1; 
         }

Why?

@maxsharabayko
Copy link
Collaborator

The old receiver buffer was not handling packet drops properly. There was a chance to re-add a packet that had been already dropped but managed to arrive later. Such a packet must is considered "belated".

Here the offset is taken from the latest drop position (m_iRcvLastSkipAck %>= m_iRcvLastAck) to determine is this packet is belated and whether it should be submitted to the receiver buffer at all (because the old receiver buffer might actually take it).

// m_iRcvLastSkipAck is the base sequence number for the receiver buffer.
// This is the offset in the buffer; if this is negative, it means that
// this sequence is already in the past and the buffer is not interested.
// Meaning, this packet will be rejected, even if it could potentially be
// one of missing packets in the transmission.
int32_t offset = CSeqNo::seqoff(m_iRcvLastSkipAck, rpkt.m_iSeqNo);

However, the available RCV buffer size must be counted from the first unacknowledged sequence number (m_iRcvLastAck). It is also the size reported back to the sender. The avail_bufsize below is calculated correctly (based on m_iRcvLastAck). However, it is compared to an offset from m_iRcvLastSkipAck.
A different offset value should be used for this comparison. offset(m_iRcvLastSkipAck) <= offset(m_iRcvLastAck), most of the data should land in the receiver buffer. However, in case of a (large) drop, there may be no space in the RCV buffer yet, m_pRcvBuffer->insert(u) would return a negative result, but the outer logic will treat it as a redundant packet.

const int avail_bufsize = (int) getAvailRcvBufferSizeNoLock();
if (offset >= avail_bufsize)
    return -1;
if (m_pRcvBuffer->insert(u) < 0)

So I would say the whole logic of processData(..) is quite messed up and requires review.

@ethouris
Copy link
Collaborator

If possible, try not to change anything in processData yet. I am about to submit a refax PR that tears off some part of this function into a separate function. This is necessary for the work I do now as this exactly fragment would have to get an alternative implementation for the new group code that uses the common receiver buffer. This might also be an opportunity to fix some logics there.

@ethouris
Copy link
Collaborator

Ah, FYI as per your analysis: m_iRcvLastSkipAck was introduced for the live mode early at the first SRT version and it was intended to actually replace m_iRcvLastAck as a sequence marker for the beginning of the buffer. Might be that this replacement wasn't done properly in some places. After that change, m_iRcvLastAck should be used as a marker of the last received ACK message only, while it need not mark the beginning of the receiver buffer. Likely in the original UDT code this field was used for multiple purposes and introduction of m_iRcvLastSkipAck required to separate them.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 30, 2022

const int avail_bufsize = (int) getAvailRcvBufferSizeNoLock();
if (offset >= avail_bufsize)
    return -1;
if (m_pRcvBuffer->insert(u) < 0)

Maybe just let m_pRcvBuffer to determine whether to accept incoming packets? i.e. just

if (m_pRcvBuffer->insert(u) < 0)

@maxsharabayko
Copy link
Collaborator

const int avail_bufsize = (int) getAvailRcvBufferSizeNoLock();
if (offset >= avail_bufsize)
    return -1;
if (m_pRcvBuffer->insert(u) < 0)

Maybe just let m_pRcvBuffer to determine whether to accept incoming packets? i.e. just

if (m_pRcvBuffer->insert(u) < 0)

Also possible. If insert(..) return -2. then it is a belated packet. If -3, then no room to store.

@maxsharabayko
Copy link
Collaborator

Still, it is incorrect to store a packet if its offset from the last ACK position exceeds the declared buffer size. On the protocol level, the RCV buffer size is tracked from the ACK position. The RCV buffer itself does care about it.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 30, 2022

Still, it is incorrect to store a packet if its offset from the last ACK position exceeds the declared buffer size. On the protocol level, the RCV buffer size is tracked from the ACK position. The RCV buffer itself does care about it.

Got it. Although I think m_iRcvLastAck will be updated to m_iRcvLastSkipAck in the next timer, it's not a big deal.
But from the readability point of view, I agree more with

m_iRcvLastSkipAck was introduced for the live mode early at the first SRT version and it was intended to actually replace m_iRcvLastAck as a sequence marker for the beginning of the buffer. Might be that this replacement wasn't done properly in some places. After that change, m_iRcvLastAck should be used as a marker of the last received ACK message only, while it need not mark the beginning of the receiver buffer.

And I still don't understand

If m_iRcvLastSkipAck is different from m_iRcvLastAck, then m_iStartSeqNo %>= m_iRcvLastSkipAck %>= m_iRcvLastAck, then avail_bufsize should be capacity(), but

        if (CSeqNo::seqcmp(iRBufSeqNo, iFirstUnackSeqNo) >= 0) // iRBufSeqNo >= iFirstUnackSeqNo 
         { 
             // Full capacity is available, still don't want to encourage extra packets to come. 
             // Note: CSeqNo::seqlen(n, n) returns 1. 
             return capacity() - CSeqNo::seqlen(iFirstUnackSeqNo, iRBufSeqNo) + 1; 
         }

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 30, 2022

A different offset value should be used for this comparison. offset(m_iRcvLastSkipAck) <= offset(m_iRcvLastAck)

What about

if (offset(m_iRcvLastSkipAck) < 0)
  return belated;
if (offset(m_iRcvLastAck) > avail_bufsize)
  return no_room;
```

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 30, 2022

If possible, try not to change anything in processData yet. I am about to submit a refax PR that tears off some part of this function into a separate function. This is necessary for the work I do now as this exactly fragment would have to get an alternative implementation for the new group code that uses the common receiver buffer. This might also be an opportunity to fix some logics there.

I'm excited about this news, very looking forward to the new group mode!

@maxsharabayko
Copy link
Collaborator

@gou4shi1

And I still don't understand

If m_iRcvLastSkipAck is different from m_iRcvLastAck, then m_iStartSeqNo %>= m_iRcvLastSkipAck %>= m_iRcvLastAck, then avail_bufsize should be capacity(), but

        if (CSeqNo::seqcmp(iRBufSeqNo, iFirstUnackSeqNo) >= 0) // iRBufSeqNo >= iFirstUnackSeqNo 
         { 
             // Full capacity is available, still don't want to encourage extra packets to come. 
             // Note: CSeqNo::seqlen(n, n) returns 1. 
             return capacity() - CSeqNo::seqlen(iFirstUnackSeqNo, iRBufSeqNo) + 1; 
         }

The available RCV buffer size is the protocol-defined value. It is transmitted on the wire from RCV to the sender in the "Available Buffer Size" field of the ACK packet. This value must be calculated from the first unacknowledged seqno (m_iRcvLastAck). As it is transmitted on the wire, it is important to have this value consistent for compatibility between previous and future SRT versions.

With a compliant SRT sender and a valid logic of sending the SRT receiver should never receive a data packet outside of the described seqno range.

In the example you cite, the start position of the receiver buffer (1iRBufSeqNo) might be ahead of the first unacknowledged position (iFirstUnackSeqNo`). Still, we have to send the available buffer size as if it was at the position of the first unacknowledged seqno. The same logic applies when we receive a data packet and want to check if its seqno is inside the target range.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Sep 30, 2022

The available RCV buffer size is the protocol-defined value. It is transmitted on the wire from RCV to the sender in the "Available Buffer Size" field of the ACK packet. This value must be calculated from the first unacknowledged seqno (m_iRcvLastAck). As it is transmitted on the wire, it is important to have this value consistent for compatibility between previous and future SRT versions.

With a compliant SRT sender and a valid logic of sending the SRT receiver should never receive a data packet outside of the described seqno range.

In the example you cite, the start position of the receiver buffer (1iRBufSeqNo) might be ahead of the first unacknowledged position (iFirstUnackSeqNo`). Still, we have to send the available buffer size as if it was at the position of the first unacknowledged seqno. The same logic applies when we receive a data packet and want to check if its seqno is inside the target range.

Got it, thanks for your explanation.

What about

if (offset(m_iRcvLastSkipAck) < 0)
  return belated;
if (offset(m_iRcvLastAck) > avail_bufsize)
  return no_room;

I tried in #2465

@maxsharabayko
Copy link
Collaborator

#2465 is the next in my queue for review 🙂

@gou4shi1 gou4shi1 changed the title [core] Fixed m_iRcvLastSkipAck in error logs. [core] Replaced m_iRcvLastSkipAck with m_iRcvLastAck in logs. Sep 30, 2022
@maxsharabayko maxsharabayko merged commit 1b20a48 into Haivision:master Sep 30, 2022
@gou4shi1 gou4shi1 deleted the fix-logs branch October 1, 2022 00:27
@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Oct 17, 2022

if (CSeqNo::seqcmp(iRBufSeqNo, iFirstUnackSeqNo) >= 0) // iRBufSeqNo >= iFirstUnackSeqNo 
{ 
    // Full capacity is available, still don't want to encourage extra packets to come. 
    // Note: CSeqNo::seqlen(n, n) returns 1. 
    return capacity() - CSeqNo::seqlen(iFirstUnackSeqNo, iRBufSeqNo) + 1; 
}

The available RCV buffer size is the protocol-defined value. It is transmitted on the wire from RCV to the sender in the "Available Buffer Size" field of the ACK packet. This value must be calculated from the first unacknowledged seqno (m_iRcvLastAck). As it is transmitted on the wire, it is important to have this value consistent for compatibility between previous and future SRT versions.

With a compliant SRT sender and a valid logic of sending the SRT receiver should never receive a data packet outside of the described seqno range.

In the example you cite, the start position of the receiver buffer (1iRBufSeqNo) might be ahead of the first unacknowledged position (iFirstUnackSeqNo`). Still, we have to send the available buffer size as if it was at the position of the first unacknowledged seqno. The same logic applies when we receive a data packet and want to check if its seqno is inside the target range.

This code block caused many "No room" issues in my production env (group broadcast mode), e.g.

18:23:10.122034/SRT:TsbPd!W:SRT.br: @918366864:RCV-DROPPED 241 packet(s). Packet seqno %1608225931 delayed for 436.822 ms
18:23:54.539780/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233290, insert offset 7272. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 44ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:54.543102/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233291, insert offset 7273. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 41ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:54.545223/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233292, insert offset 7274. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 39ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:55.218042/SRT:TsbPd!W:SRT.br: @918366864:RCV-DROPPED 14 packet(s). Packet seqno %1608233322 delayed for 383.815 ms

Although it's not a big issue, it would not "No room" forever, but it's still unreasonable to drop these packets:
-> Received 1608233289, the avail_bufsize was 8192, accepted.
-> The only packet in the rcv buffer was 1608233289, both m_iRcvLastAck and m_iRcvLastSkipAck were 1608226018.
-> TSBPD called m_pRcvBuffer->dropUpTo(1608233289) and set m_iRcvLastSkipAck to 1608233289
-> Received 1608233290, the avail_bufsize jumped to 920 because of this code block, dropped.
-> sendCtrlAck() set m_iRcvLastAck to 1608233290, now avail_bufsize became 8191.

Note that this if caused avail_bufsize to jump around unreasonable,

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Oct 17, 2022

The available RCV buffer size is the protocol-defined value. It is transmitted on the wire from RCV to the sender in the "Available Buffer Size" field of the ACK packet. This value must be calculated from the first unacknowledged seqno (m_iRcvLastAck). As it is transmitted on the wire, it is important to have this value consistent for compatibility between previous and future SRT versions.

I think the data[ACKD_BUFFERLEFT] = (int) getAvailRcvBufferSizeNoLock(); in sendCtrlAck() would not suffer from this issue, because it's after ackDataUpTo(). It ensures m_iRcvLastSkipAck == m_iRcvLastAck >= rcv_buf.m_iStartSeqNo, so I think we can safely change that code block to

if (CSeqNo::seqcmp(iRBufSeqNo, iFirstUnackSeqNo) >= 0) // iRBufSeqNo >= iFirstUnackSeqNo 
 { 
     // Full capacity is available.
     return capacity(); 
 }

@maxsharabayko
Copy link
Collaborator

This code block caused many "No room" issues in my production env (group broadcast mode), e.g.

18:23:10.122034/SRT:TsbPd!W:SRT.br: @918366864:RCV-DROPPED 241 packet(s). Packet seqno %1608225931 delayed for 436.822 ms
18:23:54.539780/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233290, insert offset 7272. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 44ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:54.543102/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233291, insert offset 7273. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 41ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:54.545223/SRT:RcvQ:w1!W:SRT.qr: @918366864: No room to store incoming packet seqno 1608233292, insert offset 7274. iFirstUnackSeqNo=1608226018 m_iStartSeqNo=1608233289 m_iStartPos=625 m_iMaxPosInc=1. Space avail 920/8192 pkts. (TSBPD ready in 39ms, timespan 0 ms). STDCXX_STEADY drift 0 ms.
18:23:55.218042/SRT:TsbPd!W:SRT.br: @918366864:RCV-DROPPED 14 packet(s). Packet seqno %1608233322 delayed for 383.815 ms

Although it's not a big issue, it would not "No room" forever, but it's still unreasonable to drop these packets: -> Received 1608233289, the avail_bufsize was 8192, accepted. -> The only packet in the rcv buffer was 1608233289, both m_iRcvLastAck and m_iRcvLastSkipAck were 1608226018. -> TSBPD called m_pRcvBuffer->dropUpTo(1608233289) and set m_iRcvLastSkipAck to 1608233289 -> Received 1608233290, the avail_bufsize jumped to 920 because of this code block, dropped. -> sendCtrlAck() set m_iRcvLastAck to 1608233290, now avail_bufsize became 8191.

Note that this if caused avail_bufsize to jump around unreasonable,

I assume the root issue is that those packets were not supposed to be sent. Given they reach the receiver, and there is space, it might make sense to store them. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants