Skip to content

Conversation

camshaft
Copy link
Contributor

@camshaft camshaft commented Apr 17, 2025

Description of changes:

This change fixes a bug in the stream send state manager where it gets into a PTO feedback loop by constantly sending PTOs. This is reproduced by the test I added that shows that the sender constantly sends packets every millisecond. After the fix, no recovery packets are sent.

Testing:

Before

2025-04-17-133225_1177x1057_scrot

After

2025-04-17-133136_1112x564_scrot

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@camshaft camshaft marked this pull request as ready for review April 17, 2025 19:35
@camshaft camshaft requested a review from Copilot April 18, 2025 15:30
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug in the stream send state manager that was causing continuous PTO packet transmissions by ensuring that PTO packets are only sent when there are outstanding stream segments.

  • Updated the check in the PTO timer logic to rely exclusively on outstanding stream segments.
  • Added an idle_timeout integration test to validate that no PTO packets are sent when there are no outstanding stream segments.
  • Adjusted the PTO period to enforce a minimum delay, preventing premature PTO triggering.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
dc/s2n-quic-dc/src/stream/tests/idle_timeout.rs Added an integration test to verify proper idle timeout behavior under simulated traffic.
dc/s2n-quic-dc/src/stream/tests.rs Included the new idle_timeout test module in the test suite.
dc/s2n-quic-dc/src/stream/send/state.rs Revised the logic to only consider stream packet buffers for PTO decisions and enforced a min PTO period.

|| !self.sent_recovery_packets.is_empty()
|| !self.retransmissions.is_empty()
|| !self.transmit_queue.is_empty()
!self.stream_packet_buffers.is_empty()
Copy link
Preview

Copilot AI Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirm that relying solely on 'stream_packet_buffers' fully accounts for all outstanding stream segments, as previous checks for sent, recovery, or retransmitted packets have been removed. If any of these queues need to be considered, please adjust accordingly.

Suggested change
!self.stream_packet_buffers.is_empty()
// Check if there are any packets in the stream packet buffers
let mut has_packets = !self.stream_packet_buffers.is_empty();
// Include additional checks for sent or reset packets
has_packets |= self.state.is_data_sent() || self.state.is_reset_sent();
has_packets

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was being missed by all the previous checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous checks were too inclusive. Basically if we had a PTO outstanding then this would make the PTO think we need to transmit another packet after it was ACKed. You can see the effect of this in the pcap where we're spamming recovery packets every RTT.

|| !self.sent_recovery_packets.is_empty()
|| !self.retransmissions.is_empty()
|| !self.transmit_queue.is_empty()
!self.stream_packet_buffers.is_empty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was being missed by all the previous checks?


// the `Timestamp::elapsed` function rounds up to the nearest 1ms so we need to set a min value
// otherwise we'll prematurely trigger a PTO
pto_period = pto_period.max(Duration::from_millis(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the source of the bug or did both changes contribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other change was the main source of reduction. Without the rounding change we would transmit an occasional PTO before the peer even had a chance to respond with an ACK.

@camshaft camshaft merged commit 7a59ead into main Apr 18, 2025
126 checks passed
@camshaft camshaft deleted the camshaft/dc-spammy-pto branch April 18, 2025 17:26
dougch pushed a commit that referenced this pull request May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants