Skip to content

Conversation

routingrocks
Copy link
Contributor

Topology:
A single router is acting as the First Hop Router (FHR), Last Hop Router (LHR), and RP.

RC and Issue:
When an upstream S,G is in join state, it sends a register message to the RP. If the RP has the receiver, it sends a register stop message and switches to the shortest path. When the register stop message is processed, it removes pimreg, moves to prune, and starts the reg stop timer.

When the reg stop timer expires, PIM changes S,G state to Join Pending and sends out a NULL register message to RP. RP receives it and fails to send Reg stop because SPT is not set at that point.

The problem is when the register stop timer pops and state is in Join Pending. According to https://www.rfc-editor.org/rfc/rfc4601#section-4.4.1, we need to put back the pimreg reg tunnel into the S,G mroute. This causes data to be sent to the control plane and subsequently interrupts the line rate.

Fix:
If the router is FHR and RP to the group,
ignore SPT status and send out a register stop message back to the DR (in this context, the same router).

Ticket: #
Signed-off-by: Donald Sharp sharpd@nvidia.com
Signed-off-by: Rajesh Varatharaj rvaratharaj@nvidia.com

@mwinter-osr
Copy link
Member

CI:rerun rerunning with CI webhook after disabling the CI Checks App

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: FAILED

Test incomplete. See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-13620/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Incomplete

Addresssanitizer topotests part 4: Incomplete (check logs for details)
Successful on other platforms/tests
  • Addresssanitizer topotests part 3
  • Topotests Ubuntu 18.04 amd64 part 7
  • Topotests debian 10 amd64 part 1
  • Topotests Ubuntu 18.04 i386 part 5
  • Topotests Ubuntu 18.04 i386 part 0
  • Topotests Ubuntu 18.04 amd64 part 4
  • CentOS 7 rpm pkg check
  • Topotests Ubuntu 18.04 i386 part 9
  • Addresssanitizer topotests part 2
  • Topotests Ubuntu 18.04 amd64 part 0
  • Addresssanitizer topotests part 1
  • Topotests Ubuntu 18.04 i386 part 4
  • Topotests debian 10 amd64 part 5
  • Topotests Ubuntu 18.04 amd64 part 1
  • Topotests debian 10 amd64 part 7
  • Debian 9 deb pkg check
  • Addresssanitizer topotests part 8
  • Topotests Ubuntu 18.04 amd64 part 6
  • Topotests Ubuntu 18.04 i386 part 2
  • Topotests Ubuntu 18.04 arm8 part 8
  • Ubuntu 18.04 deb pkg check
  • Topotests debian 10 amd64 part 6
  • Topotests Ubuntu 18.04 arm8 part 6
  • Topotests Ubuntu 18.04 arm8 part 1
  • Topotests Ubuntu 18.04 i386 part 7
  • Addresssanitizer topotests part 6
  • Addresssanitizer topotests part 5
  • Topotests Ubuntu 18.04 arm8 part 3
  • Topotests Ubuntu 18.04 i386 part 3
  • Topotests Ubuntu 18.04 amd64 part 2
  • Topotests Ubuntu 18.04 i386 part 8
  • Topotests debian 10 amd64 part 3
  • Topotests debian 10 amd64 part 8
  • Addresssanitizer topotests part 0
  • Topotests Ubuntu 18.04 arm8 part 4
  • Topotests debian 10 amd64 part 4
  • Topotests debian 10 amd64 part 9
  • Topotests Ubuntu 18.04 arm8 part 9
  • Topotests Ubuntu 18.04 amd64 part 3
  • Topotests Ubuntu 18.04 arm8 part 2
  • Static analyzer (clang)
  • Topotests debian 10 amd64 part 0
  • Addresssanitizer topotests part 9
  • Topotests debian 10 amd64 part 2
  • Topotests Ubuntu 18.04 i386 part 6
  • Topotests Ubuntu 18.04 arm8 part 0
  • Topotests Ubuntu 18.04 arm8 part 7
  • Topotests Ubuntu 18.04 amd64 part 8
  • Topotests Ubuntu 18.04 arm8 part 5
  • Topotests Ubuntu 18.04 i386 part 1
  • Topotests Ubuntu 18.04 amd64 part 9
  • Addresssanitizer topotests part 7
  • Ubuntu 20.04 deb pkg check
  • Debian 10 deb pkg check
  • Topotests Ubuntu 18.04 amd64 part 5

Copy link
Contributor

@mobash-rasool mobash-rasool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is how the flow should be:
(S,G) - OIL list should be having pimreg + Interface-connected-to-LHR (due to inheritance) initially
The kernel stats must be getting incremented now since the traffic is coming over (S,G) and hence upstream->sptbit should have been set and pimreg removed and that way the register-stop would have been sent too due to spt bit. And this flap of adding and removing of pimreg would not be occurring.
So we need to check why upstream->sptbit could not be set, is there some problem with kernel stats or we are unable to reach the point where upstream->sptbit is set in pimd.

@RodrigoMNardi
Copy link
Contributor

ci:rerun

Copy link

This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this PR closed.

@routingrocks
Copy link
Contributor Author

ci:rerun

@Jafaral
Copy link
Member

Jafaral commented Aug 23, 2024

Can you rebase instead ?

@routingrocks
Copy link
Contributor Author

Failure is not related to the change, re-running CI
E AssertionError: Testcase test_verify_default_originate_with_2way_ecmp_p2 : After shuting down the interface Convergence is expected to be Failed assert True is not Tru

@routingrocks
Copy link
Contributor Author

ci:rerun

1 similar comment
@routingrocks
Copy link
Contributor Author

ci:rerun

Topology:
A single router is acting as the First Hop Router (FHR), Last Hop Router (LHR), and RP.

RC and Issue:
When an upstream S,G is in join state, it sends a register message to the RP.
If the RP has the receiver, it sends a register stop message and switches to the shortest path.
When the register stop message is processed, it removes pimreg, moves to prune,
and starts the reg stop timer.

When the reg stop timer expires, PIM changes S,G state to Join Pending and sends out a NULL
register message to RP. RP receives it and fails to send Reg stop because SPT is not set at that point.

The problem is when the register stop timer pops and state is in Join Pending.
According to https://www.rfc-editor.org/rfc/rfc4601#section-4.4.1,
we need to put back the pimreg reg tunnel into the S,G mroute.
This causes data to be sent to the control plane and subsequently interrupts the line rate.

Fix:
If the router is FHR and RP to the group,
ignore SPT status and send out a register stop message back to the DR (in this context, the same router).

Ticket: #3506780

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
@mobash-rasool mobash-rasool self-requested a review February 14, 2025 07:01
@mobash-rasool mobash-rasool dismissed their stale review February 14, 2025 07:02

Will let others review it.

@Jafaral
Copy link
Member

Jafaral commented Feb 20, 2025

@Mergifyio backport dev/10.3 stable/10.2 stable/10.1

Copy link

mergify bot commented Feb 20, 2025

backport dev/10.3 stable/10.2 stable/10.1

✅ Backports have been created

@Jafaral Jafaral merged commit fc6b041 into FRRouting:master Feb 20, 2025
13 checks passed
donaldsharp added a commit that referenced this pull request Feb 20, 2025
pimd: Fix for data packet loss when FHR is LHR and RP (backport #14227)
donaldsharp added a commit that referenced this pull request Feb 20, 2025
pimd: Fix for data packet loss when FHR is LHR and RP (backport #14227)
donaldsharp added a commit that referenced this pull request Feb 20, 2025
pimd: Fix for data packet loss when FHR is LHR and RP (backport #14227)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants