krishna-samy
Contributor

@krishna-samy krishna-samy commented May 28, 2025

Fix stale NHG entries in the kernel.

Issue 1:

  1. zebra creates an nhe and sets the 'initial delay' flag for the nexthop received along with a kernel/connected route; this route is a v6 route.
  2. Later, zebra receives an intf_address event for the interface that this nhe belongs to, but this is a v4 event. zebra then iterates through the nhe set linked to this interface and ends up installing this nhe in the kernel.

As a result, we install the NHG in the kernel for connected/kernel routes, which deviates from the expected behaviour. This happens because, on an interface event, we attempt a reinstall of all the NHGs associated with that interface. If the 'initial delay' flag is already set for an NHG, we can skip the reinstall.
This change does that.
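
The skip described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the zebra code: the struct, the flag names and values, and both functions are hypothetical stand-ins, and the "kernel install" only sets a bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real zebra flag names and values may differ. */
#define NHE_FLAG_INSTALLED     (1u << 0)
#define NHE_FLAG_INITIAL_DELAY (1u << 1)

/* Simplified stand-in for a nexthop-group hash entry (nhe). */
struct nhe {
	uint32_t id;
	uint32_t flags;
};

/* Stand-in for the kernel install; here it only marks the entry. */
static void nhe_install_kernel(struct nhe *nhe)
{
	nhe->flags |= NHE_FLAG_INSTALLED;
}

/*
 * On an interface address event, reinstall every nhe linked to the
 * interface, except those that still carry the initial-delay flag.
 * Returns the number of entries that were (re)installed.
 */
static size_t nhe_reinstall_for_interface(struct nhe *nhes, size_t count)
{
	size_t installed = 0;

	for (size_t i = 0; i < count; i++) {
		if (nhes[i].flags & NHE_FLAG_INITIAL_DELAY)
			continue; /* held back on purpose, so skip it */
		nhe_install_kernel(&nhes[i]);
		installed++;
	}
	return installed;
}
```

With three entries where only the second has the initial-delay flag set, the function installs the first and third and leaves the second untouched.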

Issue 2:
During an FRR restart, nexthop-group entries are not cleaned up in the
following scenario.

  1. An NHG's refcnt is decremented and reaches zero. We start a timer
    for this NHG before deleting it in zebra/kernel, so the NHG stays
    intact in the kernel until the timer expires.
  2. While the timer is running, FRR is restarted. All the NHGs are
    cleaned up in the kernel except the one with the running timer,
    which remains installed.

Fix: during zebra shutdown, check whether any NHG has a timer running
and remove it from the kernel.
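
A rough sketch of the scenario and the shutdown check, under simplifying assumptions: the entry struct, its fields, and both functions are hypothetical, the timer is modelled as a boolean, and "in the kernel" is just a flag. It is meant to show the idea, not the actual zebra implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an NHG entry; field names are illustrative. */
struct nhg_entry {
	int refcnt;
	bool timer_running; /* delayed-delete timer is pending */
	bool in_kernel;     /* entry is currently installed in the kernel */
};

/*
 * Drop one reference. When the count reaches zero the entry is not
 * removed right away: it stays installed and a timer is armed instead.
 */
static void nhg_release(struct nhg_entry *e)
{
	if (e->refcnt > 0 && --e->refcnt == 0)
		e->timer_running = true;
}

/*
 * Shutdown sweep: any entry whose delayed-delete timer is still pending
 * would otherwise outlive the daemon, so cancel the timer and remove the
 * entry from the kernel immediately. Entries without a pending timer are
 * left to the regular clean-up path. Returns the number of entries removed.
 */
static size_t nhg_shutdown_sweep(struct nhg_entry *entries, size_t count)
{
	size_t removed = 0;

	for (size_t i = 0; i < count; i++) {
		if (!entries[i].timer_running)
			continue;
		entries[i].timer_running = false;
		if (entries[i].in_kernel) {
			entries[i].in_kernel = false;
			removed++;
		}
	}
	return removed;
}
```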

@krishna-samy krishna-samy force-pushed the krishna-samy/stale-nhg branch from 43aa593 to e3024da Compare May 29, 2025 13:49
@frrbot frrbot bot added the bugfix label May 30, 2025
@github-actions github-actions bot added size/M and removed size/XS labels May 30, 2025
@krishna-samy krishna-samy changed the title zebra: do not install the nhg for kernel/connected routes zebra: fix stale NHG in kernel May 30, 2025
@krishna-samy
Contributor Author

Added another commit to address a stale NHG during zebra shutdown.
Both commits deal with stale NHG entries in the kernel, in different scenarios.

@riw777
Member

riw777 commented Jun 3, 2025

Is this related to #18891 ???

@krishna-samy
Contributor Author

> Is this related to #18891 ???

No, the two are different.
#18891 - Stale NHG when two different protocols install NHGs (same route with different nexthops).
#18899 - Stale NHG caused by installing it in the kernel while 'initial delay' is set.

@krishna-samy krishna-samy force-pushed the krishna-samy/stale-nhg branch from a3f1aae to 7fc84dc Compare June 4, 2025 14:39
@github-actions github-actions bot added the rebase PR needs rebase label Jun 4, 2025
Contributor

@mjstapp mjstapp left a comment

Looks good

@krishna-samy
Contributor Author

ci:rerun

@krishna-samy
Contributor Author

There is one failure in the CI run. The same test also fails in other PRs, so it is not related to this code change.
Example: https://github.com/FRRouting/frr/pull/18905/checks?check_run_id=43505419217

@ton31337
Member

@Mergifyio backport dev/10.4 stable/10.3 stable/10.2


mergify bot commented Jun 11, 2025

backport dev/10.4 stable/10.3 stable/10.2

✅ Backports have been created

@ton31337
Member

@krishna-samy
Contributor Author

> Looks like https://ci1.netdef.org/browse/FRR-PULLREQ3-9554/artifact/ASAN9D12AMD64/AddressSanitizerError/AddressSanitzer.txt are valid?

Yes, this call stack looks relevant. Let me fix it.

@krishna-samy krishna-samy force-pushed the krishna-samy/stale-nhg branch from 7fc84dc to 4ceb1bf Compare June 12, 2025 14:49
@krishna-samy
Contributor Author

> Looks like https://ci1.netdef.org/browse/FRR-PULLREQ3-9554/artifact/ASAN9D12AMD64/AddressSanitizerError/AddressSanitzer.txt are valid?

The problem was an improper use of hash_iterate. hash_iterate stores hbnext = hb->next before calling the callback function. In zebra_nhg_sweep_stale_entry, the callback does not just delete the current bucket: it triggers a cascade of deletions that can free other buckets in the chain as well, including ones the iterator has not reached yet. This leads to a use-after-free when the iterator tries to access the freed memory.
So I am changing the code to use hash_walk, similar to the existing NHG clean-up.
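
To make the failure mode concrete, here is a small standalone sketch. It does not use FRR's hash API: the chain, the bucket layout, and every function name are hypothetical. It assumes a walk whose callback can stop the traversal, and a caller that restarts the walk after each deletion, which is one way to avoid touching a cached next pointer that a cascading delete may have freed. Whether the merged change restarts the walk in exactly this way is not stated in this thread.

```c
#include <stdbool.h>
#include <stdlib.h>

enum walk_result { WALK_CONTINUE, WALK_ABORT };

/* One bucket in a single hash chain (illustrative layout). */
struct bucket {
	struct bucket *next;
	int id;
	int depends_on; /* id of a bucket freed together with this one, or -1 */
	bool stale;
};

struct chain {
	struct bucket *head;
	size_t count;
};

/* Push a new bucket at the head of the chain. */
static void chain_add(struct chain *c, int id, int depends_on, bool stale)
{
	struct bucket *b = malloc(sizeof(*b));

	if (!b)
		abort();
	b->id = id;
	b->depends_on = depends_on;
	b->stale = stale;
	b->next = c->head;
	c->head = b;
	c->count++;
}

/* Unlink and free the bucket with the given id, if it is present. */
static void chain_del(struct chain *c, int id)
{
	for (struct bucket **pp = &c->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct bucket *dead = *pp;

			*pp = dead->next;
			free(dead);
			c->count--;
			return;
		}
	}
}

/*
 * Callback: deleting a stale bucket also deletes the bucket it depends
 * on (the cascade). After that, any pointer cached by the walker may be
 * dangling, so the callback asks the walker to stop.
 */
static enum walk_result sweep_stale(struct chain *c, struct bucket *b)
{
	int dep;

	if (!b->stale)
		return WALK_CONTINUE;
	dep = b->depends_on;
	chain_del(c, b->id);
	if (dep >= 0)
		chain_del(c, dep);
	return WALK_ABORT;
}

/* One pass over the chain; returns true only if it ran to the end. */
static bool chain_walk(struct chain *c)
{
	struct bucket *next;

	for (struct bucket *b = c->head; b; b = next) {
		next = b->next; /* only safe while the callback frees nothing */
		if (sweep_stale(c, b) == WALK_ABORT)
			return false; /* never dereference 'next' after a delete */
	}
	return true;
}

/* Restart the walk from the head until a full pass deletes nothing. */
static void chain_sweep(struct chain *c)
{
	while (!chain_walk(c))
		;
}
```

In this sketch, a plain iterator that kept going after the callback would read `next` even though the cascade may already have freed it; stopping and restarting from the head trades some extra passes for never touching a possibly freed bucket.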

@krishna-samy
Contributor Author

There is one test failure, and it does not look relevant to this change. The same failure is seen in other PRs as well.

@krishna-samy
Contributor Author

ci:rerun

@krishna-samy
Contributor Author

@Mergifyio rebase


mergify bot commented Jun 16, 2025

rebase

❌ Unable to rebase: user krishna-samy is unknown.

Please make sure krishna-samy has logged in Mergify dashboard.

@krishna-samy
Contributor Author

ci:rerun

@krishna-samy krishna-samy force-pushed the krishna-samy/stale-nhg branch 2 times, most recently from f68fa49 to 3e5c0da Compare June 18, 2025 05:13
@krishna-samy
Contributor Author

ci:rerun

@krishna-samy
Contributor Author

@ashred-lnx
I have made the changes as suggested. Please check.

@ashred-lnx
Contributor

> @ashred-lnx I have made the changes as suggested. please check.

LGTM

@krishna-samy
Contributor Author

ci:rerun

@krishna-samy
Contributor Author

The test failures are unrelated to this change.

@krishna-samy
Contributor Author

ci:rerun

@krishna-samy
Contributor Author

@Mergifyio rebase

I see this issue with the following sequence of events:
1. zebra creates an nhe and sets the 'initial delay' flag for the nexthop
   received along with a kernel/connected route; this route is a v6
   route.
2. Later, zebra receives an intf_address event for the interface that
   this nhe belongs to, but this is a v4 event. zebra then iterates
   through the nhe set linked to this interface and ends up installing
   this nhe in the kernel.

As a result, we install the NHG in the kernel for connected/kernel
routes, which deviates from the expected behaviour.
This happens because, on an interface event, we attempt a reinstall of
all the NHGs associated with that interface. If the 'initial delay' flag
is already set for an NHG, we can skip the reinstall.
This commit does that.

Signed-off-by: Krishnasamy <krishnasamyr@nvidia.com>

During an FRR restart, nexthop-group entries are not cleaned up in the
following scenario.

1. An NHG's refcnt is decremented and reaches zero. We start a timer
   for this NHG before deleting it in zebra/kernel, so the NHG stays
   intact in the kernel until the timer expires.
2. While the timer is running, FRR is restarted. All the NHGs are
   cleaned up in the kernel except the one with the running timer,
   which remains installed.

During zebra shutdown, check whether any NHG has a timer running and
remove it from the kernel.

Signed-off-by: Krishnasamy <krishnasamyr@nvidia.com>

mergify bot commented Jun 24, 2025

rebase

✅ Branch has been successfully rebased

@krishna-samy krishna-samy force-pushed the krishna-samy/stale-nhg branch from 3e5c0da to 0743cca Compare June 24, 2025 04:56
@krishna-samy
Copy link
Contributor Author

All the comments are addressed and the tests are passing

@Jafaral Jafaral merged commit 034e716 into FRRouting:master Jun 24, 2025
13 checks passed
Jafaral added a commit that referenced this pull request Jun 24, 2025
Jafaral added a commit that referenced this pull request Jun 24, 2025
@krishna-samy krishna-samy deleted the krishna-samy/stale-nhg branch June 25, 2025 06:24
ton31337 added a commit to opensourcerouting/frr that referenced this pull request Aug 2, 2025
* bgpd: correct no form commands (backport FRRouting#18911)
* bgpd: fix to show exist/non-exist-map in 'show run' properly FRRouting#18853
* redhat: make FRR RPM build to work on RedHat 10 (backport FRRouting#18920)
* build: check for libunwind.h, not unwind.h (backport FRRouting#18912)
* bgpd: use AS4B format for BGP loc-rib messages. (backport FRRouting#18936)
* bgpd: fix for the validity and the presence of prefixes in the BGP VPN table. (backport FRRouting#17370)
* bgpd: Force adj-rib-out updates if MRAI is kicked in (backport FRRouting#18959)
* zebra: Provide SID value when sending SRv6 SID release notify message (backport FRRouting#18971)
* bgpd: Fix crash when fetching statistics for bgp instance (backport FRRouting#19003)
* nhrpd: fix crash when accessing invalid memory zone (backport FRRouting#18994)
* zebra: Initialize RB tree for router tables (backport FRRouting#19049)
* zebra: fix null pointer dereference in zebra_evpn_sync_neigh_del (backport FRRouting#19054)
* zebra: fix stale NHG in kernel (backport FRRouting#18899)
* bgpd: Fix incorrect stripping of transitive extended communities (backport FRRouting#19065)
* lib: Fix no on-match goto NUM command (backport FRRouting#19108)
* bgpd: Fix extended community check for IP non-transitive type (backport FRRouting#19097)
* bgpd: Fix DEREF_OF_NULL.EX.COND in bgp_updgrp_packet (backport FRRouting#19126)
* lib: revert addition of vtysh_flush() call in vty_out() (backport FRRouting#19109)
* bgpd: Extract link bandwidth value from extcommunity before using for WCMP (backport FRRouting#19165)
* Use ipv4 class E addresses (240.0.0.0/4) as connected routes by default (backport FRRouting#18095)
* bfdd: Set bfd.LocalDiag when transitioning to AdminDown (backport FRRouting#18592)
* zebra: clean up a json object leak (backport FRRouting#19192)
* bgpd: Do not try to reuse freed route-maps (backport FRRouting#19191)
* lib: fix routemap crash (backport FRRouting#19127)
* bgpd: initialize local variable (backport FRRouting#19233)
* ospfd: Use after free cleanup of lsa (backport FRRouting#19224)
* vtysh: copy config from file should actually apply (backport FRRouting#19242)
* bgpd : Fix compilation error in bgpd module: Update TP_ARGS for bgp (backport FRRouting#19266)
* bgpd: Ensure addpath does not withdraw selected route in some situations (backport FRRouting#19210)
* lib, zebra: mark singleton nexthops inactive/active on link state changes for wecmp (backport FRRouting#18947)
* eigrp: validate hello packets and tlvs better (backport FRRouting#19251)
* bgpd: [GR] fixed selectionDeferralTimer to display select_defer_time val FRRouting#19283

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>