mergify[bot]
@mergify mergify bot commented Jun 24, 2025

Fix stale NHG entries left in the kernel.

Issue1:

  1. zebra creates an nhe and sets the 'initial delay' flag for the nexthop received with a kernel/connected route; this route is a v6 route.
  2. Later, zebra receives an intf_address event for the interface that the nhe created above belongs to, but this is a v4 event. zebra then iterates through the nhe set linked to this interface and ends up installing this nhe in the kernel.

So we install the NHG in the kernel for connected/kernel routes, which deviates from the expected behaviour. This happens because, on an interface event, we attempt a reinstall of all the NHGs associated with that intf. If the 'initial delay' flag is already set for an NHG, we can skip it.
This change fixes that.
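
The skip described above can be sketched roughly as follows. This is a toy model rather than the actual zebra code: `struct toy_nhe`, the `TOY_NHG_*` flag bits and `toy_intf_reinstall()` are hypothetical names made up for illustration, and the real change lives in zebra/zebra_nhg.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real zebra definitions. */
#define TOY_NHG_INSTALLED     (1u << 0)
#define TOY_NHG_INITIAL_DELAY (1u << 1)

/* Toy stand-in for a nexthop hash entry (nhe). */
struct toy_nhe {
	uint32_t id;
	uint32_t flags;
};

/*
 * Walk the NHEs linked to an interface after an interface event and
 * reinstall each one, except those whose install is still deferred
 * by the 'initial delay' flag. Returns how many entries were installed.
 */
static size_t toy_intf_reinstall(struct toy_nhe *nhes, size_t count)
{
	size_t installed = 0;

	for (size_t i = 0; i < count; i++) {
		/* Deferred entries (kernel/connected routes) are skipped. */
		if (nhes[i].flags & TOY_NHG_INITIAL_DELAY)
			continue;

		nhes[i].flags |= TOY_NHG_INSTALLED;
		installed++;
	}

	return installed;
}
```

In this model, a v4 address event on the interface no longer pushes the deferred v6 entry into the kernel; only entries without the flag are reinstalled.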

Issue2:
During an FRR restart, nexthop-group entries are not cleaned up in the
following scenario.

  1. An NHG's refcnt is decremented and reaches zero. We start a timer
    for this NHG before deleting it in zebra/kernel, so the NHG stays
    intact in the kernel until the timer expires.
  2. While the timer is still running, FRR is restarted. All the other
    NHGs are cleaned up in the kernel, but the one with the running
    timer remains installed.

During zebra shutdown, check whether any NHG has a timer running and
remove it from the kernel.
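
A minimal sketch of that shutdown check, under stated assumptions: `struct toy_nhg_entry` and `toy_shutdown_cleanup()` are hypothetical names, the timer is modelled as a plain boolean, and the real fix in zebra may be structured quite differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an NHG tracked by zebra; fields are illustrative only. */
struct toy_nhg_entry {
	uint32_t id;
	int refcnt;
	bool installed;     /* present in the kernel */
	bool timer_running; /* delayed-delete timer is pending */
};

/*
 * Shutdown sweep: remove every installed NHG from the (modelled) kernel.
 * An entry whose refcnt already hit zero may still be installed because
 * its delete timer has not fired yet, so cancel the timer and remove the
 * entry now instead of waiting for an expiry that will never happen.
 * Returns how many entries were removed.
 */
static size_t toy_shutdown_cleanup(struct toy_nhg_entry *entries, size_t count)
{
	size_t removed = 0;

	for (size_t i = 0; i < count; i++) {
		if (entries[i].timer_running)
			entries[i].timer_running = false; /* cancel pending delete */

		if (entries[i].installed) {
			entries[i].installed = false; /* uninstall from kernel */
			removed++;
		}
	}

	return removed;
}
```

The point of the sketch is that the sweep keys off the installed state and the pending timer rather than the refcnt, so an entry waiting on its delete timer is not left behind.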


This is an automatic backport of pull request #18899 done by Mergify.

I see this issue with the following sequence of events:
1. zebra creates an nhe and sets the 'initial delay' flag for the nexthop
   received with a kernel/connected route; this route is a v6 route.
2. Later, zebra receives an intf_address event for the interface that
   the nhe created above belongs to, but this is a v4 event. zebra then
   iterates through the nhe set linked to this interface and ends up
   installing this nhe in the kernel.

So we install the NHG in the kernel for connected/kernel routes, which
deviates from the expected behaviour.
This happens because, on an interface event, we attempt a reinstall of
all the NHGs associated with that intf. If the 'initial delay' flag is
already set for an NHG, we can skip it.
This change fixes that.

Signed-off-by: Krishnasamy <krishnasamyr@nvidia.com>
(cherry picked from commit d7f6d95)

# Conflicts:
#	zebra/zebra_nhg.c

During an FRR restart, nexthop-group entries are not cleaned up in the
following scenario.

1. An NHG's refcnt is decremented and reaches zero. We start a timer
for this NHG before deleting it in zebra/kernel, so the NHG stays
intact in the kernel until the timer expires.
2. While the timer is still running, FRR is restarted. All the other
NHGs are cleaned up in the kernel, but the one with the running
timer remains installed.

During zebra shutdown, check whether any NHG has a timer running and
remove it from the kernel.

Signed-off-by: Krishnasamy <krishnasamyr@nvidia.com>
(cherry picked from commit 0743cca)
@mergify mergify bot added the conflicts label Jun 24, 2025
mergify bot commented Jun 24, 2025

Cherry-pick of d7f6d95 has failed:

On branch mergify/bp/stable/10.2/pr-18899
Your branch is up to date with 'origin/stable/10.2'.

You are currently cherry-picking commit d7f6d9580.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   zebra/zebra_nhg.c

no changes added to commit (use "git add" and/or "git commit -a")

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@ton31337 ton31337 closed this Jun 26, 2025
@ton31337 ton31337 deleted the mergify/bp/stable/10.2/pr-18899 branch June 26, 2025 06:06