-
Notifications
You must be signed in to change notification settings - Fork 1.4k
zebra: uninstall nhg when no best re point to it #18891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hmm - I'd like to hear more about this: that seems ... unexpected, I'd expect that the route "update" code was uninstalling the non-best route |
@mjstapp When an existing route's re (lets say static in the example captured in commit message) becomes non-best, its associated re becomes non best but nhe refcount is greater than 0 so it won't explicitly uninstall. |
ah, ok - it's not the "static route" that isn't uninstalled, it's the nhg(s) used by the static route.
|
That's Correct! Any of the upper layer protocol's re (route_entry) pointing to a particular NHG. If all the re become non-selected then uninstall NHG from kernel but retain in zebra as NHG is being reference by non selected res. |
60373e2
to
50937a9
Compare
04d2e1d
to
9d10f36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this is something we should do. Adding yet-another refcount in this area just seems awkward. Is there some functional issue that the existing behavior leads to that needs to be fixed? Is there just some cosmetic concern?
@@ -2121,6 +2124,8 @@ static void rib_process_result(struct zebra_dplane_ctx *ctx) | |||
if (re) { | |||
UNSET_FLAG(re->status, ROUTE_ENTRY_FAILED); | |||
SET_FLAG(re->status, ROUTE_ENTRY_INSTALLED); | |||
/* Increment the installed re count for the nhe */ | |||
re->nhe->installed_re_count++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no no - we really can't reach-around into the nhe/nhg struct from the rib module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason? can you elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a PR #18722 which also introduces nhe->re pointer and taking action to reprogram routes if certain nhg failed to install upon interface flap.
Please elaborate your thought process.
@@ -3526,7 +3526,11 @@ void zebra_nhg_uninstall_kernel(struct nhg_hash_entry *nhe) | |||
} | |||
} | |||
|
|||
zebra_nhg_handle_uninstall(nhe); | |||
/* do not call delete of the NHG if there are re's referencing it. */ | |||
if (free_nhg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so ... the logic here just seems incorrect: this code places the decision-making for the fate of the nhg object entirely outside that object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the free_nhg ensures to not call free but rather simply uninstalled. Given the current code is doing uninstall and free in same function. In my opinion that can be re organized such uninstalled can be independent function and free should be called independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐴
zebra/zebra_nhg.c
Outdated
if (free_nhg) | ||
zebra_nhg_handle_uninstall(nhe); | ||
else | ||
zlog_debug("%s skip nhe %d free as there are re referencing it", __func__, nhe->id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unprotected debug
@@ -3553,6 +3557,15 @@ void zebra_nhg_dplane_result(struct zebra_dplane_ctx *ctx) | |||
"Failed to uninstall Nexthop ID (%u) from the kernel", | |||
id); | |||
|
|||
nhe = zebra_nhg_lookup_id(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the relationship between this new code and the specific problem of adding another refcount to nhgs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have explained before, the objective is to uninstalled the nexthop group which is not selected as best by any of the route.
So when NHG was triggered uninstalled, the dplane result has to unset the installed flag.
Please follow the code from install/uninstall, its a seamless change to mark uninstalled in dplan result for delete (uninstalled) request.
@mjstapp there isn't any functional issue, I have tried to explained the rational using example in the commit message. |
76a603d
to
edc10cf
Compare
Currently when upper layer protocols like ospf and static route sends the same route with different nexthops sequentially, zebra creates two different route_entry (re) and two distinct nhgs. First the static route and corresponding selected and installed, subsequently, the ospf route is installed Once zebra rib selects the ospf as best the static as non best, then the static route is not uninstalled from dplane. When networking restart happens at that point only the best NHG is installed. If user does query of nexthop-group nhe state compare prior to and after networking restart, the behavior is not symmetric. To make consistent, when zebra rib selects the best nhg to installed, adding a bit of logic to uninstall old route_entry's nhg if it is not selected as best by any of the route_entries then uninstall the nhg from kernel. This requires to keep best installed re's count in nhg struct. Ticket: #4417509 system: Before fix: Initial route points to static, route and NHG 34 is installed: DUT# show ip route nexthop-group S>* 20.8.8.0/30 [115/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:26:31 OSFPv2 come up installs the route, route and new NHG 48 is installed: DUT# show ip route nexthop-group B>* 20.6.6.0/30 [20/0] (37) via 20.3.3.2, swp1s2, weight 1, 00:29:24 S 20.6.6.0/30 [75/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:29:30 O>* 20.8.8.0/30 [110/8] (48) via 20.1.1.1, swp1s0, weight 1, 00:01:48 S 20.8.8.0/30 [115/0] (34) via 20.7.7.2, swp31s0, weight 1, 00:29:30 Old NHG 34 is not removed kernel: DUT# show nexthop-group rib 34 ID: 34 (zebra) RefCnt: 2 Uptime: 00:33:27 VRF: default Valid, Installed <<<<< installed Interface Index: 37 via 20.7.7.2, swp31s0 (vrf default), weight After fix: NHG 34 is retained in zebra from unistalled from dplane DUT# show nexthop-group rib 34 ID: 34 (zebra) RefCnt: 2 Uptime: 00:27:06 VRF: default Valid Interface Index: 37 via 20.7.7.2, swp31s0 (vrf default), weight 1 Signed-off-by: Chirag Shah <chirag@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Not needed. |
Currently when upper layer protocols like ospf and static route sends the same route with different nexthops sequentially,
zebra creates two different route_entry (re) and two distinct nhgs.
First the static route and corresponding selected and installed, subsequently, the ospf route is installed
Once zebra rib selects the ospf as best the static as non best, then the static route is not uninstalled from dplane.
When networking restart happens at that point only the best NHG is installed.
If user does query of nexthop-group nhe state compare prior to and after networking restart, the behavior is not symmetric.
To make consistent, when zebra rib selects the best nhg to installed, adding a bit of logic to uninstall old route_entry's nhg if it is not selected as best by any of the route_entries then uninstall the nhg from kernel.
This requires to keep best installed re's count in nhg struct.
Initial route points to static, route and NHG 34 is installed:
DUT# show ip route nexthop-group
Old NHG 34 is not removed kernel:
After fix:
NHG 34 is retained in zebra from unistalled from dplane
DUT# show nexthop-group rib 34
Signed-off-by: Chirag Shah chirag@nvidia.com