zebra: Re-install all route entries upon kernel NHG action/Quick intf flaps + NHG fixes and improvements #18722
base: master
Conversation
While writing topotests for this, I hit a case where staticd has a route 1.1.1.1/32 and a nexthop 1.1.1.1/32 (for route 6.6.6.0).
Worst case, I will write a new topotest rather than modifying the existing one.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Sample Logs
Please go over your code and fix all the similar minor nits I marked (there are more, but didn't mark all of them).
tests/topotests/all_protocol_startup/test_all_protocol_startup.py (4 review threads, resolved; now outdated)
@@ -175,8 +178,35 @@ struct route_entry {
	 */
	struct nexthop_group fib_ng;
	struct nexthop_group fib_backup_ng;

	/* Selected re using an nhe are in its re RB tree */
	struct nhe_re_tree_item re_item;
For state-keeping of route<->nexthop, is keeping route->nhe better than nhe->re? Wouldn't nhe->re have fewer iterations?
	/* This is the first time it's being attached */
	route_entry_attach_ref(re, new_nhghe);
	nhe_add_or_del_re_tree(new_nhghe, re, __func__, false);
Is this tracked only on failure, or for every re install?
Ability to have a back pointer from re to rn. To be used by future commits.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
… flaps

When operations such as the below happen:
- ip nexthop flush
- ip nexthop del id <>
- quick flap of an interface

there are cases where routes are re-installed while the nexthop-group isn't yet installed. This leads to route install failures that aren't retried, thereby causing out-of-sync issues between the kernel and zebra.

Take ip nexthop flush for example. Say we have 10 routes using NHG 100 [50,60]. Upon flush, the kernel cleans up all the route + nexthop entries but notifies zebra only about the nexthop delete events. The sequence of events could be:
- Kernel sends NHG 100 delete
- Kernel sends NHG 50 delete
- Kernel sends NHG 60 delete

When we receive the NHG 100 delete, zebra immediately sees "routes still point to it, let me reinstall NHG 100", but the kernel doesn't know about NHGs 50 and 60 yet (so the NHG 100 installation fails, as expected). Only after NHGs 50 and 60 are installed is NHG 100 installed successfully. But in this timeframe, route installation (new/updated) fails since NHG 100 was never installed, and the install is never re-attempted.

The fix here is to maintain an RB tree of all the route entries that use an nhe, with the head of the tree in struct nhg_hash_entry. When the nhe is installed, we walk this tree (and its dependent NHE tree if needed) and reinstall all the re's that were SELECTED for install at that point in time. After the NHG install, the route installation succeeds.

It also takes care of unsetting the INSTALLED flag in the re in case of events such as ip nexthop delete/flush or quick link flaps.

For future purposes, we can use this nhe->re_head tree as needed.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
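The mechanism described in the commit message can be sketched as follows. This is a hypothetical Python model, not FRR code: the class and method names are invented for illustration, but the idea is the same — each NHE tracks the route entries that use it, and when the NHE's install finally succeeds, the tree is walked and still-SELECTED routes are retried.

```python
# Hypothetical model of the fix: an NHE keeps a set of route entries
# (modeling the nhe->re_head RB tree); on NHG install success we walk
# it and reinstall routes whose earlier install failed.

class RouteEntry:
    def __init__(self, prefix):
        self.prefix = prefix
        self.selected = True      # SELECTED for install
        self.installed = False

class NHE:
    def __init__(self, nhg_id):
        self.nhg_id = nhg_id
        self.installed = False
        self.re_tree = set()      # models the per-NHE re RB tree

    def attach_route(self, re):
        self.re_tree.add(re)

    def try_install_route(self, re):
        # A route install can only succeed once the NHG is in the kernel.
        re.installed = self.installed and re.selected
        return re.installed

    def handle_install_result(self, success):
        self.installed = success
        if success:
            # Walk the re tree and retry everything still SELECTED.
            for re in self.re_tree:
                if re.selected and not re.installed:
                    self.try_install_route(re)

nhe = NHE(100)
routes = [RouteEntry("10.0.0.%d/32" % i) for i in range(3)]
for re in routes:
    nhe.attach_route(re)
    nhe.try_install_route(re)   # fails: NHG 100 not installed yet

# Later, the NHG 100 install succeeds; the walk retries the failed routes.
nhe.handle_install_result(True)
```

Without the tree walk in `handle_install_result`, the three route installs above would stay failed forever, which is exactly the out-of-sync condition the commit fixes.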
Adding a new entry to the existing show command "show nexthop-group rib <ID> routes" which walks all the re's that use that particular NHG and displays the route node, protocol and installed status.

Examples:

r1# sh nexthop-group rib 40 routes
Routes using nexthop group 40:
Route Entry                    Status          Protocol   AFI     SAFI
------------------------------ --------------- ---------- ------- -------
4.5.6.10/32                    installed       static     IPv4    unicast
4.5.6.11/32                    installed       static     IPv4    unicast
4.5.6.15/32                    not installed   static     IPv4    unicast
4.5.6.16/32                    installed       static     IPv4    unicast
4.5.6.17/32                    installed       static     IPv4    unicast

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
1) Adding NHG flags to debugs in a few places
2) Making some debugs IS_ZEBRA_DEBUG_NHG_DETAIL instead of IS_ZEBRA_DEBUG_NHG

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Add re's flags and status in the show ip/ipv6 route <> (json) command.

r1# sh ip route 3.3.3.1/32
Routing entry for 3.3.3.1/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:49 ago
  Flags: Recursion Selected
  Status: Installed
    2.2.2.1 (recursive), weight 1
  *   1.1.1.1, via r1-eth1 onlink, weight 1
  *   1.1.1.2, via r1-eth2 onlink, weight 1

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Adding a few other flags to be displayed in the show nexthop-group rib command.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Add macros to tell if an NHE is a singleton (recursive + direct), a direct singleton, or a recursive singleton.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
To make life a bit easier, use the NHG SINGLETON macros where applicable.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
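The singleton classification those two commits describe can be illustrated with a small sketch. This is a hypothetical Python model — the real FRR macros operate on struct nhg_hash_entry in C, and the function names here are invented:

```python
# Hypothetical model of the singleton classification: an NHE is a
# singleton when it has exactly one nexthop; that nexthop can be either
# direct or recursive. An nhe is modeled as a dict for illustration.

def is_singleton(nhe):
    """Singleton: exactly one nexthop, direct or recursive."""
    return len(nhe["nexthops"]) == 1

def is_direct_singleton(nhe):
    return is_singleton(nhe) and not nhe["nexthops"][0]["recursive"]

def is_recursive_singleton(nhe):
    return is_singleton(nhe) and nhe["nexthops"][0]["recursive"]

nhe_direct = {"nexthops": [{"recursive": False}]}
nhe_group = {"nexthops": [{"recursive": False}, {"recursive": True}]}
print(is_direct_singleton(nhe_direct), is_singleton(nhe_group))  # True False
```

Centralizing the check like this (rather than open-coding the nexthop-count test at each call site) is the readability win the second commit is after.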
…kernel in sync

Consider the following scenario.

Initial state:
- NHG 112[92/94], 92 (eth1) and 94 (eth2)
- 3.3.3.1/32 uses NHG 112, so zebra and the kernel both have 112 with 2 nexthops (92 and 94)

Trigger: Flap eth1-4 quickly.

Final state (without the fix):
- 3.3.3.1/32 uses NHG 112; the kernel has 112 with 1 nexthop (92) while zebra has NHG 112 with 2 nexthops (92 and 94)

Below is the sequence of events:
1. Upon eth1 and eth2 down, we mark 92 and 94 Invalid.
2. eth1 comes up.
3. 92 and 112 have the REINSTALL flag set.
4. 92 is installed into the kernel.
5. The dplane nhg result of 92 queues the install of 112[92 Valid, 94 Invalid].
6. 112 is installed into the kernel (kernel has 112 with 1 NH (92)).
7. Before the 112[92 Valid] install result is handled, eth2 comes up, so we mark 94 as Valid. At this point, 94 and 112 have the REINSTALL and VALID flags set --> 112[92 Valid, 94 Valid].
8. The NHG install result of 112[92V, 94I] (step 6) is handled; we unset the REINSTALL and QUEUED flags and set the INSTALLED flag.
9. Now 94 is installed, and as part of handling the NHG install result, we try to install 112.
10. However, since 112 does not have REINSTALL and has the INSTALLED flag set, it won't be queued for install.

This means zebra thinks it installed 112[92V, 94V] while the kernel only has 112[92].

Logs:
2025/05/30 07:19:14.153717 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for r1-eth1(3)
2025/05/30 07:19:14.154678 ZEBRA: [ZHFB4-0MJKD] zebra_interface_nhg_reinstall install nhe 92[1.1.1.1 if 3 vrfid 0] nh type 3 flags 0x5
2025/05/30 07:19:14.154678 ZEBRA: [PJXNX-HZST8] zebra_nhg_install_kernel: valid flag set for nh 92[1.1.1.1 if 3 vrfid 0]
2025/05/30 07:19:14.154681 ZEBRA: [ZJTBR-T51KC] zebra_interface_nhg_reinstall dependent nhe 112[92/94] flags (0x100) Setting Reinstall flag
2025/05/30 07:19:14.154810 ZEBRA: [J7K9Z-9M7DT] Nexthop dplane ctx 0xaaaab8422640, op NH_INSTALL, nexthop ID (92), result SUCCESS
2025/05/30 07:19:14.154840 ZEBRA: [MVMV2-45SG2] zebra_nhg_handle_install nh id 92 (flags 0x3) associated dependent NHG 112[92/94] install
2025/05/30 07:19:14.157060 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for r1-eth2(4)

//Installed 112 with 92 Valid
2025/05/30 07:19:14.157521 ZEBRA: [JAS4D-NCWGP] nlmsghdr [len=44 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=1345 pid=3546608047]
2025/05/30 07:19:14.157521 ZEBRA: [WCX94-SW894] nhm [family=(0) AF_UNSPEC scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
2025/05/30 07:19:14.157521 ZEBRA: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(1) ID]
2025/05/30 07:19:14.157522 ZEBRA: [Z4E9C-GD9EP] 112
2025/05/30 07:19:14.157522 ZEBRA: [KFBSR-XYJV1] rta [len=12 (payload=8) type=(2) GROUP]
2025/05/30 07:19:14.157522 ZEBRA: [HD3EC-5WE2D] id 92 weight 0
2025/05/30 07:19:14.157841 ZEBRA: [PJXNX-HZST8] zebra_nhg_install_kernel: valid flag set for nh 94[1.1.1.2 if 4 vrfid 0]
2025/05/30 07:19:14.157844 ZEBRA: [ZJTBR-T51KC] zebra_interface_nhg_reinstall dependent nhe 112[92/94] flags (0x105) Setting Reinstall flag
2025/05/30 07:19:14.157918 ZEBRA: [J7K9Z-9M7DT] Nexthop dplane ctx 0xaaaab8411e80, op NH_INSTALL, nexthop ID (112), result SUCCESS

//Now 94 installation kicks 112 installation, which is ignored due to flags state
2025/05/30 07:19:14.158048 ZEBRA: [J7K9Z-9M7DT] Nexthop dplane ctx 0xaaaab84231c0, op NH_INSTALL, nexthop ID (94), result SUCCESS
2025/05/30 07:19:14.158112 ZEBRA: [MVMV2-45SG2] zebra_nhg_handle_install nh id 94 (flags 0x3) associated dependent NHG 112[92/94] install

Fix: Force install dependent NHGs in the dplane NHG result of a singleton install in all cases. This can lead to reinstalling an already-installed NHG with no change, but that is fine to live with as long as zebra and the kernel stay in sync.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
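The race and the "force install dependents" fix can be modeled in a few lines. This is a hypothetical sketch, not FRR code — the `run` helper and its data shapes are invented to show why skipping an NHG that merely looks INSTALLED leaves the kernel with a stale member list:

```python
# Hypothetical model: NHG 112 groups singletons 92 and 94. Interfaces
# come back one at a time; the kernel only accepts group members it
# already knows. force_dependents=False models the old behavior
# (skip re-install when the group has the INSTALLED flag).

def run(force_dependents):
    kernel = {}                   # kernel's view: group id -> member ids
    valid = set()                 # singleton NHs the kernel knows about
    group = {"id": 112, "members": [92, 94], "installed": False}

    def install_group():
        # Kernel keeps only the members it currently knows about.
        kernel[group["id"]] = [m for m in group["members"] if m in valid]
        group["installed"] = True

    for nh in (92, 94):           # eth1, then eth2, come back up
        valid.add(nh)             # singleton NH installed in kernel
        # Dplane result handling for the singleton:
        if force_dependents or not group["installed"]:
            install_group()
    return kernel[112]

print(run(force_dependents=False))   # [92]      -- kernel left stale
print(run(force_dependents=True))    # [92, 94]  -- kernel refreshed
```

The cost of `force_dependents=True` is an occasional no-op reinstall, which the commit message deems acceptable in exchange for keeping zebra and the kernel in sync.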
Current code has issues where (for example) when an interface goes down, we are supposed to UNSET all the impacted nexthops in the NHG's dependent chain as ACTIVE and UNSET the NHG as VALID (if no more active NHs). However, there are bugs in the existing code. One such example:

- NHG 91 (r1-eth1) and NHG 93 (r1-eth2)
- NHG 114 depends on 91 and 93
- NHG 113 depends on 114

i.e. [91,93] -> 114 -> 113

If we put down the interfaces r1-eth1 and r1-eth2, we are supposed to UNSET the NHs of 113 and 114 as ACTIVE and mark the NHGs INVALID. However, we are comparing things wrong. We are supposed to compare the nexthop of 91 across the entire chain, i.e. compare the NH of 91 (1.1.1.1) with 113, and compare the NH of 91 (1.1.1.1) with 114. But the code doesn't do that.

Initial state:
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# ip route show | grep 3.3.3.1
3.3.3.1 nhid 114 proto 194 metric 20
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# vtysh -c "sh nexthop-group rib 114"
ID: 114 (zebra)
     Nexthop Count: 2
     Valid, Installed
     Depends: (91) (93)
        via 1.1.1.1, r1-eth1 (vrf default) onlink, weight 1
        via 1.1.1.2, r1-eth2 (vrf default) onlink, weight 1
     Dependents: (113)
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# vtysh -c "sh nexthop-group rib 113"
ID: 113 (zebra)
     Valid
     Depends: (114)
        via 2.2.2.1 (vrf default) (recursive), weight 1
           via 1.1.1.1, r1-eth1 (vrf default) onlink, weight 1
           via 1.1.1.2, r1-eth2 (vrf default) onlink, weight 1
     Dependents: (121)

Trigger:
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# ip link set down r1-eth1
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# ip link set down r1-eth2

Final state:
root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# vtysh -c "sh nexthop-group rib 114"
ID: 114 (zebra)
     Depends: (91) (93)
        via 1.1.1.1, r1-eth1 (vrf default) inactive onlink, weight 1
        via 1.1.1.2, r1-eth2 (vrf default) inactive onlink, weight 1
     Dependents: (113)

root@r1:/tmp/topotests/all_protocol_startup.test_all_protocol_startup/r1# vtysh -c "sh nexthop-group rib 113"
ID: 113 (zebra)
     Valid ---> WRONG
     Depends: (114)
        via 2.2.2.1 (vrf default) (recursive), weight 1
           via 1.1.1.1, r1-eth1 (vrf default) onlink, weight 1 --> WRONG
           via 1.1.1.2, r1-eth2 (vrf default) onlink, weight 1 --> WRONG
     Dependents: (121) (259)

There are many such cases where the propagation of ACTIVE (NH) and VALID (NHG) is broken (easily, and especially, when we have NHGs using recursive NHs).

Fixing this by rewriting the zebra_nhg_check/set_valid functions to propagate through the dependency chain for singletons (direct/recursive) and groups, for triggers such as NHG add/del or interface flaps/del.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
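The intended propagation can be sketched abstractly. This is a hypothetical model, not the actual zebra_nhg_check/set_valid code — the data shapes are invented, and the validity rule shown (a group is VALID iff at least one dependency is VALID) is a simplification for illustration:

```python
# Hypothetical model of VALID-flag propagation through an NHG dependency
# chain, e.g. [91, 93] -> 114 -> 113 from the example above. Iterate
# until the flags stabilize so multi-level (recursive) chains are
# handled, not just the immediate dependents.

def propagate_validity(nhgs, deps):
    """nhgs: id -> {'valid': bool}; deps: group id -> dependency ids."""
    changed = True
    while changed:
        changed = False
        for gid, members in deps.items():
            valid = any(nhgs[m]["valid"] for m in members)
            if nhgs[gid]["valid"] != valid:
                nhgs[gid]["valid"] = valid
                changed = True
    return nhgs

nhgs = {91: {"valid": True}, 93: {"valid": True},
        114: {"valid": True}, 113: {"valid": True}}
deps = {114: [91, 93], 113: [114]}

# Both interfaces go down: singletons 91 and 93 become invalid.
nhgs[91]["valid"] = nhgs[93]["valid"] = False
propagate_validity(nhgs, deps)
print(nhgs[114]["valid"], nhgs[113]["valid"])   # False False
```

The bug described above is equivalent to stopping after one level: 114 would be marked invalid, but 113 would keep its stale Valid flag, exactly as in the "WRONG" output.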
…propagation

1) Add test cases for ip nexthop flush/del
2) Add test cases for quick interface flaps
3) Add test cases checking the NHG dependency chain propagation of the ACTIVE and VALID flags
4) Remove sleeps and add retry mechanisms
5) Add retry logic to other tests

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
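The "remove sleeps, add retry mechanisms" pattern is worth spelling out: instead of sleeping a fixed time and hoping zebra has converged, poll the condition until it holds or a deadline passes. FRR topotests provide a similar helper (topotest.run_and_expect); this standalone sketch with an invented `retry_until` name is just for illustration:

```python
import time

def retry_until(check, timeout=30.0, interval=0.05):
    """Call check() until it returns a truthy value or timeout expires;
    return the last result either way."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return result
        time.sleep(interval)

# Example: wait for a condition to become true instead of sleeping.
state = {"n": 0}
def bump_and_check():
    state["n"] += 1
    return state["n"] >= 3

print(retry_until(bump_and_check, timeout=5))   # True
```

Polling converges as soon as the daemon catches up, so the test is both faster in the common case and robust on slow CI machines, which a fixed sleep can never be.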
In the existing topotest isis_tilfa_topo1 step 10, the script immediately expects a matching output, and hence it was working all along. However, if we add some sleep after the interface down, the script fails, as expected; in other words, we don't wait long enough for zebra to handle the interface-down event and switch the NHGs to backup. With the NHG route-reinstall logic in place, things happen quickly, and that is how I figured out that the existing script itself has issues. Fixing the expectation of show ip/ipv6 route isis json.

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
NHG improvements + fixes along with
zebra: Re-install all route entries upon kernel NHG action/Quick intf flaps
When operations such as the below happen:
- ip nexthop flush
- ip nexthop del id <>
- quick flap of an interface

there are cases where routes are re-installed while the nexthop-group
isn't yet installed. This leads to route install failures that aren't
retried, thereby causing out-of-sync issues between the kernel and zebra.

Take ip nexthop flush for example:
Say we have 10 routes using NHG 100 [50,60]. Upon flush, the kernel cleans
up all the route + nexthop entries but notifies zebra only about the
nexthop delete events. The sequence of events could be:
- Kernel sends NHG 100 delete
- Kernel sends NHG 50 delete
- Kernel sends NHG 60 delete

When we receive the NHG 100 delete, zebra immediately sees "routes still
point to it, let me reinstall NHG 100", but the kernel doesn't know about
NHGs 50 and 60 yet (so the NHG 100 installation fails, as expected).
Only after NHGs 50 and 60 are installed is NHG 100 installed
successfully. But in this timeframe, route installation (new/updated)
fails since NHG 100 was never installed, and the install is never
re-attempted.

The fix here is to maintain an RB tree of all the route entries that use
an nhe, with the head of the tree in struct nhg_hash_entry.
When the nhe is installed, we walk this tree and reinstall all the re's
that were SELECTED for install at that point in time. After the NHG install,
the route installation succeeds.

It also takes care of unsetting the INSTALLED flag in the re in case of
events where an NHG is to be reinstalled or is going away.

For future purposes, we can use this nhe->re_head tree as needed.
Also added a new show command