-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Suppress fib funny business #15634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suppress fib funny business #15634
Conversation
When BGP has been asked to wait for FIB installation, on route removal a return call is likely to not have the dest since BGP will have cleaned up the node, entirely. Let's just note that the prefix cannot be found if debugs are turned on and move on. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently BGP attempts to send route change information to it's peers *before* the route is installed into zebra. This creates a bug in suppress-fib-pending in the following scenario: a) bgp suppress-fib-pending and bgp has a route with 2 way ecmp. b) bgp receives a route withdraw from peer 1. BGP will send the route to zebra and mark the route as FIB_INSTALL_PENDING. c) bgp receives a route withdraw from peer 2. BGP will see the route has the FIB_INSTALL_PENDING and not send the withdrawal of the route to the peer. bgp will then send the route deletion to zebra and clean up the bgp_path_info's. At this point BGP is stuck where it has not sent a route withdrawal to downstream peers. Let's modify the code in bgp_process_main_one to send the route notification to zebra first before attempting to announce the route. The route withdrawal will remove the FIB_INSTALL_PENDING flag from the dest and this will allow group_announce_route to believe it can send the route withdrawal. For the master branch this is ok because the recent backpressure commits are in place and nothing is going to change from an ordering perspective in that regards. Ostensibly this fix is also for operators of Sonic and will be backported to the 8.5 branch as well. This will change the order of the send to peers to be after the zebra installation but sonic users are using suppress-fib-pending anyways so updates won't go out until rib ack has been received anyways. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@Mergifyio backport dev/10.0 stable/9.1 stable/9.0 stable/8.5 |
✅ Backports have been created
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
Suppress fib funny business (backport #15634)
Suppress fib funny business (backport #15634)
Suppress fib funny business (backport #15634)
I believe this added regression. I was running FRR 8.1 until yesterday when I had to update to 10.x due to an AddPath issue crashing sessions. However now with 10.x I do see similar symptoms as outlined in #15626 - even worse, I see that routes are added to the FIB and yet not announced to peers. (#15786 (comment)) Disabling suppress-fib-pending causes intermittent network loss (anecdotal, we received alerts around 1am tonight and are still going though logs) but fixed the issue of prefixes not being advertised to peers. Worth to mention that 8.1 did not have this issue. //Edit: Further debugging of logs and bgp routing tables show that the network loss might not be related to this, we've set up continuous network probes and bgp table snapshots to get a bigger picture. |
Why I did it Upgrading FRR 8.5.4 to include latest fixes. Work item tracking Microsoft ADO (number only): How I did it New patches that were added: Patch FRR Pull request Issue fixed 0024-lib-use-snmp-s-large-fd-sets-for-agentx.patch FRRouting/frr#13396 FRRouting/frr#14143 0025-bgp-community-memory-leak-fix.patch FRRouting/frr#15466 FRRouting/frr#15459 0026-bgp-fib-suppress-announce-fix.patch FRRouting/frr#15634 FRRouting/frr#15626 0027-lib-Do-not-convert-EVPN-prefixes-into-IPv4-IPv6-if-n.patch FRRouting/frr#15418 FRRouting/frr#14419 Removed patches: Patch Upstream FRR commit that is present in 8.5.4 0019-zebra-Abstract-dplane_ctx_route_init-to-init-route-w.patch FRRouting/frr@3f01977 0020-zebra-Fix-crash-when-dplane_fpm_nl-fails-to-process-.patch FRRouting/frr@fe5f624 0022-bgpd-Don-t-read-the-first-byte-of-ORF-header-if-we-a.patch FRRouting/frr@3515178 0023-bgpd-Make-sure-we-have-enough-data-to-read-two-bytes.patch FRRouting/frr@460ee93 0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch FRRouting/frr@f291f1e 0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch FRRouting/frr@8a4a88c 0026-zebra-Add-encap-type-when-building-packet-for-FPM.patch FRRouting/frr@f0f7b28 0028-bgpd-Check-mandatory-attributes-more-carefully-for-U.patch FRRouting/frr@21418d6 0029-bgpd-Handle-MP_REACH_NLRI-malformed-packets-with-ses.patch FRRouting/frr@30b5c2a 0030-bgpd-Treat-EOR-as-withdrawn-to-avoid-unwanted-handli.patch FRRouting/frr@01f232c 0031-bgpd-Ignore-handling-NLRIs-if-we-received-MP_UNREACH.patch FRRouting/frr@a0c4ec2 0032-zebra-Fix-fpm-multipath-encap-addition.patch FRRouting/frr@10a9a5f Realigned patches: Old Patch New patch 0005-Add-support-of-bgp-l3vni-evpn.patch 0005-Add-support-of-bgp-l3vni-evpn.patch 0021-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch 0019-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch 0027-zebra-Fix-non-notification-of-better-admin-won.patch 0020-zebra-Fix-non-notification-of-better-admin-won.patch Disable-ipv6-src-address-test-in-pceplib.patch 0021-Disable-ipv6-src-address-test-in-pceplib.patch cross-compile-changes.patch 0022-cross-compile-changes.patch 0033-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch 0023-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch How to verify it Running sonic-mgmt test suite.
See 2nd commit for the real meat.
Effectively w/ suppress-fib-pending a withdrawal will not be sent to a peer if BGP is already in the process of waiting for a previous route install for the same route.
ie suppose you have 2 way ecmp w/ bgp-suppress-fib-pending. peer 1 withdraws, bgp marks the route for FIB_INSTALLING and does not send an update. Then if peer 2 withdraws it's route before zebra gives notice, when attempting to send the route via group_announce_route it sees the FIB_INSTALLING flag and does not send the withdrawal.
Closes: #15626