Skip to content

Conversation

piotrsuchy
Copy link
Contributor

@piotrsuchy piotrsuchy commented Oct 2, 2024

Currently there are cases, where a recursive route has an update of a resolved nexthop, but it is not properly picked up by zebra, because zebra thinks their hashes of nhg are equal.
This PR has a fix for this issue.

Topology:

We have 3 hosts running FRR and a 4th host which runs gobgp and injects static routes into the FRR hosts. The host running gobgp acts like a route server and the hosts are peered with the route server (actually 3 more instances of the route server but that's redundant for this bug) using bgp evpn and the injected static routes are evpn routes:

host1# show bgp l2vpn evpn
BGP table version is 22, local router ID is 10.40.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.40.0.1:1
 *>  [5]:[0]:[32]:[12.34.56.78]
                    10.40.0.1(host1)
                                                  150      0 4250134000 4205134000 4259934000 4250034002 ?
                    ET:8 RT:24865:3 Rmac:52:f5:e3:49:2f:5b
 *>  [5]:[0]:[32]:[22.22.22.22]
                    10.40.0.1(host1)
                                                  150      0 4250134000 4205134000 4259934000 4250034002 ?
                    ET:8 RT:24865:3 Rmac:52:f5:e3:49:2f:5b
...

The reproduction of this bug goes like this:

  • gobgp injecting static route '22.22.22.22 via 2300:5800::1' for all of the hosts
  • one of the hosts has 2300:5800::1 via fe80::2 dev tap<host_id> kernel route
  • at the same time we remove the kernel route from one of the hosts (in this example host1) and add it to on the other hosts (host3)
  • we check the state of the host that didn't do anything (host2 in the example) and we can see that the route to '2300:5800::1' got properly update (for hosts1 and host2 it properly shows that the nexthop is host3 now), but the recursive route to 22.22.22.22 still shows:

22.22.22.22.22 via 2300:5800::1 via

Which means it wasn't properly updated and FRR has a conflicting state, where in one place the nexthop of the 2300:5800::1 route is host3's address (because the kernel route to 2300:5800::1 is there) and in the recursive route it's host1's address (because it used to be there and it wasn't updated properly).

I also have two videos with more details, showing the reproduction of the bug using:

If you have any more question about the topology or reproduction let me know.

@ton31337
Copy link
Member

ton31337 commented Oct 2, 2024

Could you fix first https://ci1.netdef.org/browse/FRR-PULLREQ3-CHECKOUT-5250 checkpatch.pl issues?

@piotrsuchy piotrsuchy force-pushed the incorrect_recursive_nexthop branch 2 times, most recently from 954a39b to 3be7e2c Compare October 2, 2024 10:02
@github-actions github-actions bot added size/XS and removed size/S labels Oct 2, 2024
@ton31337 ton31337 added this to the 10.2 milestone Oct 2, 2024
@piotrsuchy
Copy link
Contributor Author

(venv) root@localhost:~/frr# sudo -E python3 -m pytest tests/topotests/bgp_l3vpn_to_bgp_vrf/test_bgp_l3vpn_to_bgp_vrf.py::test_notification_check
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.11.2, pytest-7.2.1, pluggy-1.0.0+repack
rootdir: /root/frr/tests/topotests, configfile: pytest.ini
collected 1 item

tests/topotests/bgp_l3vpn_to_bgp_vrf/test_bgp_l3vpn_to_bgp_vrf.py .                                                                                                                                [100%]

============================================================================================ warnings summary ============================================================================================
../../usr/lib/python3/dist-packages/_pytest/config/__init__.py:1294
  /usr/lib/python3/dist-packages/_pytest/config/__init__.py:1294: PytestConfigWarning: Unknown config option: asyncio_default_fixture_loop_scope

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../../usr/lib/python3/dist-packages/_pytest/config/__init__.py:1294
  /usr/lib/python3/dist-packages/_pytest/config/__init__.py:1294: PytestConfigWarning: Unknown config option: asyncio_mode

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------------------------------------------------------------------------- generated xml file: /tmp/topotests/topotests.xml ----------------------------------------------------------------------------
===================================================================================== 1 passed, 2 warnings in 10.42s =====================================================================================
(venv) root@localhost:~/frr# vtysh -c 'show ver'
FRRouting 10.2-dev (localhost) on Linux(6.1.0-25-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--disable-rpki' '--disable-scripting' '--enable-pim6d' '--disable-grpc' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
(venv) root@localhost:~/frr# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

The only test that failed is passing on my debian 12 setup - could I get a CI rerun? Thanks @ton31337

@piotrsuchy piotrsuchy changed the title lib: fix for incorrect nexthop lookup for bgp learnt-routes lib: fix for incorrect nexthop lookup for recursive routes Oct 3, 2024
@piotrsuchy
Copy link
Contributor Author

piotrsuchy commented Oct 9, 2024

To answer earlier questions - for a debug session, I attached to zebra and created a breakpoint like this:

break nexthop_same if nh1->resolved != 0 && nh2->resolved != 0

Starting state of the environment:

  • both host1 and host3 have kernel route 2300:5800::1 via fe80::2 dev tap1 (for host1) via tap3 (for host3) (added first in host1 and then in host3)
  • host2 has:
Every 2.0s: vtysh -c "show ip route 22.22.22.22"                                                                                host2: Wed Oct  9 03:07:27 2024

Routing entry for 22.22.22.22/32
  Known via "bgp", distance 20, metric 0, best
  Last update 00:00:58 ago
    2300:5800::1 (recursive), weight 1
  *   ::ffff:a28:1, via br3 onlink, weight 1
  *   ::ffff:a28:3, via br3 onlink, weight 1
    2300:5800::1 (duplicate nexthop removed) (recursive), weight 1
      ::ffff:a28:1, via br3 (duplicate nexthop removed) onlink, weight 1
      ::ffff:a28:3, via br3 (duplicate nexthop removed) onlink, weight 1
    2300:5800::1 (duplicate nexthop removed) (recursive), weight 1
      ::ffff:a28:1, via br3 (duplicate nexthop removed) onlink, weight 1
      ::ffff:a28:3, via br3 (duplicate nexthop removed) onlink, weight 1

And

Every 2.0s: vtysh -c "show ip route 2300:5800::1"                                                                              host2: Wed Oct  9 03:07:27 2024

Routing entry for 2300:5800::1/128
  Known via "bgp", distance 20, metric 1024, best
  Last update 00:00:59 ago
  * ::ffff:a28:1, via br3 onlink, weight 1
    ::ffff:a28:1, via br3 (duplicate nexthop removed) onlink, weight 1
    ::ffff:a28:1, via br3 (duplicate nexthop removed) onlink, weight 1
  * ::ffff:a28:3, via br3 onlink, weight 1
    ::ffff:a28:3, via br3 (duplicate nexthop removed) onlink, weight 1
    ::ffff:a28:3, via br3 (duplicate nexthop removed) onlink, weight 1
  • continued zebra and removed a kernel route from host1

Run continue until I run into a nexthop_same function call where, nh1->resolved had an ipv6 of ::ffff:a28:1 and nh2->resolved had an ipv6 of ::ffff:a28:3.

NH1 - 2300:5800::1 via (resolved) ::ffff:10:40:0:1
NH2 - 2300:5800::1 via (resolved) ::ffff:10:40:0:3

(gdb) print *nh1->resolved
$16 = {next = 0x0, prev = 0x0, vrf_id = 0, ifindex = 3, type = NEXTHOP_TYPE_IPV6_IFINDEX, flags = 11, {gate = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\377\377\n(\000\003", __u6_addr16 = {0, 0, 0, 0, 0, 65535, 10250, 768}, __u6_addr32 = {0, 0, 4294901760,
            50341898}}}}, bh_type = BLACKHOLE_UNSPEC}, src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, rmap_src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>,
        __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, resolved = 0x0, rparent = 0x5594a3f5e160, nh_label_type = ZEBRA_LSP_NONE, nh_label = 0x0, weight = 1 '\001', backup_num = 0 '\000', backup_idx = "\000\000\000\000\000\000\000", nh_encap_type = 0, nh_encap = {vni = 0}, rmac = {
    octet = "\000\000\000\000\000"}, srte_color = 0, nh_srv6 = 0x0}
(gdb) print *nh2->resolved
$17 = {next = 0x5594a3f61260, prev = 0x0, vrf_id = 0, ifindex = 3, type = NEXTHOP_TYPE_IPV6_IFINDEX, flags = 11, {gate = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\377\377\n(\000\001", __u6_addr16 = {0, 0, 0, 0, 0, 65535, 10250, 256}, __u6_addr32 = {0, 0,
            4294901760, 16787466}}}}, bh_type = BLACKHOLE_UNSPEC}, src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, rmap_src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>,
        __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, resolved = 0x0, rparent = 0x5594a3f63980, nh_label_type = ZEBRA_LSP_NONE, nh_label = 0x0, weight = 1 '\001', backup_num = 0 '\000', backup_idx = "\000\000\000\000\000\000\000", nh_encap_type = 0, nh_encap = {vni = 0}, rmac = {
    octet = "\000\000\000\000\000"}, srte_color = 0, nh_srv6 = 0x0}
(gdb) print *nh1
$20 = {next = 0x5594a3f64ff0, prev = 0x0, vrf_id = 0, ifindex = 0, type = NEXTHOP_TYPE_IPV6, flags = 5, {gate = {ipv4 = {s_addr = 5767203}, ipv6 = {__in6_u = {__u6_addr8 = "#\000X", '\000' <repeats 12 times>, "\001", __u6_addr16 = {35, 88, 0, 0, 0, 0, 0, 256}, __u6_addr32 = {5767203, 0, 0, 16777216}}}},
    bh_type = 5767203}, src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, rmap_src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
        __u6_addr32 = {0, 0, 0, 0}}}}, resolved = 0x5594a3f64f50, rparent = 0x0, nh_label_type = ZEBRA_LSP_NONE, nh_label = 0x0, weight = 1 '\001', backup_num = 0 '\000', backup_idx = "\000\000\000\000\000\000\000", nh_encap_type = 0, nh_encap = {vni = 0}, rmac = {octet = "\000\000\000\000\000"}, srte_color = 0,
  nh_srv6 = 0x0}
(gdb) print *nh2
$21 = {next = 0x5594a3f61300, prev = 0x0, vrf_id = 0, ifindex = 0, type = NEXTHOP_TYPE_IPV6, flags = 5, {gate = {ipv4 = {s_addr = 5767203}, ipv6 = {__in6_u = {__u6_addr8 = "#\000X", '\000' <repeats 12 times>, "\001", __u6_addr16 = {35, 88, 0, 0, 0, 0, 0, 256}, __u6_addr32 = {5767203, 0, 0, 16777216}}}},
    bh_type = 5767203}, src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, rmap_src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
        __u6_addr32 = {0, 0, 0, 0}}}}, resolved = 0x5594a3f63a20, rparent = 0x0, nh_label_type = ZEBRA_LSP_NONE, nh_label = 0x0, weight = 1 '\001', backup_num = 0 '\000', backup_idx = "\000\000\000\000\000\000\000", nh_encap_type = 0, nh_encap = {vni = 0}, rmac = {octet = "\000\000\000\000\000"}, srte_color = 0,
  nh_srv6 = 0x0}
(gdb) finish
Run till exit from #0  nexthop_same (nh1=nh1@entry=0x5594a3f5e160, nh2=nh2@entry=0x5594a3f63980) at ../lib/nexthop.c:431
0x00005594a1bc76eb in zebra_nhg_nexthop_compare (nhop=0x5594a3f5e160, old_nhop=0x5594a3f63980, rn=rn@entry=0x5594a3f5c900) at ../zebra/zebra_nhg.c:2954
2954    ../zebra/zebra_nhg.c: No such file or directory.
Value returned is $22 = true

nexthop_same returns true for those two.

nexthop_same(nh1->resolved, nh2->resolved) returns false:
nh1->resolved = 0x5594a3f64f50
nh2->resolved = 0x5594a3f63a20

(gdb) finish
Run till exit from #0  nexthop_same (nh1=0x5594a3f64f50, nh2=0x5594a3f63a20) at ../lib/nexthop.c:431
nexthop_same (nh1=nh1@entry=0x5594a3f5e160, nh2=nh2@entry=0x5594a3f63980) at ../lib/nexthop.c:431
431     in ../lib/nexthop.c
Value returned is $19 = false

That was the basic idea behind the patch, but after some more digging I found another way to provoke this bug - this reproduction makes it seem that nexthop caching for a recursive route is incorrect. Video: https://youtube.com/watch?v=GY57SUDj-z4

(Edit after a force push - as asked in the review commands, comparison of resolved nexthops was moved up to nhg_compare_nexthops, which is only called in 2 places, instead of nexthop_same which is called in way more places). Please take a look if this approach now is acceptable. This solves our issue, seemingly without breaking anything else.

piotrsuchy added a commit to piotrsuchy/frr that referenced this pull request Oct 10, 2024
For cases where the real (resolved) nexthop changes for a recursive route, FRR fails to update second level routes depending on this route.
That is because the zebra_nhg_hash_equal() function uses nhg_compare_nexthops(), which returns true for nexthops that are the same, but have
different resolved nexthops. This commit changes that comparison only if both nexthops have a resolved nexthop when comparing.
Example of a reproduction described in PR: FRRouting#16970.

Signed-off-by: Piotr Suchy <psuchy@akamai.com>
@piotrsuchy piotrsuchy force-pushed the incorrect_recursive_nexthop branch from 3be7e2c to e9db249 Compare October 10, 2024 08:56
@frrbot frrbot bot added the zebra label Oct 10, 2024
@github-actions github-actions bot added size/S and removed size/XS labels Oct 10, 2024
piotrsuchy added a commit to piotrsuchy/frr that referenced this pull request Oct 10, 2024
For cases where the real (resolved) nexthop changes for a recursive route, FRR fails to update second level routes depending on this route.
That is because the zebra_nhg_hash_equal() function uses nhg_compare_nexthops(), which returns true for nexthops that are the same, but have
different resolved nexthops. This commit changes that comparison only if both nexthops have a resolved nexthop when comparing.
Example of a reproduction described in PR: FRRouting#16970.

Signed-off-by: Piotr Suchy <psuchy@akamai.com>
@piotrsuchy piotrsuchy force-pushed the incorrect_recursive_nexthop branch from e9db249 to c210c5e Compare October 10, 2024 09:00
@riw777 riw777 self-requested a review October 15, 2024 16:59
Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@riw777 riw777 self-requested a review October 22, 2024 15:45
For cases where the resolved nexthop changes for a recursive route, FRR fails to update second level routes depending on this route.
That is because the zebra_nhg_hash_equal() function uses nhg_compare_nexthops(), which returns true for nexthops that are the same, but have
different resolved nexthops. This commit changes that comparison only if both nexthops have a resolved nexthop when comparing.
Example of a reproduction described in PR: FRRouting#16970.

Signed-off-by: Piotr Suchy <psuchy@akamai.com>
@piotrsuchy piotrsuchy force-pushed the incorrect_recursive_nexthop branch from c210c5e to 44ba2c9 Compare October 22, 2024 17:27
@github-actions github-actions bot added the rebase PR needs rebase label Oct 22, 2024
@piotrsuchy
Copy link
Contributor Author

CI:rerun

* is hashed to the same group as a new one, because this function returns true,
* even though their resolved nexthop is not the same.
*/
if (nh1->resolved && nh2->resolved)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 'same' is 'false' at this point, let's not do extra work?

* even though their resolved nexthop is not the same.
*/
if (nh1->resolved && nh2->resolved)
return same && nexthop_same(nh1->resolved, nh2->resolved);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will only compare one of the nexthops in 'resolved' - which may be a list?


return true;
/*
* In case of both nexthops having a resolved nexthop, check if both those
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not clear about this change.
This function is used to compare the nexthops that are forming an nhg - the list of nexthops in a route via zapi, or a daemon-owned nhg (also via zapi). this is different from the complete walk of the fully-resolved set of nexthops used in nexthop_group_hash(), for example.

@ton31337 ton31337 removed this from the 10.2 milestone Nov 11, 2024
@riw777
Copy link
Member

riw777 commented Jan 7, 2025

Where are we on this one? Anyone still working on it?

@piotrsuchy
Copy link
Contributor Author

Yes we will come back to this shortly, had other issues and couldn't properly respond here - sorry!

@piotrsuchy
Copy link
Contributor Author

It seems that @eqvinox has gone more in depth in the exact reason this issue occurs here:

#17901

and it might be that our PR won't be needed. Still letting it stay open for now, if that's not a problem, as it might be helpful for someone.

@piotrsuchy
Copy link
Contributor Author

Combination of #17935 and #17901 fixes this :) Closing this PR. Thanks!

@piotrsuchy piotrsuchy closed this Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants