Skip to content

Conversation

chdxD1
Copy link
Contributor

@chdxD1 chdxD1 commented Mar 4, 2025

Today zebra tracks EVPN vtep_ips in the L3-VNI of the route_entry. Leaking routes in bgpd and using different RMACs on the remote side for different L3-VNIs for a single VTEP leads to updates in the leak destination VRF which is not desired. Zebra gets the source VRF in vrf_id of the nexthops. This nexthop vrf_id is now used for adding/ updating/deleting the l3vni nexthop.

Should fix #18190

@donaldsharp
Copy link
Member

Can we get a topotest to show that this is fixed?

@chdxD1
Copy link
Contributor Author

chdxD1 commented Mar 4, 2025

Can we get a topotest to show that this is fixed?

Sure, will add one!

@github-actions github-actions bot added size/XL and removed size/XS labels Mar 5, 2025
@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from d3b7140 to 1fa0bb6 Compare March 5, 2025 16:13
@chdxD1
Copy link
Contributor Author

chdxD1 commented Mar 5, 2025

Okay, added the topotest now. I created a new topology for it as route-leaking (as far as I am aware at least) only applies to vrf-lite, not to the netns vrf backend (which the bgp_evpn_rt5 topology relies on).

If I revert f5dd2e6 and therefore run it without my changes, the topotest fails because the router MAC is wrongly updated:

2025-03-05 16:03:12,785 INFO: topo: RMACs on r1 (received from r2): {'101': 'be:51:0d:9d:1f:6a', '102': 'be:51:0d:9d:1f:6a'}
2025-03-05 16:03:12,785 INFO: topo: Local VRF RMACs on r2: {'101': '06:a4:01:4b:db:36', '102': 'be:51:0d:9d:1f:6a'}
2025-03-05 16:03:12,836 ERROR: topo: test failed at "bgp_evpn_rt5_leaking.test_bgp_evpn_leaking/test_router_rmac_after_leak": RMACs mismatch between r1 and r2
assert Generated JSON diff error report:
  
  > $->101: output has element with value 'be:51:0d:9d:1f:6a' but in expected it has value '06:a4:01:4b:db:36'
2025-03-05 16:03:12,836 INFO: topo: setting error msg: bgp_evpn_rt5_leaking.test_bgp_evpn_leaking/test_router_rmac_after_leak
FAILED

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from 1fa0bb6 to 713510d Compare March 5, 2025 16:32
@chdxD1
Copy link
Contributor Author

chdxD1 commented Mar 5, 2025

I've slightly changed the topotest, it didn't catch all failure cases (it only looked at the first RMAC in the list, not at the RMAC count in the list). If the first one in the list was the right one it did not properly return a failure. With the code previous to this PR it now fails with:

2025-03-05 16:29:33,892 ERROR: topo: test failed at "bgp_evpn_rt5_leaking.test_bgp_evpn_leaking/test_router_rmac_after_additional_network": RMACs should have only 2 keys: numRmacs and a single RMAC: dict_keys(['numRmacs', '4e:10:c8:b6:b1:d4', '02:86:13:da:a1:02'])
assert 3 == 2
 +  where 3 = len(dict_keys(['numRmacs', '4e:10:c8:b6:b1:d4', '02:86:13:da:a1:02']))
 +    where dict_keys(['numRmacs', '4e:10:c8:b6:b1:d4', '02:86:13:da:a1:02']) = <built-in method keys of dict object at 0x7ffffe29a2c0>()
 +      where <built-in method keys of dict object at 0x7ffffe29a2c0> = {'numRmacs': 2, '4e:10:c8:b6:b1:d4': {'routerMac': '4e:10:c8:b6:b1:d4', 'vtepIp': '192.168.100.41'}, '02:86:13:da:a1:02': {'routerMac': '02:86:13:da:a1:02', 'vtepIp': '192.168.100.41'}}.keys
2025-03-05 16:29:33,892 INFO: topo: setting error msg: bgp_evpn_rt5_leaking.test_bgp_evpn_leaking/test_router_rmac_after_additional_network
FAILED

@Jafaral
Copy link
Member

Jafaral commented Mar 20, 2025

style suggestions by frrbot.

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from 713510d to 60950bb Compare March 20, 2025 08:28
@github-actions github-actions bot added the rebase PR needs rebase label Mar 20, 2025
@chdxD1
Copy link
Contributor Author

chdxD1 commented Mar 20, 2025

done!

@chdxD1
Copy link
Contributor Author

chdxD1 commented Mar 24, 2025

ci:rerun unrelated error

@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 22, 2025

Anything missing for this PR to be accepted?

@louis-6wind
Copy link
Contributor

Could you rebase ?

I get an errror

    def test_protocols_convergence():
        """
        Assert that all protocols have converged
        statuses as they depend on it.
        """
        tgen = get_topogen()
        if tgen.routers_have_failure():
            pytest.skip(tgen.errors)
        # Check BGP IPv4 routing tables on r1
        logger.info("Checking BGP L2VPN EVPN routes for convergence on r1")

        for rname in ("r1", "r2"):
            router = tgen.gears[rname]
            json_file = "{}/{}/bgp_l2vpn_evpn_routes.json".format(CWD, router.name)
            if not os.path.isfile(json_file):
                assert 0, "bgp_l2vpn_evpn_routes.json file not found"
    
            expected = json.loads(open(json_file).read())
            test_func = partial(
                topotest.router_json_cmp,
                router,
                "show bgp l2vpn evpn json",
                expected,
            )
            _, result = topotest.run_and_expect(test_func, None, count=20, wait=1)
            assertmsg = '"{}" JSON output mismatches'.format(router.name)
>           assert result is None, assertmsg
E           AssertionError: "r1" JSON output mismatches
E           assert Generated JSON diff error report:
E             
E             > $->bgpTableVersion: output has element with value '4' but in expected it has value '2'

@louis-6wind louis-6wind self-requested a review April 23, 2025 14:17
Copy link
Contributor

@louis-6wind louis-6wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase.

The test was working before the merge of 0c14dd3

@louis-6wind
Copy link
Contributor

louis-6wind commented Apr 23, 2025

Could you take inspiration from #18708 to clarify the topotest ?

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from 901b0f1 to f74c838 Compare April 23, 2025 14:35
@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 23, 2025

Please rebase.

The test was working before the merge of 0c14dd3

Could you take inspiration from #18708 to clarify the topotest ?

@louis-6wind Yeah, will do.

@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 23, 2025

@louis-6wind as you are reducing complexity in bgp_evpn_rt5 quite a lot it probably makes sense to merge the tests anyway (#18358 is introducing import vrf into it anyway), instead of creating another topology for it..

@louis-6wind
Copy link
Contributor

@louis-6wind as you are reducing complexity in bgp_evpn_rt5 quite a lot it probably makes sense to merge the tests anyway (#18358 is introducing import vrf into it anyway), instead of creating another topology for it..

@pguibert6WIND has created a new topology because we need to test both situations: underlay in default VRF and in another VRF. We have to make sure that the new code does not make regression in working setup

@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 23, 2025

@louis-6wind as you are reducing complexity in bgp_evpn_rt5 quite a lot it probably makes sense to merge the tests anyway (#18358 is introducing import vrf into it anyway), instead of creating another topology for it..

@pguibert6WIND has created a new topology because we need to test both situations: underlay in default VRF and in another VRF. We have to make sure that the new code does not make regression in working setup

He just introduced r4 from what I can tell in bgp_evpn_rt5, I would introduce the bgp_evpn_rt5_leaking topology with this PR

@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 23, 2025

Or to say it differently: I think it does not make sense to introduce a new topology for it (as I do with this). I'll wait for #18708 to be merged and add route-leaking / vrf import in the bgp_evpn_rt5 topology based on your work.

Today zebra tracks EVPN vtep_ips in the L3-VNI of the route_entry.
Leaking routes in bgpd and using different RMACs on the remote side
for different L3-VNIs for a single VTEP leads to updates in the leak
destination VRF which is not desired. Zebra gets the source VRF in
vrf_id of the nexthops. This nexthop vrf_id is now used for adding/
updating/deleting the l3vni nexthop.

Signed-off-by: Christopher Dziomba <christopher.dziomba@telekom.de>
@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from f74c838 to 86e71c6 Compare April 27, 2025 10:24
@github-actions github-actions bot added size/XXL and removed size/XL labels Apr 27, 2025
@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 27, 2025

Okay, after carefully looking at #18358 again, it did not implement leaking for the usecase I needed, thanks for pointing that out again.

I added a second VRF to R2 and created another router, R3, because R1 is not suited for import vrf (using netns). After configuring some leaking on R2 and R3, routes are compared to the expected values and duplicate RMACs per VNI (that should not happen and would lead to broken programming of the Linux Kernel).

In general duplicate RMACs could still happen when routes with conflicting evpn information are advertised (same VTEP IP but with different RMAC), maybe changing log L3VNI %u RMAC change(%pEA --> %pEA) for nexthop %pIA, prefix %pFX from debug to info/warning might make sense. However this can also be triggered when changing the RMAC of a VTEP on purpose..

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from 86e71c6 to e785d97 Compare April 27, 2025 10:43
@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 27, 2025

Forget about it, I came up with a better idea to not introduce R3, keeping the topology smaller:
VRF 101 and VRF 102 on R1 and R2, leaking R1 loopbacks on R2 between 101 and 102 and then running the ping from VRF 101 on R1 to VRF 102 on R1.

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from e785d97 to 78abdb2 Compare April 27, 2025 14:17
@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 27, 2025

done!

@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch 2 times, most recently from 64195e4 to 93d8f32 Compare April 27, 2025 14:20
An additional VRF, 102, is introduced on both routers, importing
r1 routes from VRF 101 into VRF 102 on r2 and vice versa (both
on r2, because r1 is in netns mode and can not use route import).

RMACs and ping (from VRF 101 to VRF 102 on r1, going through r2) is
then checked.

All RMACs are created deterministically, using
52:54:00:00:<router>:<vrf> to ease debugging and checks.

Signed-off-by: Christopher Dziomba <christopher.dziomba@telekom.de>
@chdxD1 chdxD1 force-pushed the fix/zebra-evpn-vrf branch from 93d8f32 to fd975ad Compare April 27, 2025 14:20
@louis-6wind
Copy link
Contributor

louis-6wind commented Apr 28, 2025

There is a zebra crash.

=================================================================
==16235==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fe576972480 bp 0x7fe57419d310 sp 0x7fe57419d1f8 T2)
==16235==The signal is caused by a WRITE memory access.
==16235==Hint: address points to the zero page.
    #0 0x7fe576972480 in __memset_avx2_unaligned_erms ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:386
    #1 0x55e791905c60 in netlink_route_multipath_msg_encode zebra/rt_netlink.c:2460
    #2 0x55e79190715c in netlink_delroute_msg_encoder zebra/rt_netlink.c:3359
    #3 0x55e7918e6faf in netlink_batch_add_msg zebra/kernel_netlink.c:1516
    #4 0x55e79190931e in netlink_put_route_update_msg zebra/rt_netlink.c:3420
    #5 0x55e7918e6d1e in nl_put_msg zebra/kernel_netlink.c:1570
    #6 0x55e7918e6d1e in kernel_update_multi zebra/kernel_netlink.c:1687
    #7 0x55e79194dfc4 in kernel_dplane_process_func zebra/zebra_dplane.c:7046
    #8 0x55e79194d9a2 in dplane_thread_loop zebra/zebra_dplane.c:7532
    #9 0x7fe576ce896b in event_call lib/event.c:2019
    #10 0x7fe576bec395 in fpt_run lib/frr_pthread.c:369
    #11 0x7fe576beb812 in frr_pthread_inner lib/frr_pthread.c:179
    #12 0x7fe5768a81f4 in start_thread nptl/pthread_create.c:442
    #13 0x7fe57692889b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:386 in __memset_avx2_unaligned_erms
Thread T2 created by T0 here:
    #0 0x7fe577049726 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
    #1 0x7fe576bebeb3 in frr_pthread_run lib/frr_pthread.c:197
    #2 0x55e79194fb5b in zebra_dplane_start zebra/zebra_dplane.c:7777
    #3 0x55e7918ec0dc in main zebra/main.c:526
    #4 0x7fe576846249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

==16235==ABORTING

The crash was previously reported here:
#15748 (comment)

It seems that both netns contexts share the same info in memory for table 254 (main linux rt table), which is incorrect.
It seems that the issue was there before you change and that you triggered the issue.

@louis-6wind
Copy link
Contributor

It seems that both netns contexts share the same info in memory for table 254 (main linux rt table), which is incorrect.
It seems that the issue was there before you change and that you triggered the issue.

Not that. Let's dissect the backtrace.

=================================================================
==16235==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fe576972480 bp 0x7fe57419d310 sp 0x7fe57419d1f8 T2)
==16235==The signal is caused by a WRITE memory access.
==16235==Hint: address points to the zero page.
    #0 0x7fe576972480 in __memset_avx2_unaligned_erms ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:386
    #1 0x55e791905c60 in netlink_route_multipath_msg_encode zebra/rt_netlink.c:2460
    ssize_t netlink_route_multipath_msg_encode(int cmd, struct zebra_dplane_ctx *ctx,
					   uint8_t *data, size_t datalen,
					   bool fpm, bool force_nhg,
					   bool force_rr)

	struct {
		struct nlmsghdr n;
		struct rtmsg r;
		char buf[];
	} *req = (void *)data;

	memset(req, 0, sizeof(*req));

req is NULL because data is NULL

    #2 0x55e79190715c in netlink_delroute_msg_encoder zebra/rt_netlink.c:3359
static ssize_t netlink_delroute_msg_encoder(struct zebra_dplane_ctx *ctx,
					    void *buf, size_t buflen)
{
	return netlink_route_multipath_msg_encode(RTM_DELROUTE, ctx, buf,
						  buflen, false, false, false);
}

buf is NULL

    #3 0x55e7918e6faf in netlink_batch_add_msg zebra/kernel_netlink.c:1516
enum netlink_msg_status netlink_batch_add_msg(
	struct nl_batch *bth, struct zebra_dplane_ctx *ctx,
	ssize_t (*msg_encoder)(struct zebra_dplane_ctx *, void *, size_t),
	bool ignore_res)

	size = (*msg_encoder)(ctx, bth->buf_head, bth->bufsiz - bth->curlen);

bth->buf_head is NULL

    #4 0x55e79190931e in netlink_put_route_update_msg zebra/rt_netlink.c:3420
enum netlink_msg_status
netlink_put_route_update_msg(struct nl_batch *bth, struct zebra_dplane_ctx *ctx)

	return netlink_batch_add_msg(bth, ctx,
				     cmd == RTM_NEWROUTE
					     ? netlink_newroute_msg_encoder
					     : netlink_delroute_msg_encoder,
				     false);

bth->buf_head is NULL

    #5 0x55e7918e6d1e in nl_put_msg zebra/kernel_netlink.c:1570
static enum netlink_msg_status nl_put_msg(struct nl_batch *bth,
					  struct zebra_dplane_ctx *ctx)
{
	switch (dplane_ctx_get_op(ctx)) {

	case DPLANE_OP_ROUTE_INSTALL:
	case DPLANE_OP_ROUTE_UPDATE:
	case DPLANE_OP_ROUTE_DELETE:
		return netlink_put_route_update_msg(bth, ctx);

bth->buf_head is NULL

    #6 0x55e7918e6d1e in kernel_update_multi zebra/kernel_netlink.c:1687
void kernel_update_multi(struct dplane_ctx_list_head *ctx_list)
		res = nl_put_msg(&batch, ctx);

batch.buf_head is NULL. It was set by nl_batch_init() -> nl_batch_reset()

	nl_batch_init(&batch, &handled_list);
static void nl_batch_init(struct nl_batch *bth,
			  struct dplane_ctx_list_head *ctx_out_q)
{
	if (bufsize != nl_batch_tx_bufsize) {
		if (nl_batch_tx_buf)
			XFREE(MTYPE_NL_BUF, nl_batch_tx_buf);

		nl_batch_tx_buf = XCALLOC(MTYPE_NL_BUF, bufsize);
		nl_batch_tx_bufsize = bufsize;
	}
	bth->buf = nl_batch_tx_buf;
	nl_batch_reset(bth);
static void nl_batch_reset(struct nl_batch *bth)
	bth->buf_head = bth->buf;

It means that nl_batch_tx_buf is NULL

The only place it is free is kernel_terminate()

@chdxD1
Copy link
Contributor Author

chdxD1 commented Apr 28, 2025

@louis-6wind thank you very much! So, I see that kernel_terminate has an argument called complete. It is called, when a namespace disappears, however complete is also set to true, when a namespace is deleted, not only when zebra shuts down.

So I think this was introduced by c416603, freeing the global nl_batch_tx_buf instead of only freeing zebra_ns related memory, probably under the assumption that complete is only true when zebra shuts down and not a single ns is deleted.

@louis-6wind
Copy link
Contributor

So I think this was introduced by c416603, freeing the global nl_batch_tx_buf instead of only freeing zebra_ns related memory, probably under the assumption that complete is only true when zebra shuts down and not a single ns is deleted.

Yes, nl_batch_tx_buf should be deleted only on the last ns deletion

The nl batch buffer was destroyed too early when a netns was
terminated. Now freeing the buffer later in router_terminate allows
netlink messages to be still processed.

Signed-off-by: Christopher Dziomba <christopher.dziomba@telekom.de>
Copy link
Contributor

@louis-6wind louis-6wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@pguibert6WIND pguibert6WIND left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pguibert6WIND pguibert6WIND merged commit 593ad76 into FRRouting:master Apr 29, 2025
13 checks passed
@leonshaw leonshaw mentioned this pull request May 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EVPN - Incorrect L3VNI routermac address installed
5 participants