Description
We're seeing an intermittent crash in a BGP ORF test:
Received signal 11 at 1739400735 (si_addr 0x72646461227b52); aborting...
zlog_signal+0x108 ffffb7acbc70 ffffea6f9920 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
core_handler+0xb4 ffffb7b2b4bc ffffea6f9a40 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
---- signal ----
? ffffb7cd183c ffffea6f9b90 linux-vdso.so.1 (mapped at 0xffffb7cd1000)
prefix_list_apply_ext+0x9c ffffb7b064dc ffffea6fadf0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
subgroup_announce_check+0x8bc aaaab3490a34 ffffea6fae60 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_process_announce_selected+0x154 aaaab3493210 ffffea6fb160 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_announce_table+0x224 aaaab34d5c00 ffffea6fb2f0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_announce_route+0xec aaaab34d5d7c ffffea6fb340 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
peer_af_announce_route+0x258 aaaab34d344c ffffea6fb380 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
bgp_announce_route_timer_expired+0x84 aaaab34995d8 ffffea6fb3c0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
event_call+0x13c ffffb7b45c34 ffffea6fb3f0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
frr_run+0x27c ffffb7abf16c ffffea6fb4e0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
main+0x62c aaaab33dd788 ffffea6fb5f0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
__libc_init_first+0x80 ffffb7727740 ffffea6fb670 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffffb7700000)
__libc_start_main+0x98 ffffb7727818 ffffea6fb780 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffffb7700000)
_start+0x30 aaaab33db930 ffffea6fb7e0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
in thread bgp_announce_route_timer_expired scheduled from bgpd/bgp_route.c:5932 bgp_announce_route()
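The faulting address in the signal line does not look random: the seven non-zero bytes of 0x72646461227b52 are all printable ASCII. The sketch below uses a hypothetical helper, `decode_le_ascii()` (not part of FRR). It reads the value in little-endian byte order, which is how both aarch64 and amd64 store it in memory, so it shows what the bytes would look like if the pointer had been overwritten with text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit value as the 8 bytes it
 * occupies in memory on a little-endian machine (aarch64 and amd64 both
 * are). Printable ASCII is kept, anything else becomes '.', and decoding
 * stops at the first zero byte. `out` must have room for 9 chars. */
static void decode_le_ascii(uint64_t value, char out[9])
{
	size_t n = 0;

	assert(out != NULL);
	memset(out, 0, 9);
	for (int i = 0; i < 8; i++) {
		/* byte i of the value, lowest address first */
		unsigned char byte = (unsigned char)(value >> (8 * i));

		if (byte == 0)
			break;
		out[n++] = isprint(byte) ? (char)byte : '.';
	}
}
```

For the si_addr above, `decode_le_ascii(0x72646461227b52ULL, buf)` yields `R{"addr`, which looks like a fragment of JSON-like text. This is only a hint and may be coincidence, but a pointer overwritten with string data would fit the memory corruption (e.g. use-after-free) hypothesis better than a plain NULL dereference would.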
Ancillary information:
- this is quite rare (less than 1 in 20 runs)
- originally only seen on ARMv8 (aarch64); now also seen on AMD64
- more common on Ubuntu 24.04 than on Debian 12 (though seen on both; this might just be random)
- only seen under high load on the test system with parallel test runs
- the crash location initially seemed to vary (roughly at this point in the sequence, but not always in the same code location); it now seems to be consistent after all (the earlier reports might have been inaccurate)
This combination of "factlets" is suggestive of a memory synchronization race condition (ARM has a weaker memory consistency model than x86's TSO, total store order) or of general memory corruption. However, it is not immediately obvious to me how the various bgpd threads would be involved in this; as far as I can see, most of this should really be happening on the main thread.
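To illustrate why a race might show up on ARM first, here is a generic C11 message-passing sketch. It is not FRR code; `struct mailbox`, `writer()` and `run_message_passing()` are hypothetical names. Any data handed from one bgpd thread to another would need this kind of release/acquire pairing (or a mutex) to be safe on both architectures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical mailbox: plain data published through an atomic flag. */
struct mailbox {
	int payload;      /* plain data, only valid once ready == 1 */
	atomic_int ready; /* 0 = empty, 1 = payload published */
};

static void *writer(void *arg)
{
	struct mailbox *mb = arg;

	mb->payload = 42;
	/* Release: the payload store above cannot be reordered after this
	 * flag store, on ARM or on x86. */
	atomic_store_explicit(&mb->ready, 1, memory_order_release);
	return NULL;
}

/* Starts one writer thread, spins until it publishes, and returns the
 * payload the reader observed (-1 if the thread could not be started). */
static int run_message_passing(void)
{
	struct mailbox mb = { .payload = 0 };
	pthread_t tid;
	int seen, rc;

	atomic_init(&mb.ready, 0);
	if (pthread_create(&tid, NULL, writer, &mb) != 0)
		return -1;

	/* Acquire: once ready == 1 is seen, the payload store is visible. */
	while (atomic_load_explicit(&mb.ready, memory_order_acquire) == 0)
		; /* spin until the writer publishes */
	seen = mb.payload;

	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void)rc;
	return seen;
}
```

With a plain `int` flag instead, the program has a data race (undefined behaviour in C11). In practice x86's TSO keeps stores in order with each other and loads in order with each other, so such a bug often stays hidden there, while ARM hardware may reorder them and let the reader see the flag before the payload.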
There is unfortunately no coredump available at this point; I'm working on getting one. I'm also trying to get it to crash under valgrind. Reproducing it outside of CI has so far failed.
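Given how rare the crash is, a failed reproduction attempt only means something after enough runs. Assuming runs are independent and each crashes with a fixed probability p (an assumption; a load-dependent race is unlikely to be that tidy), the smallest n with 1 - (1 - p)^n >= confidence can be computed with this hypothetical helper:

```c
#include <assert.h>

/* Hypothetical helper: smallest number of independent runs n such that,
 * if one run crashes with probability p, at least one of n runs crashes
 * with probability >= confidence, i.e. 1 - (1 - p)^n >= confidence.
 * Iterates instead of calling log() so no libm is needed. */
static unsigned runs_needed(double p, double confidence)
{
	double all_pass = 1.0; /* probability that none of n runs crash */
	unsigned n = 0;

	assert(p > 0.0 && p < 1.0);
	assert(confidence > 0.0 && confidence < 1.0);

	while (all_pass > 1.0 - confidence) {
		all_pass *= 1.0 - p;
		n++;
	}
	return n;
}
```

At 1 in 20, `runs_needed(0.05, 0.95)` returns 59; at 1 in 100, `runs_needed(0.01, 0.95)` returns 299. Since the observed rate is below 1 in 20, a clean result from a few dozen runs outside CI would not rule much out.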
This issue is opened to get more eyes on this; crashing bgpd is a pretty bad bug in any case. If bgpd can be made to crash, it is at minimum an availability (denial of service) security issue; in the worst case, if this is memory corruption, it may be exploitable.
The test is modeled after bgp_orf in topotests, which should in theory show the same issue; I have not attempted to reproduce it there yet. The topotato test source can be found here: https://github.com/opensourcerouting/topotato/blob/master/test_bgp_orf.py. Feedback on topotato is neither invited nor welcome on this specific issue.