Use-After-Free in BGP ORF usage of prefix lists #18138

@eqvinox

Description

We're seeing an intermittent crash in a BGP ORF test:

https://ci1.netdef.org/artifact/TESTING-POTATOUVMPARA/POTATODEB12ARM8/build-1197/Topotato-Report/test_bgp_orf_py__TestBGPORF.html#i4

Received signal 11 at 1739400735 (si_addr 0x72646461227b52); aborting...
zlog_signal+0x108                  ffffb7acbc70     ffffea6f9920 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
core_handler+0xb4                  ffffb7b2b4bc     ffffea6f9a40 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
    ---- signal ----
?                                  ffffb7cd183c     ffffea6f9b90 linux-vdso.so.1 (mapped at 0xffffb7cd1000)
prefix_list_apply_ext+0x9c         ffffb7b064dc     ffffea6fadf0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
subgroup_announce_check+0x8bc      aaaab3490a34     ffffea6fae60 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_process_announce_selected+0x154     aaaab3493210     ffffea6fb160 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_announce_table+0x224      aaaab34d5c00     ffffea6fb2f0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
subgroup_announce_route+0xec       aaaab34d5d7c     ffffea6fb340 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
peer_af_announce_route+0x258       aaaab34d344c     ffffea6fb380 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
bgp_announce_route_timer_expired+0x84     aaaab34995d8     ffffea6fb3c0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
event_call+0x13c                   ffffb7b45c34     ffffea6fb3f0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
frr_run+0x27c                      ffffb7abf16c     ffffea6fb4e0 /home/ci/cibuild.1197/frr-source/lib/.libs/libfrr.so.0 (mapped at 0xffffb79e0000)
main+0x62c                         aaaab33dd788     ffffea6fb5f0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
__libc_init_first+0x80             ffffb7727740     ffffea6fb670 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffffb7700000)
__libc_start_main+0x98             ffffb7727818     ffffea6fb780 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffffb7700000)
_start+0x30                        aaaab33db930     ffffea6fb7e0 /home/ci/cibuild.1197/frr-source/bgpd/.libs/bgpd (mapped at 0xaaaab32e0000)
in thread bgp_announce_route_timer_expired scheduled from bgpd/bgp_route.c:5932 bgp_announce_route()

Ancillary information:

  • this is quite rare (less than 1 in 20 runs)
  • originally only seen on ARM8, now also seen on AMD64
  • more common on Ubuntu 24.04 than on Debian 12 (though seen on both; this might just be random)
  • only seen under high load on the test system with parallel test runs
  • the crash location initially seemed to vary (roughly the same point in the sequence, but not always the same code location); it now seems to be consistent after all (the earlier variation might have been poor reporting)

This combination of "factlets" suggests a memory synchronization race condition or general memory corruption; ARM has a weaker memory consistency model than x86's TSO. However, it is not immediately obvious to me how the various bgpd threads would be involved in this; as far as I can see, most of this should be happening on the main thread.

Unfortunately there is no coredump available at this point; I'm working on getting one. I'm also trying to get it to crash under valgrind. Attempts to reproduce it outside of CI have failed so far.

I'm opening this issue to get more eyes on it; a bgpd crash is a pretty bad bug in any case. If bgpd can be made to crash, it is at minimum an availability (denial-of-service) security issue; in the worst case, if this is memory corruption, it may be exploitable.

The test is modeled after bgp_orf in topotests, which should in theory show the same issue. I have not attempted to reproduce it on that yet. The topotato test source can be found here: https://github.com/opensourcerouting/topotato/blob/master/test_bgp_orf.py. Feedback on topotato is not invited or welcome on this specific issue.
