Description
Context: #20425 (comment)
During ELF loading, the loader can encounter a tail call map whose max_entries differs from that of the already-pinned tail call map for the endpoint, in which case it recreates the map with the new size. Currently, when the agent starts up (or when a contributor changes a BPF .c file during development and triggers an endpoint regeneration), there are two possible scenarios:
- The tail call map's properties (type, key size, value size, max_entries, flags) have changed, so the map must be recreated. The sequence is: build the ELF, load the ELF from disk, detect that the map properties have changed, move the old map, create the new map, pin the new map, load all progs in the ELF (including the entrypoint) into the kernel, insert all prog fds into the new tail call map one by one, then atomically replace the bpf entrypoint on the qdisc/xdp.
- The map properties are unchanged (the map has neither grown nor shrunk), so the pinned map is reused. The sequence is: build the ELF, load the ELF, open the pinned map, load all progs into the kernel, insert all prog fds one by one into the pinned map (which is still actively used by the existing qdisc/xdp), and only then replace the entrypoint.
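To make the branch between the two scenarios concrete, here is a minimal Go sketch of the comparison. `MapSpec` and `NeedsRecreate` are hypothetical names invented for illustration, not the loader's actual types, and the field values in `main` are made up.

```go
package main

import "fmt"

// MapSpec is a hypothetical stand-in for the properties listed above.
// It is not the loader's real type.
type MapSpec struct {
	Type       string
	KeySize    uint32
	ValueSize  uint32
	MaxEntries uint32
	Flags      uint32
}

// NeedsRecreate reports whether the pinned map has to be replaced,
// i.e. whether any property differs from what the ELF asks for.
func NeedsRecreate(pinned, wanted MapSpec) bool {
	return pinned != wanted
}

func main() {
	// Illustrative values only.
	pinned := MapSpec{Type: "prog_array", KeySize: 4, ValueSize: 4, MaxEntries: 25}
	wanted := pinned
	fmt.Println("same spec, recreate:", NeedsRecreate(pinned, wanted))

	wanted.MaxEntries = 32
	fmt.Println("grown map, recreate:", NeedsRecreate(pinned, wanted))
}
```

Only the second call takes the recreate path; the first one falls into the reuse scenario.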
In the latter scenario, populating the tail call map is a sequential operation: prog array slots are replaced one at a time, not all at once. Because packets are still being handled while the map is repopulated, reusing a pinned tail call map exposes the datapath to an inconsistent mix of old and new progs. For example, if we move some logic from tail call 1 to tail call 11, a packet may hit the new prog in one slot and the old prog in the other, so that logic may run twice or not at all. This could cause packets to be accepted or dropped erroneously.
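The window can be illustrated with a small Go model. This is a simulation under stated assumptions, not kernel or loader code: `progArray` is a hypothetical slice mapping slot index to program version, `updateInPlace` mimics repopulating the live pinned map, and `replaceWithFresh` mimics filling a separate map and switching to it in one step, which stands in for the atomic entrypoint replacement.

```go
package main

import "fmt"

// progArray models a tail call map: slot index -> program version.
// Hypothetical type for illustration only.
type progArray []int

// isMixed reports whether the slots hold more than one program version.
func isMixed(m progArray) bool {
	for _, v := range m {
		if v != m[0] {
			return true
		}
	}
	return false
}

// updateInPlace overwrites the live map slot by slot and counts the
// intermediate states in which a packet could see mixed versions.
func updateInPlace(live progArray, version int) int {
	mixed := 0
	for i := range live {
		live[i] = version
		if isMixed(live) {
			mixed++
		}
	}
	return mixed
}

// replaceWithFresh fills a new map off to the side, checking the live map
// after every write, then switches the live reference in a single step.
func replaceWithFresh(live *progArray, version int) int {
	fresh := make(progArray, len(*live))
	mixed := 0
	for i := range fresh {
		fresh[i] = version
		if isMixed(*live) { // the live map is untouched while we fill
			mixed++
		}
	}
	*live = fresh // models the atomic switch to the new map
	if isMixed(*live) {
		mixed++
	}
	return mixed
}

func main() {
	reused := progArray{1, 1, 1, 1}
	fmt.Println("in-place mixed states:", updateInPlace(reused, 2))

	recreated := progArray{1, 1, 1, 1}
	fmt.Println("fresh-map mixed states:", replaceWithFresh(&recreated, 2))
}
```

With four slots, the in-place update passes through three mixed states, while the fresh-map path exposes none. The model ignores timing and concurrency; it only counts the states a packet could observe.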
I propose we remove the map migration concept entirely, not only because it complicates the loader but also because the gains are negligible. The only difference between the two scenarios is the bpffs dir rename (and its removal afterwards) and the creation of the new tail call map, which consumes only a small amount of memory.