cleanup(modern_bpf): r/w ro ebpf maps best practices #2399


Merged: 4 commits, May 13, 2025

Conversation

@LucaGuerra (Contributor) commented May 9, 2025

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind cleanup

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area driver-modern-bpf

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

Our eBPF probe stores many things, including the interesting syscall set, in the .bss section, which libbpf sets up as mmapable. It is a best practice not to keep security-sensitive settings behind an mmapable fd, so we are separating them, together with the general eBPF probe settings, into their own maps. In addition, we are making variables that are never written read-only (by using .rodata).

Credits to @mouadk for looking at the maps permissions and suggesting this improvement!
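A rough sketch of the pattern described above (all names here are illustrative, not the probe's actual identifiers): security-sensitive state moves out of .bss into a dedicated `BPF_MAP_TYPE_ARRAY` created without `BPF_F_MMAPABLE`, while never-written tables become `const volatile` globals so that libbpf places them in the read-only .rodata map:

```c
/* Hypothetical sketch; the probe's real map and field names differ. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct capture_settings {
	bool dropping_mode;
	__u64 boot_time;
};

/* Plain (non-mmapable) array map: userspace can only reach it
 * through bpf() syscalls, not through a shared memory mapping. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct capture_settings);
} settings_map SEC(".maps");

/* Never written at runtime: `const volatile` makes libbpf place it
 * in the .rodata map, which the kernel freezes after load. */
const volatile bool g_example_syscall_table[512] = {0};
```

With this layout, userspace updates the settings via `bpf_map_update_elem` on `settings_map`, and the kernel rejects any write to the frozen .rodata contents after load.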

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

cleanup(modern_bpf): r/w ro ebpf maps best practices

…all_table` and `g_ia32_to_64_table` to rodata.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>

Co-authored-by: Luca Guerra <luca@guerra.sh>
github-actions bot commented May 9, 2025

Please double check driver/API_VERSION file. See versioning.

/hold

@LucaGuerra (Contributor Author)

@FedeDP : if you agree, I would propose this for version 0.21.0, WDYT?

@poiana poiana added the approved label May 9, 2025
@poiana poiana requested review from FedeDP and hbrueckner May 9, 2025 10:39
@LucaGuerra LucaGuerra changed the title clenaup(modern_bpf): r/w ro ebpf maps best practices cleanup(modern_bpf): r/w ro ebpf maps best practices May 9, 2025
LucaGuerra and others added 2 commits May 9, 2025 10:43
…s table

Signed-off-by: Luca Guerra <luca@guerra.sh>
Co-authored-by: Kondah Mouad <kondah.mouad@gmail.com>
Signed-off-by: Luca Guerra <luca@guerra.sh>
@LucaGuerra LucaGuerra force-pushed the cleanup/modern-bpf-maps branch from a366d3f to a8c9ae0 Compare May 9, 2025 10:43
github-actions bot commented May 9, 2025

Perf diff from master - unit tests

     3.49%     +0.49%  [.] next_event_from_file
     6.68%     -0.45%  [.] sinsp::next
     2.07%     -0.34%  [.] sinsp_parser::process_event
     2.26%     +0.30%  [.] gzfile_read
     5.03%     +0.28%  [.] sinsp_evt::get_type
     0.62%     +0.19%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::find
     4.98%     -0.17%  [.] sinsp_parser::reset
     0.88%     -0.16%  [.] sinsp::fetch_next_event
    34.79%     -0.12%  [.] sinsp_thread_manager::create_thread_dependencies
     4.06%     +0.11%  [.] sinsp_thread_manager::find_thread

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0052         -0.0051           149           149           149           149
BM_sinsp_split_median                                          -0.0043         -0.0042           149           149           149           149
BM_sinsp_split_stddev                                          -0.0311         -0.0312             1             1             1             1
BM_sinsp_split_cv                                              -0.0260         -0.0262             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0741         +0.0743            56            60            56            60
BM_sinsp_concatenate_paths_relative_path_median                +0.0733         +0.0734            56            60            56            60
BM_sinsp_concatenate_paths_relative_path_stddev                -0.0487         -0.0456             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.1143         -0.1116             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0030         -0.0029            24            24            24            24
BM_sinsp_concatenate_paths_empty_path_median                   -0.0035         -0.0034            24            24            24            24
BM_sinsp_concatenate_paths_empty_path_stddev                   +0.3111         +0.3153             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       +0.3150         +0.3190             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.1398         +0.1400            54            62            54            62
BM_sinsp_concatenate_paths_absolute_path_median                +0.1490         +0.1492            54            62            54            62
BM_sinsp_concatenate_paths_absolute_path_stddev                -0.5961         -0.5962             2             1             2             1
BM_sinsp_concatenate_paths_absolute_path_cv                    -0.6456         -0.6458             0             0             0             0

codecov bot commented May 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.26%. Comparing base (c0b1aea) to head (b4ac3bb).
Report is 6 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2399   +/-   ##
=======================================
  Coverage   77.26%   77.26%           
=======================================
  Files         227      227           
  Lines       30323    30323           
  Branches     4644     4644           
=======================================
  Hits        23429    23429           
  Misses       6894     6894           
Flag       Coverage Δ
libsinsp   77.26% <ø> (ø)


@LucaGuerra (Contributor Author)

/milestone 0.21.0

@poiana poiana added this to the 0.21.0 milestone May 9, 2025
@FedeDP (Contributor) commented May 9, 2025

Sure Luca!
/milestone 8.1.0+driver

@poiana poiana modified the milestones: 0.21.0, 8.1.0+driver May 9, 2025
@FedeDP (Contributor) commented May 9, 2025

Will run kernel-testing framework against this PR and post the results! 🙏

@FedeDP (Contributor) commented May 9, 2025

Also would love a review by @Andreagit97 ❤️

@FedeDP (Contributor) commented May 9, 2025

Everything good!
ARM64:

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

X64:

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

@Andreagit97 (Member) left a comment


Thank you for the changes! As I wrote in one comment, the only downside I can see is a possible decrease in performance. We should probably keep an eye on possible perf regressions after we merge this.

Comment on lines 117 to 118
struct capture_settings settings;
pman_get_capture_settings(&settings);
Member

I would return here if we are not able to get the settings. There is the risk we will write junk at the next update

Suggested change
struct capture_settings settings;
pman_get_capture_settings(&settings);
struct capture_settings settings = {0};
if(pman_get_capture_settings(&settings) != 0) {
	return;
}

Contributor Author

Makes sense, done!

continue;
}
}
}

int pman_init_settings_map() {
Member

It sounds like pman_update_capture_settings with an initial struct capture_settings settings = {}; maybe we can avoid the duplication.

Contributor Author

Good catch, implemented your suggestion

static __always_inline bool maps__64bit_interesting_syscall(uint32_t syscall_id) {
return g_64bit_interesting_syscalls_table[syscall_id & (SYSCALL_TABLE_SIZE - 1)];
static __always_inline bool maps__interesting_syscall_64bit(uint32_t syscall_id) {
bool *ret = bpf_map_lookup_elem(&interesting_syscalls_table_64bit, &syscall_id);
Member

This is the main reason we initially chose a memory-mapped global variable. Calling a BPF helper many times (more than 20 million/s) could decrease performance. We are probably talking about a few ns, but under heavy load it could have a small impact.

Contributor

As far as we understand it, the BPF_F_MMAPABLE flag mmaps the eBPF map into userspace memory, increasing performance for userspace only; inside eBPF, reads and writes still go through the helpers.
Perhaps we are wrong, though?
See:
https://docs.kernel.org/bpf/map_array.html

Since kernel 5.5, memory mapping may be enabled for BPF_MAP_TYPE_ARRAY by setting the flag BPF_F_MMAPABLE. The map definition is page-aligned and starts on the first page. Sufficient page-sized and page-aligned blocks of memory are allocated to store all array values, starting on the second page, which in some cases will result in over-allocation of memory. The benefit of using this is increased performance and ease of use since userspace programs would not be required to use helper functions to access and mutate data.

https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_ARRAY/#bpf_f_mmapable

Setting this flag on a BPF_MAP_TYPE_ARRAY will allow userspace programs to mmap the array values into the userspace process, effectively making a shared memory region between eBPF programs and a userspace program.
This can significantly improve read and write performance since there is no syscall overhead to access the map.

https://patchwork.ozlabs.org/project/netdev/patch/20191117172806.2195367-4-andriin@fb.com/

Add ability to memory-map contents of BPF array map. This is extremely useful
for working with BPF global data from userspace programs. It allows to avoid
typical bpf_map_{lookup,update}_elem operations, improving both performance
and usability.

@LucaGuerra (Contributor Author) commented May 12, 2025

@FedeDP is right, that's what we figured out during the analysis. The only doubt I had is some extra perf optimization that might happen because of how libbpf handles .bss, .rodata and .data with a special instruction: https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_ARRAY/#global-data .

It is unclear whether that has a performance impact; I agree that the easiest way is to keep an eye on perf after it's merged.

Member

Uhm, if I recall correctly, global variables are accessed without helpers, using direct memory access instead. I quickly tried it:

char LICENSE[] SEC("license") = "Dual BSD/GPL";

unsigned long stop = 0;

SEC("tp_btf/sys_enter")
int global(void *ctx) {
	stop++;
	return 0;
}

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, uint32_t);
	__type(value, unsigned long);
} stop_map SEC(".maps");

SEC("tp_btf/sys_enter")
int map_array(void *ctx) {
	uint32_t key = 0;
	unsigned long *value = bpf_map_lookup_elem(&stop_map, &key);
	*value++;
	return 0;
}

Globals:

0: R1=ctx() R10=fp0
; stop++; @ globals.bpf.c:12
0: (18) r1 = 0xffffbfb7c1226000       ; R1_w=map_value(map=globals_.bss,ks=4,vs=8)
2: (79) r2 = *(u64 *)(r1 +0)          ; R1_w=map_value(map=globals_.bss,ks=4,vs=8) R2_w=scalar()
3: (07) r2 += 1                       ; R2_w=scalar()
4: (7b) *(u64 *)(r1 +0) = r2          ; R1_w=map_value(map=globals_.bss,ks=4,vs=8) R2_w=scalar()
; return 0; @ globals.bpf.c:13
5: (b7) r0 = 0                        ; R0_w=0
6: (95) exit
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Map:

0: R1=ctx() R10=fp0
; int map_array(void *ctx) { @ globals.bpf.c:24
0: (b7) r1 = 0                        ; R1_w=0
; uint32_t key = 0; @ globals.bpf.c:25
1: (63) *(u32 *)(r10 -4) = r1
mark_precise: frame0: last_idx 1 first_idx 0 subseq_idx -1 
mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 0
2: R1_w=0 R10=fp0 fp-8=0000????
2: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
;  @ globals.bpf.c:0
3: (07) r2 += -4                      ; R2_w=fp-4
; unsigned long *value = bpf_map_lookup_elem(&stop_map, &key); @ globals.bpf.c:26
4: (18) r1 = 0xffff9a8a4296b400       ; R1_w=map_ptr(map=stop_map,ks=4,vs=8)
6: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,map=stop_map,ks=4,vs=8)
; return 0; @ globals.bpf.c:28
7: (b7) r0 = 0                        ; R0_w=0
8: (95) exit
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Looking at the bytecode of our probe before/after, we should probably see the same.

Contributor

Cool, thanks for clarifying! I guess you are right then; the emitted code is more optimized with the globals.

@LucaGuerra (Contributor Author) commented May 12, 2025

Hi @Andreagit97, makes sense, but fortunately the kernel helps us by optimizing it away.

Indeed, the two compiled programs look different:

Using a map (i.e. this patch):

0000000000000090 <LBB0_26>:
      18:       63 7a f8 ff 00 00 00 00 *(u32 *)(r10 - 0x8) = r7
      19:       bf a2 00 00 00 00 00 00 r2 = r10
      20:       07 02 00 00 f8 ff ff ff r2 += -0x8
      21:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      23:       85 00 00 00 01 00 00 00 call 0x1

Using .bss:

0000000000000090 <LBB0_26>:
      18:       bf 81 00 00 00 00 00 00 r1 = r8
      19:       57 01 00 00 ff 01 00 00 r1 &= 0x1ff
      20:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
      22:       0f 12 00 00 00 00 00 00 r2 += r1
      23:       71 22 00 00 00 00 00 00 r2 = *(u8 *)(r2 + 0x0)

So in the compiled object file there is obviously a call. However, at load time the kernel is able to optimize this since it's an (almost) fixed-size array map; the result uses more instructions than the .bss version, but I guess it's not as bad as an actual call. Let's take a look:

# bpftool prog dump xlated name sys_enter opcodes
...
  17: (18) r1 = map[id:8834]
       18 11 00 00 82 22 00 00 00 00 00 00 00 00 00 00
  19: (07) r1 += 272
       07 01 00 00 10 01 00 00
  20: (61) r0 = *(u32 *)(r2 +0)
       61 20 00 00 00 00 00 00
  21: (35) if r0 >= 0x200 goto pc+3
       35 00 03 00 00 02 00 00
  22: (67) r0 <<= 3
       67 00 00 00 03 00 00 00
  23: (0f) r0 += r1
       0f 10 00 00 00 00 00 00
  24: (05) goto pc+1
       05 00 01 00 00 00 00 00
  25: (b7) r0 = 0
       b7 00 00 00 00 00 00 00
; if(ret == NULL) {
  26: (55) if r0 != 0x0 goto pc+30
       55 00 1e 00 00 00 00 00
...
; return *ret;
  57: (71) r1 = *(u8 *)(r0 +0)
       71 01 00 00 00 00 00 00

Originally we had 5 instructions; in the map version we have 10 instructions at the eBPF level. Not great, but acceptable I think. Also, I'm not sure what happens at native-code JIT compile time, or which kernel versions are able to perform this optimization.

Member

Uhm, I see, I never noticed that! Yes, in this case the overhead seems negligible.

int pman_mark_single_64bit_syscall(int syscall_id, bool interesting) {
char error_message[MAX_ERROR_MESSAGE_LEN];
int fd = bpf_map__fd(g_state.skel->maps.interesting_syscalls_table_64bit);
if(bpf_map_update_elem(fd, &syscall_id, &interesting, BPF_ANY) < 0) {
Member

If this is called frequently it could have an impact as well but AFAIK we don't call it so frequently at runtime

Contributor

Exactly, this is not a problem (as far as I know, nobody actually calls this at runtime, only at engine init time; also, given we have ~500 ppm_sc codes, I don't expect people to call it more than ~100 times per batch).

Contributor Author

There's no noticeable difference in startup times, and even if the client dynamically set the interesting syscall set, it wouldn't do it that frequently. I toyed with the idea of updating more elements in a single call, but I don't think it's worth it now.
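For reference, the "more elements in a single call" idea could look roughly like this with libbpf's batched-update API (available on kernels >= 5.6; the helper name and map layout here are hypothetical, not part of this PR):

```c
#include <bpf/bpf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: set the "interesting" flag for n syscall ids
 * with one bpf(BPF_MAP_UPDATE_BATCH) syscall instead of n separate
 * bpf_map_update_elem calls. */
static int mark_syscalls_batch(int map_fd, const uint32_t *ids,
                               const bool *flags, uint32_t n) {
	LIBBPF_OPTS(bpf_map_batch_opts, opts);
	uint32_t count = n;
	int err = bpf_map_update_batch(map_fd, ids, flags, &count, &opts);
	/* on return, count holds how many elements were actually processed */
	return (err == 0 && count == n) ? 0 : -1;
}
```

Since the updates only happen at engine init time (as noted above), the per-call overhead is unlikely to matter either way; batching would mostly reduce syscall count, not change the hot path.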

Signed-off-by: Luca Guerra <luca@guerra.sh>
Co-authored-by: Andrea Terzolo <andreaterzolo3@gmail.com>
@LucaGuerra (Contributor Author)

Thanks for the review @Andreagit97, addressed your comments. On the performance side we should keep an eye on it, but it seems like we're only adding a few instructions and no calls, if the loader is smart enough.

@Andreagit97 (Member) left a comment

/approve

@poiana commented May 13, 2025

LGTM label has been added.

Git tree hash: 839a6b6a14983b18d261a90ddd6e60c1b31ef040

@FedeDP (Contributor) left a comment

/approve

@github-project-automation github-project-automation bot moved this from Todo to In progress in Falco Roadmap May 13, 2025
@poiana commented May 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,FedeDP,LucaGuerra]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@FedeDP (Contributor) commented May 13, 2025

/unhold

@poiana poiana merged commit 9c2734a into falcosecurity:master May 13, 2025
57 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in Falco Roadmap May 13, 2025
@FedeDP FedeDP mentioned this pull request May 13, 2025