Skip to content

loadbalancer: Add LBMap pressure metrics #37881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 23, 2025

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Feb 26, 2025

This adds the BPF pressure gauges for all the BPF maps used in pkg/loadbalance/maps. The pressure metric is updated every 5 minutes through NextKeyBytes count. This is meant to be a temporary measure until we have a BPF iterator way of grabbing BPF map counts (#37927 explored this, follow up later). Since we already switched to the new load-balancing control-plane I want to make sure we have this implemented for v1.18 even if in a sub-optimal way.

The new load-balancing maps do not use pkg/bpf nor do they use the WithCache mechanism (no point since the map operation retries are handled by the reconciler (pkg/loadbalancer/reconciler). This means we can't use the (*bpf.Map).WithPressureMetric and instead need to do it this way.

The first two commits improve the BPFMapPressureGauge infra by removing the functions and moving this to be a method under metrics.Registry. This way we avoid the annoying metrics.registry promise that a lot of tests hit a timeout with, but of course since lot of the maps are still globals I had to snake through the *metrics.Registry to these. The WithPressureGauge will be a no-op with a nil registry so we can use that in tests or cilium-dbg.

@joamaki joamaki requested review from a team as code owners February 26, 2025 15:44
@joamaki joamaki requested review from derailed and aspsk February 26, 2025 15:44
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 26, 2025
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Feb 26, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 26, 2025
@joamaki
Copy link
Contributor Author

joamaki commented Feb 26, 2025

It's definitely not great to do full iteration over the BPF map, but at the same time I don't quite like the idea of using the non-BPF data sources for producing the metric, for one because doing so would be rather complicated and secondly it would be error prone. Do we feel like doing full key iteration every 5 minutes on few potentially large maps (1-100k keys) be acceptable or should other options be considered instead? Or should we do this say only once every 30 minutes?

Another alternative is to keep a count to the side in lbmaps.go that's initialized when the map is opened by iterating once and then updated by each successful operation.

Or we can update the metric from Prune() that also iterates over the maps.

I ended up with this approach as it's the most generic, simplest and least error prone of the options and my only worry is the potential cost of it.

@joamaki
Copy link
Contributor Author

joamaki commented Feb 26, 2025

Another potentially more efficient way to do this would be to do this on BPF side using a BPF program running on a timer and iterating over all maps of our BPF programs and writing counts to a hash map, from which user-space would push them to metrics. Not quite sure how complicated that would be.

@joestringer
Copy link
Member

This seems like a good use case for BPF iterators (Linux v5.8, https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_TRACING/#iterator), where the iterator itself is a BPF program that is run against each element that could then accumulate and send the final value across the userspace barrier just once.

@joestringer
Copy link
Member

If we're concerned about the runtime cost of this implementation, I think it would be OK to have a hidden flag to disable the counting so that impacted users could dig through and disable this if they want to make that trade off. Otherwise as long as we come up with a reasonable default implementation I'd be OK with this as-is. I do think BPF iterators would provide a better solution but I don't feel that this would be a requirement to make LBMap pressure metrics available (don't want to increase the scope of this PR too much).

@aspsk
Copy link
Contributor

aspsk commented Feb 27, 2025

where the iterator itself is a BPF program that is run against each element that could then accumulate and send the final value across the userspace barrier just once

And we also have bpf_map_sum_elem_count kfunc, so it's one kfunc call (which sums per-cpu value) per map. Right now the way to use it is to iterate through all maps. If a subset of maps is needed one can load an iterator + a hashmap of maps of interest to filter all. (If only a small number of maps out of the whole system is required, we might consider optimizing this by either following up on adding per-map sum_elem_cout kfunc (see this thread), or adding a pre-filtered map iterator type.)

@joamaki
Copy link
Contributor Author

joamaki commented Feb 28, 2025

where the iterator itself is a BPF program that is run against each element that could then accumulate and send the final value across the userspace barrier just once

And we also have bpf_map_sum_elem_count kfunc, so it's one kfunc call (which sums per-cpu value) per map. Right now the way to use it is to iterate through all maps. If a subset of maps is needed one can load an iterator + a hashmap of maps of interest to filter all. (If only a small number of maps out of the whole system is required, we might consider optimizing this by either following up on adding per-map sum_elem_cout kfunc (see this thread), or adding a pre-filtered map iterator type.)

Yeah this would be great! We'd get map pressure metrics for all of the BPF maps this way and not just for a somewhat random subset that we do now. Wonder what'd be a good fallback when we don't have this supported. I guess we could still do the key iteration from user-space in those cases but just do it less often.

@joamaki
Copy link
Contributor Author

joamaki commented Feb 28, 2025

Using the map_sum_elem_count seems promising: #37927

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 5b19bfc to 401f384 Compare March 25, 2025 11:53
@joamaki
Copy link
Contributor Author

joamaki commented Mar 25, 2025

I would like to suggest that we would go with this initially and then following up with the map_sum_elem_count approach later (that impacts other sub-systems and requires consensus, fallback implementations etc.).

This way I can soon switch the default LB control-plane to the new one without missing the pressure metrics.

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 401f384 to 84f82aa Compare March 25, 2025 15:25
@joamaki joamaki requested review from a team as code owners March 25, 2025 15:25
@joamaki joamaki requested a review from mhofstetter March 25, 2025 15:25
Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envoy ✔️

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 84f82aa to d848f71 Compare March 25, 2025 16:04
@joamaki
Copy link
Contributor Author

joamaki commented Mar 25, 2025

/test

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor notes but otherwise LGTM.

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from d848f71 to 83b2b1c Compare March 26, 2025 13:43
@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 83b2b1c to 54755b8 Compare March 26, 2025 15:01
@joamaki
Copy link
Contributor Author

joamaki commented Mar 26, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 401c695 to b002b7e Compare May 9, 2025 13:55
@joamaki
Copy link
Contributor Author

joamaki commented May 9, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch 2 times, most recently from cdc832e to 30db196 Compare May 14, 2025 15:52
@joamaki
Copy link
Contributor Author

joamaki commented May 14, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch 2 times, most recently from 71e6da5 to 58fbb97 Compare May 19, 2025 08:25
@joamaki joamaki requested a review from a team as a code owner May 19, 2025 08:25
@joamaki joamaki requested a review from doniacld May 19, 2025 08:25
@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 58fbb97 to f2b6065 Compare May 19, 2025 08:41
@joamaki
Copy link
Contributor Author

joamaki commented May 19, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from f2b6065 to 056a91a Compare May 19, 2025 10:43
@joamaki
Copy link
Contributor Author

joamaki commented May 19, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 056a91a to 01eba87 Compare May 19, 2025 16:35
@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 01eba87 to 9a951f4 Compare May 20, 2025 08:14
@joamaki
Copy link
Contributor Author

joamaki commented May 20, 2025

/test

1 similar comment
@joamaki
Copy link
Contributor Author

joamaki commented May 20, 2025

/test

joamaki added 4 commits May 22, 2025 13:31
This allows registering pressure metrics without going through the registry
global variable.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This allows removing [metrics.NewBPFMapPressureGauge] and reduces
the uses we have of the [metrics.registry] promise. This will also
speed up tests as we won't have tests timing out trying to grab the
registry via the promise on map pressure updates.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
BatchCount() uses BatchLookup to count up the number of elements in the BPF map.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Periodically (every 5 minutes) update the BPF map pressure metrics for all opened LB maps.
This is on by default but can be disabled with '--lb-pressure-metrics-interval=0' (in
helm can be set via 'extraConfig').

This is somewhat different to the "WithCache()" way of doing pressure metrics
as we don't have an easy way of counting how many entries should be in the BPF
map (doable, but then we'd need to match the logic in bpf_reconciler.go exactly).

As a more foolproof way we instead get the number of keys in each BPF map
and report the ratio of that to MaxEntries. This of course won't go beyond
1.0.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/exp-lb-pressure-metrics branch from 9a951f4 to 1b75396 Compare May 22, 2025 12:23
@joamaki
Copy link
Contributor Author

joamaki commented May 22, 2025

/test

@joamaki joamaki added this pull request to the merge queue May 23, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 23, 2025
Merged via the queue into cilium:main with commit 3bb3336 May 23, 2025
66 of 67 checks passed
@joamaki joamaki deleted the pr/joamaki/exp-lb-pressure-metrics branch May 23, 2025 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.