-
Notifications
You must be signed in to change notification settings - Fork 3.4k
loadbalancer: Add LBMap pressure metrics #37881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loadbalancer: Add LBMap pressure metrics #37881
Conversation
It's definitely not great to do full iteration over the BPF map, but at the same time I don't quite like the idea of using the non-BPF data sources for producing the metric, for one because doing so would be rather complicated and secondly it would be error prone. Do we feel like doing full key iteration every 5 minutes on few potentially large maps (1-100k keys) be acceptable or should other options be considered instead? Or should we do this say only once every 30 minutes? Another alternative is to keep a count to the side in Or we can update the metric from I ended up with this approach as it's the most generic, simplest and least error prone of the options and my only worry is the potential cost of it. |
Another potentially more efficient way to do this would be to do this on BPF side using a BPF program running on a timer and iterating over all maps of our BPF programs and writing counts to a hash map, from which user-space would push them to metrics. Not quite sure how complicated that would be. |
This seems like a good use case for BPF iterators (Linux v5.8, https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_TRACING/#iterator), where the iterator itself is a BPF program that is run against each element that could then accumulate and send the final value across the userspace barrier just once. |
If we're concerned about the runtime cost of this implementation, I think it would be OK to have a hidden flag to disable the counting so that impacted users could dig through and disable this if they want to make that trade off. Otherwise as long as we come up with a reasonable default implementation I'd be OK with this as-is. I do think BPF iterators would provide a better solution but I don't feel that this would be a requirement to make LBMap pressure metrics available (don't want to increase the scope of this PR too much). |
And we also have |
Yeah this would be great! We'd get map pressure metrics for all of the BPF maps this way and not just for a somewhat random subset that we do now. Wonder what'd be a good fallback when we don't have this supported. I guess we could still do the key iteration from user-space in those cases but just do it less often. |
Using the |
5b19bfc
to
401f384
Compare
I would like to suggest that we would go with this initially and then following up with the This way I can soon switch the default LB control-plane to the new one without missing the pressure metrics. |
401f384
to
84f82aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
envoy
✔️
84f82aa
to
d848f71
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor notes but otherwise LGTM.
d848f71
to
83b2b1c
Compare
83b2b1c
to
54755b8
Compare
/test |
401c695
to
b002b7e
Compare
/test |
cdc832e
to
30db196
Compare
/test |
71e6da5
to
58fbb97
Compare
58fbb97
to
f2b6065
Compare
/test |
f2b6065
to
056a91a
Compare
/test |
056a91a
to
01eba87
Compare
01eba87
to
9a951f4
Compare
/test |
1 similar comment
/test |
This allows registering pressure metrics without going through the registry global variable. Signed-off-by: Jussi Maki <jussi@isovalent.com>
This allows removing [metrics.NewBPFMapPressureGauge] and reduces the uses we have of the [metrics.registry] promise. This will also speed up tests as we won't have tests timing out trying to grab the registry via the promise on map pressure updates. Signed-off-by: Jussi Maki <jussi@isovalent.com>
BatchCount() uses BatchLookup to count up the number of elements in the BPF map. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Periodically (every 5 minutes) update the BPF map pressure metrics for all opened LB maps. This is on by default but can be disabled with '--lb-pressure-metrics-interval=0' (in helm can be set via 'extraConfig'). This is somewhat different to the "WithCache()" way of doing pressure metrics as we don't have an easy way of counting how many entries should be in the BPF map (doable, but then we'd need to match the logic in bpf_reconciler.go exactly). As a more foolproof way we instead get the number of keys in each BPF map and report the ratio of that to MaxEntries. This of course won't go beyond 1.0. Signed-off-by: Jussi Maki <jussi@isovalent.com>
9a951f4
to
1b75396
Compare
/test |
This adds the BPF pressure gauges for all the BPF maps used in pkg/loadbalance/maps. The pressure metric is updated every 5 minutes through
NextKeyBytes
count. This is meant to be a temporary measure until we have a BPF iterator way of grabbing BPF map counts (#37927 explored this, follow up later). Since we already switched to the new load-balancing control-plane I want to make sure we have this implemented for v1.18 even if in a sub-optimal way.The new load-balancing maps do not use
pkg/bpf
nor do they use theWithCache
mechanism (no point since the map operation retries are handled by the reconciler (pkg/loadbalancer/reconciler
). This means we can't use the(*bpf.Map).WithPressureMetric
and instead need to do it this way.The first two commits improve the BPFMapPressureGauge infra by removing the functions and moving this to be a method under
metrics.Registry
. This way we avoid the annoyingmetrics.registry
promise that a lot of tests hit a timeout with, but of course since lot of the maps are still globals I had to snake through the*metrics.Registry
to these. TheWithPressureGauge
will be a no-op with anil
registry so we can use that in tests or cilium-dbg.