-
Notifications
You must be signed in to change notification settings - Fork 3.4k
pkg/metrics/bpf: new bpf_maps & bpf_progs metrics #29984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Commits 72016fa, f5f1ab4 do not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
f5f1ab4
to
eb3acee
Compare
Commit 72016fa does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
eb3acee
to
194650c
Compare
194650c
to
ad8ba31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvisonneau LGTM Thank you for this update!
IIUC, it is indeed invoked on each /metrics query
…On Wed, 7 Feb 2024 at 15:06, Timo Beckers ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/metrics/bpf.go
<#29984 (comment)>:
> }
func (s *bpfCollector) Collect(ch chan<- prometheus.Metric) {
- mapMem, err := getMemoryUsage("map")
+ // Avoid querying BPF multiple times concurrently
+ s.mutex.Lock()
Is Collect() invoked on each hit to /metrics? Or does the agent cache
these metrics somewhere and periodically refresh them?
—
Reply to this email directly, view it on GitHub
<#29984 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANOCLZII6O6S6VV7NWHPWDYSOC5NAVCNFSM6AAAAABA3BQWEOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNRXHA4TQNJWGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@mvisonneau Some tests and linters are failing, please take a look. |
ad8ba31
to
f634623
Compare
👋 hey @ti-mo, indeed my bad 🙈 , I've rebased and hopefully also have sorted all failing tests! |
/test |
There seems to be two consistently failing tests here: If you feel your PR is not the cause of these please attempt a rebase and we will run tests again. |
The smoke test failures report an issue with the new metrics in the PR:
(Maybe |
I believe to have sorted most suggestions/issues. Let me know if there is anything else I can do! 👍 😄 |
@mvisonneau it seems that the CI is failing:
|
argh 🙈 I ran |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever, just keep in mind this will miss programs that don't refer to maps, if we ever add those. For now, this will do. 👍
Head branch was pushed to by a user without write access
thanks, I've just rebased, hope that all of tests will go through this time! 🤞 |
/test |
A variety of tests have failed, I suggest triaging them to identify any potential links to the PR. You are of course welcome to rebase and retrigger CI again, but if there is an underlying issue in the PR itself then that course of action will have no impact. Until the PR is passing CI, I'll mark this as draft. Feel free to reach out on Slack if you would like to discuss #testing issues or you need someone to retrigger the full CI. When CI is passing you can mark it "ready for review". |
This pull request has been automatically marked as stale because it |
This pull request has not seen any activity since it was marked stale. |
This change introduces two new metrics to monitor the amount of bpf maps and programs currently defined on the system. We recently discovered that abnormally large amounts of maps could be found on some of our nodes. Additionally, this change introduces request concurrency limit as well as dynamic definition of the metrics list throughout the Describe() function. Signed-off-by: Maxime Visonneau <maxime.visonneau@gmail.com>
48b0a7e
to
f204abc
Compare
Apologies for not having thoroughly followed-up on this! Seems like CI tests are passing now although we likely need a |
/test |
This change introduces two new metrics to monitor the amount of bpf maps and programs currently defined on the system.
We recently discovered that abnormally large amounts of maps could be found on some of our nodes.
Additionally, this change introduces request concurrency limit as well as dynamic definition of the metrics list throughout the Describe() function.