-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Port bpftool
probing and usage metrics collection to Go
#39557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/test |
marseel
approved these changes
May 15, 2025
ti-mo
added a commit
to ti-mo/cilium
that referenced
this pull request
May 15, 2025
Since 7b92700 and e9438c2, egress policy programs are inserted unconditionally, even when the L7 proxy is disabled. Unfortunately, the teardown logic only runs when the proxy is enabled, causing all handle_policy_egress programs to leak. Double unfortunately, since this program doesn't reference any Cilium maps in its attached-but-disabled state (it's an empty program), it doesn't show up in Cilium's maps metrics, making the issue hard to spot. The latter is being addressed in cilium#39557. This commit makes the removal unconditional, since the attachment is also unconditional. I've decided against making both attachment and detachment conditional, since that would still cause orphaned programs if the l7 proxy was disabled on a running cluster. Endpoint teardown should always attempt to remove all known policy programs for robustness. Signed-off-by: Timo Beckers <timo@isovalent.com>
9965054
to
bb2b1a2
Compare
/test |
rgo3
approved these changes
May 19, 2025
A follow-up commit needs a new API for accessing program memlock info. Signed-off-by: Timo Beckers <timo@isovalent.com>
This commit converts the last bpftool shellout of the Cilium agent to a Go-based equivalent. See doc comments for more details. The new implementation makes collecting bpf metrics an order of magnitude faster, especially on smaller deployments where the startup cost of two bpftool invocations doesn't amortize well. `bpftool {map,prog} show` does a lot of work behind the scenes: it iterates all objects, reads and parses associated btf, gets object info, fetches some associated data (pid comm) and then pushes everything to stdout. After all this, the agent needs to read and decode the output. The new implementation takes a different approach: it iterates all programs and considers only the ones with a name starting with cil_ or tail_ (all prog names will be made consistent in a follow-up commit) and recursively opens associated maps, taking care not to visit the same object more than once during a single collection round. An initial attempt at discovering tail calls by iterating associated prog arrays proved much too slow and unscalable, so this idea was shelved. Signed-off-by: Timo Beckers <timo@isovalent.com>
Due to a prior commit, all Cilium programs now need to either have the cil_ or tail_ prefixes for entry points and tail calls respectively. This enables straightforward metrics collection. Signed-off-by: Timo Beckers <timo@isovalent.com>
bb2b1a2
to
0482b98
Compare
/test |
derailed
approved these changes
May 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ti-mo Nice work!
gentoo-root
approved these changes
May 20, 2025
github-merge-queue bot
pushed a commit
that referenced
this pull request
May 20, 2025
Since 7b92700 and e9438c2, egress policy programs are inserted unconditionally, even when the L7 proxy is disabled. Unfortunately, the teardown logic only runs when the proxy is enabled, causing all handle_policy_egress programs to leak. Double unfortunately, since this program doesn't reference any Cilium maps in its attached-but-disabled state (it's an empty program), it doesn't show up in Cilium's maps metrics, making the issue hard to spot. The latter is being addressed in #39557. This commit makes the removal unconditional, since the attachment is also unconditional. I've decided against making both attachment and detachment conditional, since that would still cause orphaned programs if the l7 proxy was disabled on a running cluster. Endpoint teardown should always attempt to remove all known policy programs for robustness. Signed-off-by: Timo Beckers <timo@isovalent.com>
joamaki
pushed a commit
that referenced
this pull request
May 23, 2025
[ upstream commit 2d8769d ] [ backporter's note: fixed minor conflict due to missing slog changes ] Since 7b92700 and e9438c2, egress policy programs are inserted unconditionally, even when the L7 proxy is disabled. Unfortunately, the teardown logic only runs when the proxy is enabled, causing all handle_policy_egress programs to leak. Double unfortunately, since this program doesn't reference any Cilium maps in its attached-but-disabled state (it's an empty program), it doesn't show up in Cilium's maps metrics, making the issue hard to spot. The latter is being addressed in #39557. This commit makes the removal unconditional, since the attachment is also unconditional. I've decided against making both attachment and detachment conditional, since that would still cause orphaned programs if the l7 proxy was disabled on a running cluster. Endpoint teardown should always attempt to remove all known policy programs for robustness. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
May 27, 2025
[ upstream commit 2d8769d ] [ backporter's note: fixed minor conflict due to missing slog changes ] Since 7b92700 and e9438c2, egress policy programs are inserted unconditionally, even when the L7 proxy is disabled. Unfortunately, the teardown logic only runs when the proxy is enabled, causing all handle_policy_egress programs to leak. Double unfortunately, since this program doesn't reference any Cilium maps in its attached-but-disabled state (it's an empty program), it doesn't show up in Cilium's maps metrics, making the issue hard to spot. The latter is being addressed in #39557. This commit makes the removal unconditional, since the attachment is also unconditional. I've decided against making both attachment and detachment conditional, since that would still cause orphaned programs if the l7 proxy was disabled on a running cluster. Endpoint teardown should always attempt to remove all known policy programs for robustness. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/minor
This PR changes functionality that users may find relevant to operating Cilium.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes: #38979