-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ipsec: New Prometheus metrics for XFRM configs #28400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c99f925
to
0ef455b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchaigno Good work.
88a8404
to
245e573
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
d68d2a1
to
69902f1
Compare
809b3e6
to
d08f0ad
Compare
Makes sense. I've split the metrics for XFRM states and policies. |
Moving common IPsec helpers to their own package will allow us to call them from both the agent and the Cilium CLIs. Also move from checkmate to testing.T. Note the prototype of countUniqueIPsecKeys changed to now take the XFRM states in argument. That also allows us to make the corresponding test unprivileged. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Remove the unit tests for xfrm_collector. They are simply not worth it. All xfrm_collector does is collect statistics via helper functions and feed them into Prometheus functions. The helper functions are covered by unit tests. Prometheus functions are covered by unit tests. We're really just passing the information from one to the other. But to cover this trivial logic in unit tests, we end up copying lots of information across the code and tests. We also need to pass the helper functions as a function pointer to xfrm_collector. Every new metric is going to need a new function pointer. I believe this is a waste of engineering time and very unlikely to catch mistakes. It doesn't even test the helper functions since we mock them out. So it's really unclear what we're trying to cover here. Removing it makes it easier to add new metrics in followup commits. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This new metric matches the existing information given by cilium encrypt status, showing the number of IPsec keys currently in use on the node. The number of keys should only change during key rotation and it should go back to 1 after a configurable delay. This metric can help alert if something went wrong during a key rotation. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Provide two new helper functions to count the number of XFRM policies and states on the node, broken down by directions. XFRM policies can have directions in, out, and fwd, whereas XFRM states can only be in or out. For XFRM states, the direction is not a built-in parameter, but something Cilium encodes in the packet mark. These helper functions will be used in the subsequent commit to expose new metrics. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit adds new Prometheus metrics for the number of XFRM states and policies for each direction. These can be used to alert on XFRM leaks. We had one such leak in the past for XFRM out policies which led to a performance degradation when the node churn was high. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
d08f0ad
to
b7cf567
Compare
/test |
Add new Prometheus metrics to report the number of keys, XFRM IN/OUT states, and XFRM IN/OUT/FWD policies. See commits for details.