Skip to content

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Oct 4, 2023

Add new Prometheus metrics to report the number of keys, XFRM IN/OUT states, and XFRM IN/OUT/FWD policies. See commits for details.

@pchaigno pchaigno added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. needs-backport/1.12 labels Oct 4, 2023
@pchaigno pchaigno force-pushed the new-ipsec-metrics branch 2 times, most recently from c99f925 to 0ef455b Compare October 4, 2023 11:20
@pchaigno pchaigno marked this pull request as ready for review October 4, 2023 11:22
@pchaigno pchaigno requested review from a team as code owners October 4, 2023 11:22
Copy link
Contributor

@zacharysarah zacharysarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchaigno Good work.

@pchaigno pchaigno force-pushed the new-ipsec-metrics branch 2 times, most recently from 88a8404 to 245e573 Compare October 6, 2023 11:15
Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@pchaigno pchaigno force-pushed the new-ipsec-metrics branch 3 times, most recently from d68d2a1 to 69902f1 Compare October 9, 2023 08:54
@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 9, 2023
@squeed squeed added dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Oct 16, 2023
@pchaigno
Copy link
Member Author

Reasoning: usually you use labels for different dimensions of a given metric, and you can aggregate over the dimensions in a meaningful way.

Makes sense. I've split the metrics for XFRM states and policies.

Moving common IPsec helpers to their own package will allow us to call
them from both the agent and the Cilium CLIs.

Also move from checkmate to testing.T.

Note the prototype of countUniqueIPsecKeys changed to now take the XFRM
states in argument. That also allows us to make the corresponding test
unprivileged.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Remove the unit tests for xfrm_collector. They are simply not worth it.
All xfrm_collector does is collect statistics via helper functions and
feed them into Prometheus functions. The helper functions are covered by
unit tests. Prometheus functions are covered by unit tests. We're really
just passing the information from one to the other.

But to cover this trivial logic in unit tests, we end up copying lots of
information across the code and tests. We also need to pass the helper
functions as a function pointer to xfrm_collector. Every new metric is
going to need a new function pointer.

I believe this is a waste of engineering time and very unlikely to catch
mistakes. It doesn't even test the helper functions since we mock them
out. So it's really unclear what we're trying to cover here. Removing it
makes it easier to add new metrics in followup commits.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This new metric matches the existing information given by cilium encrypt
status, showing the number of IPsec keys currently in use on the node.
The number of keys should only change during key rotation and it should
go back to 1 after a configurable delay. This metric can help alert if
something went wrong during a key rotation.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Provide two new helper functions to count the number of XFRM policies
and states on the node, broken down by directions. XFRM policies can
have directions in, out, and fwd, whereas XFRM states can only be in or
out. For XFRM states, the direction is not a built-in parameter, but
something Cilium encodes in the packet mark.

These helper functions will be used in the subsequent commit to expose
new metrics.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit adds new Prometheus metrics for the number of XFRM states
and policies for each direction. These can be used to alert on XFRM
leaks. We had one such leak in the past for XFRM out policies which led
to a performance degradation when the node churn was high.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 19, 2023
@joestringer joestringer merged commit 6db3b8f into cilium:main Oct 19, 2023
@pchaigno pchaigno deleted the new-ipsec-metrics branch October 19, 2023 21:29
@tklauser tklauser added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 labels Oct 24, 2023
@tklauser tklauser mentioned this pull request Oct 24, 2023
6 tasks
@tklauser tklauser added backport-pending/1.12 backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.12 backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Oct 24, 2023
@julianwiedmann julianwiedmann added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
No open projects
Status: Released
Status: Released
Development

Successfully merging this pull request may close these issues.