Skip to content

Conversation

rectified95
Copy link
Contributor

@rectified95 rectified95 commented Oct 2, 2024

Following the example of the Hubble Dynamic Exporter, this PR implements the CFP to allow Hubble metrics to be dynamically configured with a ConfigMap, rather than a static string at startup: https://github.com/cilium/design-cfps/blob/main/hubble/CFP-30788-managing-hubble-metrics-cardinality.md

This PR has been scoped to allow turning metric handlers on/off and applying filters - changing the label set of a registered metric is moved to a follow-up change pending more design, because of constraints in Prometheus.

Changes:

  • add dynamic flow processor: registers/unregisters metric handlers based on received config
  • add metric cfg watcher: parse yaml config into MetricConfig struct added in metrics: add structured format for Hubble metrics and options. #34849
  • add FlowFilters to metrics handlers so they're only selectively applied
  • update launchHubble to conditionally use static/dynamic metrics config
  • add new ConfigMap for metric setup
  • extend the Handler interface with Deinit and HandleConfigurationUpdate functions
  • remove FlowProcessor interface and move ProcessFlow to Handler: all metric handlers are FPs and the only FP that's not a handler is DropEventEmitter, which is registered separately; struct Handlers is thus no longer necessary
  • convert handler bulk creation functions to operate on a single handler and be iterated on

Testing:

  • yaml config read&validate: TestReadMetricConfigFromCM
  • metric config reload, check handler set for added/removed metrics: TestHandlersUpdatedInDfpOnConfigChange
  • pull metrics from Hubble server: TestHubbleServerWithDynamicMetrics
  • handler re-registration + gather metrics: TestMetricReRegisterAndCollect
  • manually tested updating CM: hit breakpoints in new code paths under debugger
  • manually collected Hubble metrics specified in CM

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 2, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Oct 2, 2024
@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch 2 times, most recently from 9e79903 to 8073372 Compare October 11, 2024 21:06
@rectified95 rectified95 marked this pull request as ready for review October 11, 2024 21:10
@rectified95 rectified95 requested review from a team as code owners October 11, 2024 21:10
@rectified95
Copy link
Contributor Author

rectified95 commented Oct 11, 2024

Just noticing that 2 days ago a refactoring of the options flags was merged causing a conflictng with this PR - marking as ready for review while I fix that up since it shouldn't affect the core of the feature and I would like feedback on this change.

FYI @chancez

@chancez chancez self-requested a review October 11, 2024 21:33
Copy link
Contributor

@chancez chancez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be pretty careful about registration. This definitely will need some more robust tests that go through registering/deregistering/registering the same handler to verify this doesn't panic.

You will also want to test registering/registering/registering the same handler but with different configuration too, since we want to be able to change which labels it has, but I suspect there's going to be cases where that's not valid. For example it's my understanding that it's not valid to have a metric with 3 labels, then reconfigure it to only have 2 labels, at least with the default Prometheus registry.

Prometheus might be a bit more opinionated about this, and while it might not be valid with the default registry, we may still want to find a way to support it somehow, since it's currently possible to change labels by restarting with a different configuration. This could definitely complicate the implementation though, so I would write tests first, and see what happens, then we can figure out how to proceed. I think it's okay if the initial implementation has some limitations on reconfiguring labels, and we can come back with a custom registry implementation in a follow up PR if we need to.

We also will need more robust validation and we need to be careful about accidentally loading configurations that are invalid. I believe the correct behavior when a new config has any invalid constructs, is to not change the existing metrics configuration at all until the entire new configuration is correct.

@rectified95 rectified95 marked this pull request as draft October 14, 2024 21:58
@maintainer-s-little-helper
Copy link

Commit d3080de does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 22, 2024
@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch from d3080de to f200967 Compare October 22, 2024 19:06
@maintainer-s-little-helper
Copy link

Commit d3080de does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch from f200967 to d1b87a0 Compare October 22, 2024 19:15
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 22, 2024
@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch from d1b87a0 to 1037713 Compare October 23, 2024 18:54
@rectified95 rectified95 reopened this Oct 23, 2024
@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch from c344ad3 to 1efd209 Compare November 26, 2024 20:45
@rectified95
Copy link
Contributor Author

rectified95 commented Nov 26, 2024

@rolinh I've rebased on main to resolve a failing lint check coming from yesterday's common labels change: #35628
Force pushed three times:

Copy link
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@rolinh
Copy link
Member

rolinh commented Nov 26, 2024

/test

1 similar comment
@rectified95
Copy link
Contributor Author

/test

Sets up new ConfigMap "cilium-dynamic-metrics-config" and new Helm options under ".Values.hubble.metrics.dynamic".

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Adds new Hubble flag "hubble-dynamic-metrics-config-path".

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Adds new logic in Hubble for dynamic metric registration.
Adds new methods to the Handler interface: HandleConfigurationUpdate, Deinit.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Pass filters to Hubble metric handlers' Init method.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Fork metric setup logic into static and dynamic paths during Hubble launch.
Split handler bulk creation methods into single iterators.
Move ProcessFlow method into the Handler interface.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Stop Hubble dynamic metric config change watcher when context is closed.
Return early in Hubble start when both dynamic and static metrics specified.
Fix pointer ownership convention.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
@rectified95 rectified95 force-pushed the rectified/dynamic_metrics_pr branch from 1efd209 to e4978b4 Compare November 27, 2024 21:50
@rectified95
Copy link
Contributor Author

/ci-clustermesh

@rectified95
Copy link
Contributor Author

/ci-e2e-upgrade

@aanm aanm enabled auto-merge November 28, 2024 12:58
@aanm aanm self-assigned this Nov 28, 2024
@aanm aanm added this pull request to the merge queue Nov 28, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 28, 2024
Merged via the queue into cilium:main with commit 8ff5614 Nov 28, 2024
64 checks passed
marseel added a commit to marseel/cilium that referenced this pull request Feb 28, 2025
Fixes: cilium#35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
marseel added a commit to marseel/cilium that referenced this pull request Mar 3, 2025
Fixes: cilium#35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2025
Fixes: #35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
mereta pushed a commit to mereta/cilium that referenced this pull request Mar 6, 2025
Fixes: cilium#35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
romanspb80 pushed a commit to romanspb80/cilium that referenced this pull request Mar 7, 2025
Fixes: cilium#35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
rastislavs pushed a commit that referenced this pull request Mar 10, 2025
[ upstream commit ccd9709 ]

Fixes: #35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
rastislavs pushed a commit that referenced this pull request Mar 11, 2025
[ upstream commit ccd9709 ]

Fixes: #35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2025
[ upstream commit ccd9709 ]

Fixes: #35185

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants