Skip to content

Conversation

derailed
Copy link
Contributor

@derailed derailed commented Nov 17, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

The operator depends on various goroutines being scheduled one time to perform critical tasks. Significatn scheduling lags could indicate potential issues with the operator functions.

  • Added GO scheduler latency metrics tracking to the cilium operator.
Expose Cilium operator go runtime scheduler latency prometheus metric `go_sched_latencies_seconds`

Signed-off-by: Fernand Galiana fernand.galiana@gmail.com

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 17, 2023
@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 98845d7 to 99b774a Compare November 17, 2023 16:10
@derailed
Copy link
Contributor Author

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 99b774a to 83fd95f Compare November 27, 2023 18:25
@derailed
Copy link
Contributor Author

/test

@derailed derailed marked this pull request as ready for review November 27, 2023 22:40
@derailed derailed requested a review from a team as a code owner November 27, 2023 22:40
@derailed derailed requested a review from pippolo84 November 27, 2023 22:40
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! 💯

@pippolo84 pippolo84 added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact. labels Nov 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 28, 2023
@pippolo84
Copy link
Member

Nit: checkpatch is complaining about a typo in the commit message: Warning: WARNING:TYPO_SPELLING: 'runtine' may be misspelled - perhaps 'runtime'?

@derailed derailed changed the title Add cilium operator go runtine sched latency metrics Add cilium operator go runtime sched latency metrics Nov 29, 2023
@derailed
Copy link
Contributor Author

derailed commented Dec 1, 2023

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch 2 times, most recently from 0c3eab3 to ad6017b Compare December 6, 2023 15:20
@derailed
Copy link
Contributor Author

derailed commented Dec 7, 2023

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from ad6017b to 36729e5 Compare December 7, 2023 19:20
@derailed
Copy link
Contributor Author

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 36729e5 to c90943a Compare December 14, 2023 16:05
@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from c90943a to fa71a0f Compare January 4, 2024 16:29
@derailed
Copy link
Contributor Author

derailed commented Jan 9, 2024

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from fa71a0f to 346bca4 Compare January 10, 2024 18:40
@derailed
Copy link
Contributor Author

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 346bca4 to 4d890c4 Compare January 11, 2024 13:04
@derailed
Copy link
Contributor Author

/test

@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 4d890c4 to 743d118 Compare January 12, 2024 13:07
@derailed
Copy link
Contributor Author

/test

The operator depends on various goroutines being scheduled one time to perform
critical tasks. Significant scheduling lags could indicate potential issues with the
operator functions.

- Added GO scheduler latency metrics tracking to the cilium operator.

Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com>
@derailed derailed force-pushed the pr/derailed/op_sched_metrics branch from 743d118 to f006d91 Compare January 12, 2024 14:45
@derailed
Copy link
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 12, 2024
@dylandreimerink dylandreimerink added this pull request to the merge queue Jan 15, 2024
Merged via the queue into cilium:main with commit e1032a0 Jan 15, 2024
mhofstetter added a commit to mhofstetter/cilium that referenced this pull request Nov 28, 2024
Currently, the Cilium Operator only uses the metrics registry
of the `controller-runtime` as base registry if Gateway API
functionality is enabled. This has historic reasons as Gateway
API functionality was the first module that used `controller-runtime`
for its controllers.

In the meantime, other modules (Ingress Controller, Secret Sync, ...)
are using the `controller-runtime` too - but the metrics registry of
the library is still only used if Gateway API is enabled. In these
cases, the metrics of the `controller-runtime` library
(https://book.kubebuilder.io/reference/metrics-reference) aren't exposed.

Therefore, this commit changes the behaviour, that the metrics registry
of the `controller-runtime` library is always used as the base registry.

Note: Unfortunately, the `controller-runtime` library configures the base
registry to have Go runtime metrics included by default
(https://github.com/cilium/cilium/blob/main/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/metrics/metrics.go#L91-L92).
Therefore, this Go metrics collector has to be removed from the metrics
registry before registering a new Go metrics collector with custom Go
runtime metrics enabled. Otherwise the registration fails because
some Go runtime metrics are registered twice.

Related PRs:
- cilium#28931 (extract `controller-runtime` integration into its own Hive Cell)
- cilium#29245 (export go runtime sched latency metrics)

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 29, 2024
Currently, the Cilium Operator only uses the metrics registry
of the `controller-runtime` as base registry if Gateway API
functionality is enabled. This has historic reasons as Gateway
API functionality was the first module that used `controller-runtime`
for its controllers.

In the meantime, other modules (Ingress Controller, Secret Sync, ...)
are using the `controller-runtime` too - but the metrics registry of
the library is still only used if Gateway API is enabled. In these
cases, the metrics of the `controller-runtime` library
(https://book.kubebuilder.io/reference/metrics-reference) aren't exposed.

Therefore, this commit changes the behaviour, that the metrics registry
of the `controller-runtime` library is always used as the base registry.

Note: Unfortunately, the `controller-runtime` library configures the base
registry to have Go runtime metrics included by default
(https://github.com/cilium/cilium/blob/main/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/metrics/metrics.go#L91-L92).
Therefore, this Go metrics collector has to be removed from the metrics
registry before registering a new Go metrics collector with custom Go
runtime metrics enabled. Otherwise the registration fails because
some Go runtime metrics are registered twice.

Related PRs:
- #28931 (extract `controller-runtime` integration into its own Hive Cell)
- #29245 (export go runtime sched latency metrics)

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/operator Impacts the cilium-operator component ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants