Skip to content

Conversation

odinuge
Copy link
Member

@odinuge odinuge commented Nov 19, 2024

Add metrics lock and unlock metrics in order to make it easier to understand how much time they take, and if there potentially is a lot of contention around these locks. Its also interesting as these result in potentially multiple etcd requests in case there is contention, and waits until it successfully has grabbed the lock.

This metric does not include the time it takes to potentially acquire a new session, or the time taken to grab the local kvstore lock for the given key.


Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #issue-number

Added Lock and Unlock metric for kvstore locks

# operator: With this PR and https://github.com/cilium/cilium/pull/36014#pullrequestreview-2445051298
$ kubectl get --raw /api/v1/namespaces/kube-system/pods/http:$(kubectl get pods -n kube-system -l io.cilium/app=operator -ojsonpath='{.items[0].metadata.name}'):9963/proxy/metrics |egrep -a "(kvstore|etcd|version)" |grep "cilium_operator_kvstore_operations_duration_seconds_count"
cilium_operator_kvstore_operations_duration_seconds_count{action="AcquireLease",kind="set",outcome="success",scope="lease"} 2
cilium_operator_kvstore_operations_duration_seconds_count{action="ListPrefix",kind="read",outcome="success",scope="identities/v1"} 1649
cilium_operator_kvstore_operations_duration_seconds_count{action="ListPrefixLocked",kind="read",outcome="success",scope="identities/v1"} 2356
cilium_operator_kvstore_operations_duration_seconds_count{action="Lock",kind="set",outcome="success",scope="cilium/.init"} 395
cilium_operator_kvstore_operations_duration_seconds_count{action="Lock",kind="set",outcome="success",scope="identities/v1"} 2356
cilium_operator_kvstore_operations_duration_seconds_count{action="Unlock",kind="delete",outcome="success",scope="cilium/.init"} 395
cilium_operator_kvstore_operations_duration_seconds_count{action="Unlock",kind="delete",outcome="success",scope="identities/v1"} 2356
cilium_operator_kvstore_operations_duration_seconds_count{action="Update",kind="set",outcome="success",scope="cilium/.hear"} 197
cilium_operator_kvstore_operations_duration_seconds_count{action="Update",kind="set",outcome="success",scope="default/cilium"} 2

# agent: With this PR
$ k get --raw /api/v1/namespaces/kube-system/pods/http:$(kubectl get pods -n kube-system -l k8s-app=cilium -ojsonpath='{.items[0].metadata.name}'):9090/proxy/metrics |grep "cilium_kvstore_operations_duration_seconds_count"
cilium_kvstore_operations_duration_seconds_count{action="AcquireLease",kind="set",outcome="success",scope="lease"} 2
cilium_kvstore_operations_duration_seconds_count{action="CreateOnly",kind="set",outcome="success",scope="identities/v1"} 21
cilium_kvstore_operations_duration_seconds_count{action="CreateOnlyLocked",kind="set",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Delete",kind="delete",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Delete",kind="delete",outcome="success",scope="ip/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Get",kind="read",outcome="success",scope="identities/v1"} 22
cilium_kvstore_operations_duration_seconds_count{action="Get",kind="read",outcome="success",scope="ip/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Get",kind="read",outcome="success",scope="nodes/v1"} 38555
cilium_kvstore_operations_duration_seconds_count{action="ListPrefix",kind="read",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="ListPrefixLocked",kind="read",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Lock",kind="set",outcome="success",scope="cilium/.init"} 63
cilium_kvstore_operations_duration_seconds_count{action="Lock",kind="set",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Unlock",kind="delete",outcome="success",scope="cilium/.init"} 63
cilium_kvstore_operations_duration_seconds_count{action="Unlock",kind="delete",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Update",kind="set",outcome="success",scope="identities/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Update",kind="set",outcome="success",scope="ip/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="Update",kind="set",outcome="success",scope="nodes/v1"} 1
cilium_kvstore_operations_duration_seconds_count{action="UpdateIfLocked",kind="set",outcome="success",scope="identities/v1"} 1

Add metrics lock and unlock metrics in order to make it easier to
understand how much time they take, and if there potentially is a lot of
contention around these locks.

This metric does not include the time it takes to potentially acquire a
new session, or the time taken to grab the local kvstore lock for the
given key.

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 19, 2024
@odinuge odinuge marked this pull request as ready for review November 19, 2024 11:16
@odinuge odinuge requested a review from a team as a code owner November 19, 2024 11:16
@odinuge odinuge requested a review from giorio94 November 19, 2024 11:16
@giorio94 giorio94 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/kvstore Impacts the KVStore package interactions. labels Nov 19, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 19, 2024
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@giorio94
Copy link
Member

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 20, 2024
@tklauser tklauser enabled auto-merge November 20, 2024 13:28
@tklauser tklauser added this pull request to the merge queue Nov 20, 2024
Merged via the queue into cilium:main with commit db6aae7 Nov 20, 2024
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kvstore Impacts the KVStore package interactions. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants