-
Notifications
You must be signed in to change notification settings - Fork 446
sentinel: Add Prometheus metrics #656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwh Thanks for your PR! It LGTM, just a small nit in the comments.
cmd/sentinel/cmd/metrics.go
Outdated
@@ -0,0 +1,73 @@ | |||
// Copyright 2017 Sorint.lab |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/2017/2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot - fixed!
@benwh Can you please squash in a single commit? |
These metrics provide the ability to build alerts that tell us whether the sentinels are operating as expected: - The last time that the sentinel successfully processed the clusterdata. - Whether the sentinel is a leader. - The number of times that the sentinel has been elected leader.
@sgotti I'd originally kept it separate as it was also changing a file outside of the scope of this PR. But happy to do so, squashed now. |
@benwh Oh I haven't noticed that you also changed the keeper metrics file. Anyway it's not a big styling issue. I'm going to merge it. Thanks again! |
Excellent, thanks very much for the speedy review! |
These metrics provide the ability to build alerts that tell us whether the sentinels are operating as expected:
This follows on from the keeper metrics in #639.
We've found these sentinel metrics to be extremely useful, upon discovering that our sentinels could occasionally all become stuck - and no decisions would be made across the cluster - when there are issues communicating with etcd (we're planning to upstream a patch for that issue in the near future!)