sentinel: Add Prometheus metrics #656

Merged: 1 commit into sorintlab:master from sentinel-metrics, Jun 3, 2019

benwh (Contributor) commented May 31, 2019

These metrics provide the ability to build alerts that tell us whether the sentinels are operating as expected:

  • The last time that the sentinel successfully processed the clusterdata.
  • Whether the sentinel is a leader.
  • The number of times that the sentinel has been elected leader.

This follows on from the keeper metrics in #639.

We've found these sentinel metrics extremely useful. We discovered that our sentinels could occasionally all become stuck, with no decisions being made across the cluster, when there are issues communicating with etcd. (We're planning to upstream a patch for that issue in the near future!)

benwh (Contributor, Author) commented May 31, 2019

This is what the metrics look like in action, with 3 sentinels running:

[Image: Screenshot 2019-05-31 17 10 56]

sgotti (Member) left a comment

@benwh Thanks for your PR! It LGTM, just a small nit in the comments.

@@ -0,0 +1,73 @@
// Copyright 2017 Sorint.lab
sgotti (Member):

s/2017/2019

benwh (Contributor, Author):

Good spot - fixed!

sgotti (Member) commented Jun 3, 2019

@benwh Can you please squash in a single commit?

benwh force-pushed the sentinel-metrics branch from ca66559 to eabdd33 on June 3, 2019 15:08

benwh (Contributor, Author) commented Jun 3, 2019

@sgotti I'd originally kept it separate as it was also changing a file outside of the scope of this PR. But happy to do so, squashed now.

sgotti (Member) commented Jun 3, 2019

@benwh Oh, I hadn't noticed that you also changed the keeper metrics file. Anyway, it's not a big styling issue. I'm going to merge it. Thanks again!

sgotti merged commit 259ef10 into sorintlab:master on Jun 3, 2019

benwh deleted the sentinel-metrics branch on June 3, 2019 16:08

benwh (Contributor, Author) commented Jun 3, 2019

Excellent, thanks very much for the speedy review!
