keeper Prometheus metrics #1
@@ -75,6 +76,20 @@ func AddCommonFlags(cmd *cobra.Command, cfg *CommonConfig) {
	}
}

var (
	clusterIdentifier = prometheus.NewGaugeVec(
This is the metric we (GoCardless) have been using to join various time series together. Because all components report it in a consistent fashion (see the stolon-pgbouncer definition), it becomes possible to join series from entirely different processes and infrastructure on the cluster_name and store_prefix labels.
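To illustrate the kind of join this enables, here is a minimal sketch in plain Go. It does not use the Prometheus client or PromQL; `Sample`, `joinKey` and `JoinOnCluster` are hypothetical names for illustration only, and the label values are made up. Only the `cluster_name` and `store_prefix` label names come from the comment above.

```go
package main

import "fmt"

// Sample is a simplified stand-in for a single Prometheus series sample.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// joinKey builds the key two samples must share to be joined:
// the cluster_name and store_prefix labels.
func joinKey(labels map[string]string) string {
	return labels["cluster_name"] + "\x00" + labels["store_prefix"]
}

// JoinOnCluster pairs each sample in left with every sample in right
// that reports the same cluster_name and store_prefix.
func JoinOnCluster(left, right []Sample) [][2]Sample {
	index := make(map[string][]Sample)
	for _, r := range right {
		k := joinKey(r.Labels)
		index[k] = append(index[k], r)
	}
	var pairs [][2]Sample
	for _, l := range left {
		for _, r := range index[joinKey(l.Labels)] {
			pairs = append(pairs, [2]Sample{l, r})
		}
	}
	return pairs
}

func main() {
	// Hypothetical samples from two unrelated processes.
	keeper := []Sample{{
		Labels: map[string]string{"cluster_name": "main", "store_prefix": "stolon/cluster", "component": "keeper"},
		Value:  1,
	}}
	pgbouncer := []Sample{
		{Labels: map[string]string{"cluster_name": "main", "store_prefix": "stolon/cluster", "component": "pgbouncer"}, Value: 1},
		{Labels: map[string]string{"cluster_name": "other", "store_prefix": "stolon/cluster", "component": "pgbouncer"}, Value: 1},
	}
	fmt.Println("joined pairs:", len(JoinOnCluster(keeper, pgbouncer)))
}
```

Only the pgbouncer sample whose two identifying labels match the keeper's is paired; the sample from the other cluster is ignored.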
@@ -1391,6 +1482,10 @@ func (p *PostgresKeeper) postgresKeeperSM(pctx context.Context) {
	targetRole := db.Spec.Role
	log.Debugw("target role", "targetRole", string(targetRole))

	// Set metrics to power alerts about mismatched roles
@@ -1770,6 +1877,7 @@ func (p *PostgresKeeper) generateHBA(cd *cluster.ClusterData, db *cluster.DB, on
func sigHandler(sigs chan os.Signal, cancel context.CancelFunc) {
	s := <-sigs
	log.Debugw("got signal", "signal", s)
	shutdownSeconds.SetToCurrentTime()
Useful for detecting a pending shutdown.
@@ -42,6 +42,13 @@ const (
	RoleStandby Role = "standby"
)

// Roles enumerates all possible Role values
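One plausible use of such an enumeration, sketched here in plain Go without the Prometheus client, is to set a value for every role on each update so that the previous role is explicitly reset to 0 rather than left stale. `RoleMaster`, `roleGaugeValues` and `rolesMismatch` are assumptions for illustration; only `RoleStandby` and the `Roles` name appear in the hunk above.

```go
package main

import "fmt"

// Role mirrors the string-based Role type from the hunk.
type Role string

const (
	RoleMaster  Role = "master" // assumed value, not shown in the hunk
	RoleStandby Role = "standby"
)

// Roles enumerates all Role values used in this sketch.
var Roles = []Role{RoleMaster, RoleStandby}

// roleGaugeValues returns one value per role: 1 for the active role and 0
// for every other role, so no previously active role keeps a stale 1.
func roleGaugeValues(active Role) map[Role]float64 {
	values := make(map[Role]float64, len(Roles))
	for _, r := range Roles {
		if r == active {
			values[r] = 1
		} else {
			values[r] = 0
		}
	}
	return values
}

// rolesMismatch reports whether the target and local role values disagree
// for any role, which is the condition a mismatch alert would fire on.
func rolesMismatch(target, local map[Role]float64) bool {
	for _, r := range Roles {
		if target[r] != local[r] {
			return true
		}
	}
	return false
}

func main() {
	target := roleGaugeValues(RoleMaster)
	local := roleGaugeValues(RoleStandby)
	fmt.Println("mismatch:", rolesMismatch(target, local))
}
```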
Add a collection of Prometheus metrics to the keeper. The metrics are aimed at exposing errors in the keeper sync loop, providing enough visibility to detect when the sync is failing (and some insight into why).
This commit is associated with an open PR [^1] to stolon that adds Prometheus metrics to the keeper. The changes here include adding a keeper dashboard that can visualise the keeper statuses and a couple of essential alerts for keeper health. We update the playground environment so developers can explore these metrics and make use of the dashboard. This includes a new docker image, which has been pushed to Docker Hub.

[^1]: gocardless/stolon#1
🥇 Amazing work; let's get this in.
keeper Prometheus metrics
Add a collection of Prometheus metrics to the keeper. The metrics are
aimed at exposing errors in the keeper sync loop, providing enough
visibility to detect when the sync is failing (and some insight into
why).
This commit can be tested in the stolon-pgbouncer setup gocardless/stolon-pgbouncer#29