-
Notifications
You must be signed in to change notification settings - Fork 486
Implement a prometheus output for test-metrics scraping #607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a prometheus output for test-metrics scraping #607
Conversation
@aelsabbahy this is ready for review, I think. If it's a welcome PR, then the following people might be interested too based on the previous tickets I've seen. As with #585, I'll let you decide whether/when to do that.
|
@petemounce I think its important for visualization over time that the operator can discover what goss check failed and when. The format in PR #175, looks like it would provide the operator with all the information they need. |
@Smithx10 I agree that would be useful, but I don't think it would outweigh the storage costs and operational risk it would incur on the prometheus server.
The pertinent point from the 2nd link:
Remember also that prometheus doesn't currently have downsampling/compaction (prometheus-junkyard/tsdb#56 and prometheus-junkyard/tsdb#313) - so if a single value is exported that is never seen again, the space on disk for that time series is only reclaimed after the global retention period elapses. Adding the My working assumption has been the standard "monitoring metrics allow alerts on symptoms, which trigger investigation via metrics dashboards and reading centralised logs (which contain more context)". Can you suggest any way(s) to compromise? |
resource/validate.go
Outdated
return humanOutcomes | ||
} | ||
|
||
func Resources() []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hack and I'm about to open a different PR that will let this be replaced by the same pattern that the outputs
package uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That other PR is #611.
@petemounce How about making it configurable ? By default the endpoint only returns metrics with the |
@petemounce The goss is a test tool, I hope to output success or failure results for each test case... |
@timeu
? Problem is, at present, there's no way I can see to easily
at the existing command-line option (a string slice) without quite a larger-scope change. How would the command-line interface need to change to support this?
That said - would either of you (or someone else that might be reading) be up for making a pull-request into my branch that adds the ability to customise the set of labels that would be applied to the metrics being output? Or, do that as a follow-up PR assuming this could be treated as an MVP? This PR meets, in my opinion (and hope :) ), the criteria for being merged in that it adds well-tested non-breaking-change functionality and moves towards what we all want, which is to monitor the outcomes and durations of our machine-state assertions so that we can
(Those are my aims with this PR, at any rate) |
01c2eb2
to
80cace5
Compare
@harre-orz, @Smithx10 & @timeu - friendly ping for feedback? I would say that if this merges as-is (it's quite a large PR after all), we can iterate something that satisfies us all? What do you think? |
@petemounce : Sorry for the lack of feedback. I think that sounds like a plan. |
80cace5
to
b0e5004
Compare
@harre-orz @Smithx10 - do you have any feedback? |
@petemounce Sorry late a feedback. |
@harre-orz no worries; thanks for getting back to me. :) |
b0e5004
to
f642ce6
Compare
1623875
to
fec281b
Compare
@Smithx10 wondering if you have any feedback on this? |
Note - I just added overall outcome counting, and labels to the overall outcome/duration metrics. |
04a59fc
to
8da4147
Compare
Note to watchers - with the change inside #611, I think that would facilitate a follow-on that makes the prometheus output deliver resource-ids if someone wants that to be possible. I'd prefer that to be non-default for the reasoning above. That would look roughly like:
|
8da4147
to
ded6154
Compare
@aelsabbahy I think this is ready to land now, assuming it passes re-review and CI. If it fails CI I'll fix that, obviously, but later today. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
dissapointed |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
dissapointed more |
@aelsabbahy I think, it would be great to have prometheus output. If I understand correctly, this working PR was stopped by your comment #607 (comment) about the I don't think I can do the proposed refactoring, but I think, I could just rip put the metrics and make this PR smaller. Would this help to have it merged? |
The intent of this PR is to add metrics support. I haven't understood what you propose to remove from this to enable its merge? |
First, thanks for your work @petemounce ! |
It's been so long.. so memory is a bit fuzzy on this. What's the current status of the PR and what remains? @petemounce Last two comments from you seems to indicate there was some remaining work outstanding? |
Both endpoints are served via the same implementation as far as I recall. @aelsabbahy - basically, the remaining work is what you described within #607 (comment) |
If this is good to go, feel free to rebase and I can merge it. |
I don't think it is, because of the still-outstanding work to make metrics a concept introduced at the level you directed in #607 (comment) |
Hello watchers, there's been a new and simpler attempt at this over in #771, but I do not know if it meets the needs of those asking for this feature. I'm willing to merge whatever solution the community agrees on. @petemounce your feedback would be highly appreciated on that PR since you've spent quite a bit of time analyzing this 😄 |
🥳 |
Thanks for merging! |
@petemounce
|
Good spot, yes. Would you mind opening a new issue? For reference, I think https://github.com/aelsabbahy/goss/blob/master/outputs/prometheus.go#L85 is the culprit - I've neglected to account for the nested loop. |
@petemounce: Created an issue #789 Also one more question: |
I think so; the output will be scoped to the content of the run. |
@petemounce as part of #789 can you add some documents in the manual or do you prefer a separate ticket? |
I'll tackle that there; thanks for flagging. |
Checklist
make test-all
(UNIX) passes. CI will also test thisDescription of change
Fixes #175, #362, #363.
Adds a
prometheus
output format. I want to be able to scrape mygoss serve
endpoints so that I canThis records
Sample (from unit-test log):
Implementation details and notes
I expect to be using this most in the context of
goss serve --format prometheus
, so that I can configure my prometheus servers to scrape servers that I'm running goss endpoints on. Therefore https://prometheus.io/docs/practices/instrumentation/#online-serving-systems was the most appropriate model to choose.I followed the advice on metric and label naming (https://prometheus.io/docs/practices/naming/) (
er, actually I didn't initially and will push a fix for thatfixed in1f2cd75).
I followed the advice on Labels (https://prometheus.io/docs/practices/instrumentation/#use-labels) and not to overuse them (https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels) - I could have added a label for each field of a resource.TestResult, but I didn't think I'd need that ever, and I definitely didn't think it would be worth the cardinality explosion that would have been at risk if I had.
I chose to not use the prometheus Default Registry, because this also adds in process and golang metrics that I don't think I need. This is theIt's easy to swap back to the defaults again, and the change in doing so would be additive, not breaking, to the output.registry = prometheus.NewRegistry()
and.With(registry)
bits inside thevar
block./metrics
endpoint. That's what that ecosystem expects and uses by default.