Description
Describe the feature:
The Prometheus output format should include references to individual test cases to allow more fine-grained alerting. The current implementation only contains a summary of test cases per resource type and a summary across all tests. Since test cases can have different criticality levels, neither summary is sufficient to decide whether someone should be called right now (potentially in the middle of the night) or not.
This has been discussed previously in #607, and my understanding is that it was postponed, not rejected. There is a valid concern about storage cost, so I don't think this should (or must) be enabled by default; rather, it should be a format option for the Prometheus output.
Describe the solution you'd like
The metric goss_tests_outcomes_total should carry additional labels that uniquely identify a single test case, or a separate metric that does exactly that should be introduced.
Describe alternatives you've considered
My current workaround is an annotation on the Prometheus alert for goss tests that links to a runbook, which tells the poor soul who has to deal with it to run:
$ ssh ...
$ goss --gossfile ... validate
This returns the failing tests and they can decide whether to fix it right now or go back to bed.