Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented May 2, 2024

This PR extends the PromQL test scripting language to support asserting that a query fails with a particular error message or an error message matching a regexp (rather than just asserting that the query fails for any reason).

@charleskorn charleskorn requested a review from roidelapluie as a code owner May 2, 2024 02:48
promql/test.go Outdated
@@ -292,6 +292,12 @@ func (t *test) parseEval(lines []string, i int) (int, *evalCmd, error) {
i--
break
}

if cmd.fail && strings.HasPrefix(defLine, "expected_message") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about fail_message or fail_msg to be more explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to expected_fail_message, how's that?

@charleskorn charleskorn requested a review from machine424 May 8, 2024 06:43
@beorn7
Copy link
Member

beorn7 commented May 8, 2024

Note for a future PR: It would be good to also be able to assert annotations.

@zenador
Copy link
Contributor

zenador commented May 8, 2024

The nhcb branch lets us assert warning annotations, but not yet specific ones. After this is merged, when I fix the conflicts I can look into doing something similar to assert specific annotations. Might be slightly more complicated though because you can only have maximum 1 error but you may have multiple warning annotations.

@bboreham
Copy link
Member

bboreham commented May 8, 2024

Why do we need both an explicit string and a regex, given the former is a subset of the latter?

Incidentally "pattern" did not immediately suggest "regex" to me, maybe because I spent too much time using LogQL.

@charleskorn
Copy link
Contributor Author

charleskorn commented May 9, 2024

Why do we need both an explicit string and a regex, given the former is a subset of the latter?

Many error messages contain characters that are special in regexes (eg. { or [), so it's often clearer to use a simple string match than try to escape all the special characters in a regex.

However, an exact match isn't great for every case - sometimes a regex is more convenient or clearer (eg. if only parts of the error message are important).

Incidentally "pattern" did not immediately suggest "regex" to me, maybe because I spent too much time using LogQL.

Fair point, I'll change this.

Signed-off-by: Charles Korn <charles.korn@grafana.com>
…age`

Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
@charleskorn charleskorn force-pushed the assert-on-query-error branch from 199ff32 to 78861fe Compare May 10, 2024 00:19
@charleskorn
Copy link
Contributor Author

I've rebased this to resolve the conflicts with #13999.

@charleskorn
Copy link
Contributor Author

charleskorn commented May 17, 2024

Any chance I could get a review on this? Would be great to get this merged.

@machine424
Copy link
Member

The PR has diverged, with the introduction of regexes, since my initial review.

However, an exact match isn't ideal for every case - sometimes a regex is more convenient or clearer (for instance, if only parts of the error message are significant).

Could you provide examples?
When adding a new test, given that promtool will return the precise failure message, users can simply copy and paste it (if they agree with it).

Moreover, we should clarify that we do not guarantee the stability of these messages, which might make these checks challenging to maintain.
Also, users might not need to perform such checks and can trust promql to emit a good failure message for the failing step, it's up to promql's tests to ensure this.
Could tell us more about the use cases of these fail message checks?

@charleskorn
Copy link
Contributor Author

However, an exact match isn't ideal for every case - sometimes a regex is more convenient or clearer (for instance, if only parts of the error message are significant).

Could you provide examples? When adding a new test, given that promtool will return the precise failure message, users can simply copy and paste it (if they agree with it).

Moreover, we should clarify that we do not guarantee the stability of these messages, which might make these checks challenging to maintain. Also, users might not need to perform such checks and can trust promql to emit a good failure message for the failing step, it's up to promql's tests to ensure this. Could tell us more about the use cases of these fail message checks?

The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.

@machine424
Copy link
Member

The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.

In that case, I think we can go with exact matching if we want to have such safeguards in our tests and let's see if they're easy to maintain.

@charleskorn
Copy link
Contributor Author

The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.

In that case, I think we can go with exact matching if we want to have such safeguards in our tests and let's see if they're easy to maintain.

Unfortunately exact matching isn't always enough for my use case: I am testing Mimir's replacement PromQL engine, and it doesn't guarantee that error messages are identical to Prometheus' engine.

For example, in some cases, Mimir's engine includes additional information in error messages, and in other cases, implementation differences mean we iterate through series in a different order and so different series are used for the error message.

@beorn7
Copy link
Member

beorn7 commented May 30, 2024

I'm currently reviewing #13662 (which @zenador mentioned above as another PR that needs assertions on returned errors (in this case annotations)). Since we use this for both the internal PromQL testing framework and the user-facing rule unit tests, I would tend towards making it flexible, even if the only use case right now is in Mimir. Every new feature in PromQL will ideally be testable in the framework and needs to extend it (as seen with native histograms), so erring on the side of making it more flexible is probably less hassle in the long run.

@beorn7
Copy link
Member

beorn7 commented May 30, 2024

Note that https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md needs to be updated with the changes here.

Copy link
Member

@machine424 machine424 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks to you two for the details.
Let's give this a try ;)

@beorn7
Copy link
Member

beorn7 commented May 30, 2024

@charleskorn could you update https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md accordingly? Then we can merge this, and then #13662 can be adjusted on top of this.

@charleskorn
Copy link
Contributor Author

@charleskorn could you update https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md accordingly? Then we can merge this, and then #13662 can be adjusted on top of this.

Sorry, I think I'm missing something - those docs don't talk about asserting that a query fails, so I'm not sure where I'd add documentation for asserting on the failure message.

@beorn7
Copy link
Member

beorn7 commented Jun 3, 2024

Maybe that was an omission even before. Or maybe you cannot use the query fail assertions with the promtool rules test?
But I hope you can. Essentially, the unit test documentation is also used if you write tests in the testing framework.

Sorry for not being able to be more specific, I'm not a domain expert here. I would propose the following steps:

  1. Find out if the rule testing via promtool is capable of utilizing the query failure assertion.
  2. If so, add documentation to the promtool rule test documentation, so everyone can use it.
  3. If not, we have to find another way of documenting how to assert query failures.

@charleskorn
Copy link
Contributor Author

Maybe that was an omission even before. Or maybe you cannot use the query fail assertions with the promtool rules test?

As far as I can see, promtool unittest does not support asserting that queries fail.

I'll add a separate doc explaining the test scripting language and its features.

Signed-off-by: Charles Korn <charles.korn@grafana.com>
@charleskorn
Copy link
Contributor Author

I'll add a separate doc explaining the test scripting language and its features.

Added: https://github.com/charleskorn/prometheus/blob/assert-on-query-error/promql/promqltest/README.md

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants