promql: extend test scripting language to support asserting on expected error message #14038

charleskorn · 2024-05-02T02:48:52Z

This PR extends the PromQL test scripting language to support asserting that a query fails with a particular error message or an error message matching a regexp (rather than just asserting that the query fails for any reason).

machine424 · 2024-05-07T13:43:40Z

promql/test.go

@@ -292,6 +292,12 @@ func (t *test) parseEval(lines []string, i int) (int, *evalCmd, error) {
 			i--
 			break
 		}
+
+		if cmd.fail && strings.HasPrefix(defLine, "expected_message") {


How about fail_message or fail_msg, to be more explicit?

charleskorn · 2024-05-08

I've changed it to expected_fail_message. How's that?
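For readers unfamiliar with the parsing step in the diff above, here is a minimal Go sketch of how a line starting with the renamed `expected_fail_message` keyword could be recognised and its message extracted. The helper `parseExpectedFailMessage` and the sample line are assumptions made for illustration; the real logic in promql/test.go may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// failMessagePrefix is the keyword agreed on in this review thread.
const failMessagePrefix = "expected_fail_message"

// parseExpectedFailMessage is a hypothetical helper. It reports whether line
// starts with the keyword and, if so, returns the rest of the line with
// surrounding whitespace removed.
func parseExpectedFailMessage(line string) (string, bool) {
	rest, ok := strings.CutPrefix(strings.TrimSpace(line), failMessagePrefix)
	if !ok {
		return "", false
	}
	return strings.TrimSpace(rest), true
}

func main() {
	// Made-up line from a test script.
	msg, ok := parseExpectedFailMessage("  expected_fail_message something went wrong")
	fmt.Printf("%q %v\n", msg, ok)
}
```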


beorn7 · 2024-05-08T13:05:53Z

Note for a future PR: It would be good to also be able to assert annotations.

zenador · 2024-05-08T13:51:39Z

The nhcb branch lets us assert warning annotations, but not yet specific ones. After this is merged, when I fix the conflicts I can look into doing something similar to assert specific annotations. It might be slightly more complicated, though, because a query can return at most one error but may produce multiple warning annotations.

bboreham · 2024-05-08T19:05:10Z

Why do we need both an explicit string and a regex, given the former is a subset of the latter?

Incidentally "pattern" did not immediately suggest "regex" to me, maybe because I spent too much time using LogQL.

charleskorn · 2024-05-09T02:28:04Z

> Why do we need both an explicit string and a regex, given the former is a subset of the latter?

Many error messages contain characters that are special in regexes (e.g. { or [), so it's often clearer to use a simple string match than to try to escape all the special characters in a regex.

However, an exact match isn't great for every case - sometimes a regex is more convenient or clearer (e.g. if only parts of the error message are important).

> Incidentally "pattern" did not immediately suggest "regex" to me, maybe because I spent too much time using LogQL.

Fair point, I'll change this.

charleskorn · 2024-05-10T00:19:42Z

I've rebased this to resolve the conflicts with #13999.

charleskorn · 2024-05-17T03:49:04Z

Any chance I could get a review on this? Would be great to get this merged.

machine424 · 2024-05-17T10:12:34Z

The PR has diverged, with the introduction of regexes, since my initial review.

> However, an exact match isn't ideal for every case - sometimes a regex is more convenient or clearer (for instance, if only parts of the error message are significant).

Could you provide examples?
When adding a new test, given that promtool will return the precise failure message, users can simply copy and paste it (if they agree with it).

Moreover, we should clarify that we do not guarantee the stability of these messages, which might make these checks challenging to maintain.
Also, users might not need to perform such checks and can trust promql to emit a good failure message for the failing step; it's up to promql's tests to ensure this.
Could you tell us more about the use cases of these fail message checks?

charleskorn · 2024-05-20T01:26:00Z

> However, an exact match isn't ideal for every case - sometimes a regex is more convenient or clearer (for instance, if only parts of the error message are significant).
>
> Could you provide examples? When adding a new test, given that promtool will return the precise failure message, users can simply copy and paste it (if they agree with it).
>
> Moreover, we should clarify that we do not guarantee the stability of these messages, which might make these checks challenging to maintain. Also, users might not need to perform such checks and can trust promql to emit a good failure message for the failing step; it's up to promql's tests to ensure this. Could you tell us more about the use cases of these fail message checks?

The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.

machine424 · 2024-05-25T13:48:09Z

> The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.

In that case, I think we can go with exact matching if we want to have such safeguards in our tests and let's see if they're easy to maintain.

charleskorn · 2024-05-27T01:07:41Z

> > The use case for this is not for promtool test, but for testing the PromQL engine itself and verifying that the error messages returned in failure cases are as expected.
>
> In that case, I think we can go with exact matching if we want to have such safeguards in our tests and let's see if they're easy to maintain.

Unfortunately exact matching isn't always enough for my use case: I am testing Mimir's replacement PromQL engine, and it doesn't guarantee that error messages are identical to Prometheus' engine.

For example, in some cases, Mimir's engine includes additional information in error messages, and in other cases, implementation differences mean we iterate through series in a different order and so different series are used for the error message.
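A small Go sketch of this situation: two engines report the same failure but name different series, and one regexp accepts both. The messages and the pattern are invented for illustration and are not real Prometheus or Mimir output.

```go
package main

import (
	"fmt"
	"regexp"
)

// duplicateSeries is a hypothetical pattern. It pins down the part of the
// message that matters (the match group) and leaves the listed series open.
var duplicateSeries = regexp.MustCompile(
	`^duplicate series for match group \{env="prod"\}: \[.*\]$`,
)

func main() {
	// Made-up messages: same failure, different series picked for reporting.
	messages := []string{
		`duplicate series for match group {env="prod"}: [{pod="a"}, {pod="b"}]`,
		`duplicate series for match group {env="prod"}: [{pod="b"}, {pod="c"}]`,
	}
	for _, m := range messages {
		fmt.Println(duplicateSeries.MatchString(m))
	}
}
```

An exact string match could accept only one of the two messages, which is why a regexp assertion is useful when the same test script runs against more than one engine.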

beorn7 · 2024-05-30T10:44:40Z

I'm currently reviewing #13662 (which @zenador mentioned above as another PR that needs assertions on returned errors (in this case annotations)). Since we use this for both the internal PromQL testing framework and the user-facing rule unit tests, I would tend towards making it flexible, even if the only use case right now is in Mimir. Every new feature in PromQL will ideally be testable in the framework and needs to extend it (as seen with native histograms), so erring on the side of making it more flexible is probably less hassle in the long run.

beorn7 · 2024-05-30T10:49:22Z

Note that https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md needs to be updated with the changes here.

machine424 · 2024-05-30 (approved)

Ok, thanks to you two for the details.
Let's give this a try ;)

beorn7 · 2024-05-30T12:50:30Z

@charleskorn could you update https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md accordingly? Then we can merge this, and then #13662 can be adjusted on top of this.

charleskorn · 2024-05-31T04:58:25Z

> @charleskorn could you update https://github.com/prometheus/prometheus/blob/main/docs/configuration/unit_testing_rules.md accordingly? Then we can merge this, and then #13662 can be adjusted on top of this.

Sorry, I think I'm missing something - those docs don't talk about asserting that a query fails, so I'm not sure where I'd add documentation for asserting on the failure message.

beorn7 · 2024-06-03T12:21:00Z

Maybe that was an omission even before. Or maybe you cannot use the query fail assertions with the promtool rules test?
But I hope you can. Essentially, the unit test documentation is also used if you write tests in the testing framework.

Sorry for not being able to be more specific, I'm not a domain expert here. I would propose the following steps:

1. Find out if the rule testing via promtool is capable of utilizing the query failure assertion.
2. If so, add documentation to the promtool rule test documentation, so everyone can use it.
3. If not, we have to find another way of documenting how to assert query failures.

charleskorn · 2024-06-05T03:25:38Z

> Maybe that was an omission even before. Or maybe you cannot use the query fail assertions with the promtool rules test?

As far as I can see, promtool unittest does not support asserting that queries fail.

I'll add a separate doc explaining the test scripting language and its features.

charleskorn · 2024-06-05T04:05:23Z

> I'll add a separate doc explaining the test scripting language and its features.

Added: https://github.com/charleskorn/prometheus/blob/assert-on-query-error/promql/promqltest/README.md

beorn7 · 2024-06-06 (approved)

Very nice. Thank you very much.

charleskorn requested a review from roidelapluie as a code owner May 2, 2024 02:48

machine424 reviewed May 7, 2024

charleskorn requested a review from machine424 May 8, 2024 06:43

charleskorn added 5 commits May 10, 2024 10:18

Add ability to assert that a query fails with a particular error message

b39db16

Signed-off-by: Charles Korn <charles.korn@grafana.com>

Address PR feedback: rename expected_message to `expected_fail_mess…

2f7b9fd

…age` Signed-off-by: Charles Korn <charles.korn@grafana.com>

Address PR feedback: extract common logic to single method

f1b19dc

Signed-off-by: Charles Korn <charles.korn@grafana.com>

Add support for matching regex patterns too.

1748fcb

Signed-off-by: Charles Korn <charles.korn@grafana.com>

Rename expected_fail_pattern to expected_fail_regexp

78861fe

Signed-off-by: Charles Korn <charles.korn@grafana.com>

charleskorn force-pushed the assert-on-query-error branch from 199ff32 to 78861fe Compare May 10, 2024 00:19

charleskorn mentioned this pull request May 10, 2024

Streaming PromQL engine: binary arithmetic operations with one-to-one matching grafana/mimir#8096

Merged

2 tasks

machine424 approved these changes May 30, 2024

beorn7 mentioned this pull request May 30, 2024

Native histograms custom buckets storage #13662

Merged

Add documentation for test scripting language.

276f302

Signed-off-by: Charles Korn <charles.korn@grafana.com>

beorn7 approved these changes Jun 6, 2024

beorn7 merged commit 1f988f7 into prometheus:main Jun 6, 2024

charleskorn mentioned this pull request Jun 11, 2024

Mimir query engine: extend binary operation tests to assert on error messages grafana/mimir#8340

Merged

2 tasks
