
Conversation

qinxx108
Contributor

This is to implement the proposal here: #12120
Three parameters are added:

MaxAlerts: Limits the number of alerts returned for each rule; if there are more alerts for a rule, we return hasMore = true.
MaxRuleGroups: Limits the number of rule groups returned; if there are more rule groups to return, we return a NextToken in the response.
NextToken: The pagination token.

@qinxx108 qinxx108 force-pushed the paginated-list-rules branch 2 times, most recently from 8617a47 to 9dfdb47 Compare April 30, 2024 21:45
Signed-off-by: Yijie Qin <qinyijie@amazon.com>
@beorn7
Member

beorn7 commented May 8, 2024

I tagged people that have discussed #12120 as reviewers.

@beorn7
Member

beorn7 commented May 22, 2024

@roidelapluie @codesome @gotjosh @juliusv could you have a look? Or at least let us know if you have noticed this PR, but you don't have time to get to it so that less qualified reviewers might give it a try?

@codesome
Member

I won't have time to review this PR and I don't feel qualified enough for this part of the code.

@codesome codesome removed their request for review May 31, 2024 17:17
Member

@juliusv juliusv left a comment

Finally got around to taking a closer look at this one! Overall the approach looks pretty good to me, though I left a number of questions and comments about the details.

It's of course not perfect in the face of rule reloads happening while you paginate, but from reading #12120 I think that is understood.

Once the design + implementation settles, we'll also need https://github.com/prometheus/prometheus/blob/main/docs/querying/api.md#rules to be updated with documentation about the new functionality.

maxAlert, err := strconv.ParseInt(r.URL.Query().Get("maxAlerts"), 10, 32)
if err != nil || maxAlert < 0 {
return nil, &parsePaginationError{
err: fmt.Errorf("maxAlerts need to be a valid number and larger than 0: %w", err),
Member

Here it says that maxAlerts needs to be larger than 0, but the check above only checks that it's at least 0. Which one is intended?

If you want to allow 0, then maybe strconv.ParseUint() would be better. Then you can omit the explicit maxAlert < 0 check completely, as that's covered by the function.
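The reviewer's ParseUint suggestion could look roughly like this (a sketch, not the PR's code; parseMaxAlerts is a hypothetical helper):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMaxAlerts illustrates the suggestion: strconv.ParseUint rejects
// negative input on its own, so the explicit `< 0` check can be dropped.
func parseMaxAlerts(raw string) (uint64, error) {
	if raw == "" {
		return 0, nil // parameter absent: no limit requested
	}
	v, err := strconv.ParseUint(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("maxAlerts must be a non-negative integer: %w", err)
	}
	return v, nil
}

func main() {
	fmt.Println(parseMaxAlerts("5"))  // 5 <nil>
	_, err := parseMaxAlerts("-1")
	fmt.Println(err != nil)           // true
}
```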


if r.URL.Query().Get("maxRuleGroups") != "" {
maxRuleGroups, err := strconv.ParseInt(r.URL.Query().Get("maxRuleGroups"), 10, 32)
if err != nil || maxRuleGroups < 0 {
Member

Same here.

Contributor

resolved

@@ -1515,6 +1607,77 @@ func parseExcludeAlerts(r *http.Request) (bool, error) {
return excludeAlerts, nil
}

func parseListRulesPaginationRequest(r *http.Request) (*listRulesPaginationRequest, *parsePaginationError) {
var (
returnMaxAlert = int32(-1)
Member

For consistency:

Suggested change
returnMaxAlert = int32(-1)
returnMaxAlerts = int32(-1)

func parseListRulesPaginationRequest(r *http.Request) (*listRulesPaginationRequest, *parsePaginationError) {
var (
returnMaxAlert = int32(-1)
returnMaxRuleGroups = int32(-1)
Member

I'd probably just use uint64 for these, even if we never expect to exceed an int32. It just feels more natural nowadays, avoids parsing issues when a user sets a ridiculously high limit, and you need one less cast (if changing to strconv.ParseUint() below).

parameter: "maxAlerts",
}
}
returnMaxAlert = int32(maxAlert)
Member

I don't think we need two variable names for each parameter (e.g. maxAlert and returnMaxAlerts), at least after making the variable type adjustments I mentioned above. I'd just omit the ones with the return prefix, set the three variables to their default values at the top, and then overwrite them directly from the parsed parameters. That's a bit simpler to read as well I think.

}

func TruncateGroupInfos(groupInfos []*RuleGroup, maxRuleGroups int) ([]*RuleGroup, string) {
resultNumber := 0
Member

For simplicity I'd probably omit this extra tracking variable and just use len(returnGroupDescs) instead in both places where it's needed.

continue
}

// Return the next token if there is more aggregation group
Member

Suggested change
// Return the next token if there is more aggregation group
// Return the next token if there are more aggregation groups

return nil, nil
}

func TruncateGroupInfos(groupInfos []*RuleGroup, maxRuleGroups int) ([]*RuleGroup, string) {
Member

Now reading this function again after adding the other comments about it, I think it'd make sense to write it in a different way that would make it simpler and more efficient:

  • Do a special-case early exit for the maxRuleGroups < 0 || len(groupInfos) <= maxRuleGroups case (just return groupInfos as is and "" for the token).
  • Then: Instead of looping over all groups and appending them individually, just return groupInfos[:maxRuleGroups] for the rule groups and the token for groupInfos[maxRuleGroups-1].
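As a sketch (simplified to []string instead of the PR's []*RuleGroup, with the token reduced to the last returned group's name), the suggested version would be:

```go
package main

import "fmt"

// truncateGroupInfos returns at most maxRuleGroups groups plus a pagination
// token derived from the last returned group ("" means no more pages).
// In this sketch a non-positive limit means "no limit".
func truncateGroupInfos(groupInfos []string, maxRuleGroups int) ([]string, string) {
	if maxRuleGroups <= 0 || len(groupInfos) <= maxRuleGroups {
		return groupInfos, "" // special-case early exit: nothing to truncate
	}
	// Slice instead of looping and appending group by group.
	return groupInfos[:maxRuleGroups], groupInfos[maxRuleGroups-1]
}

func main() {
	page, token := truncateGroupInfos([]string{"a", "b", "c"}, 2)
	fmt.Println(page, token) // [a b] b
}
```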

Contributor

the issue is that we need to truncate the result of filtering the rule groups based on the file names and group names. There is a small edge case that blocks the optimized version: a list of rule groups for a page can be the last one or not, depending on whether the subsequent rule groups are filtered out.

E.g.:
/rules?rule_groups[]=a1,b1,c1&maxAlerts=1

in the rule groups we have

  • a1
  • b1
  • c1
  • d1

When we are iterating over the rule groups, there is no way to know whether c1 is going to produce a page with nextToken="" before we evaluate d1.
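The edge case can be sketched like this (a toy with hypothetical names, groups as plain strings): whether the page ending at c1 is the last page is only known after looking past it at d1.

```go
package main

import "fmt"

// paginateFiltered builds one page of at most `limit` groups while applying
// a name filter. The next token can only be decided by looking beyond the
// page: if some later group also passes the filter, there are more pages.
func paginateFiltered(groups []string, keep map[string]bool, limit int) (page []string, nextToken string) {
	for _, g := range groups {
		if !keep[g] {
			continue
		}
		if len(page) == limit {
			return page, g // a further matching group exists
		}
		page = append(page, g)
	}
	return page, "" // nothing else matched: this was the last page
}

func main() {
	groups := []string{"a1", "b1", "c1", "d1"}
	filter := map[string]bool{"a1": true, "b1": true, "c1": true}
	fmt.Println(paginateFiltered(groups, filter, 3)) // last page: empty token
	filter["d1"] = true
	fmt.Println(paginateFiltered(groups, filter, 3)) // more pages: token set
}
```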

Member

Ah ok, thanks for the explanation!

@@ -168,6 +169,17 @@ type apiFuncResult struct {

type apiFunc func(r *http.Request) apiFuncResult

type listRulesPaginationRequest struct {
MaxAlerts int32
Member

I think I'd just stick to either int or int64 for these fields here; fewer conversions/casts needed later on and no need to artificially limit the value range.

* Reduce number of variables
* Reduce type conversion

Signed-off-by: Raphael Silva <rapphil@gmail.com>
qinxx108 and others added 5 commits June 11, 2024 12:15
* Remove maxAlerts parameter.
* Reuse existing API responses by using omitempty in some fields

Signed-off-by: Raphael Silva <rapphil@gmail.com>
* Eliminate the need to sort the rule groups.

Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
@rapphil
Contributor

rapphil commented Jul 4, 2024

Ok, to simplify this PR I made the following changes:

  • Remove MaxAlerts parameter -> we already have the exclude_alerts parameter. This can be added in a later PR if necessary, but right now I don't think the user experience with maxAlerts is great: different output format, no way to paginate. So I think it is better to remove it.
  • Do not sort the rule groups before paginating; instead, just rely on the name of the rule groups to find the next token marker and return results in the same order that they are created by users.
  • Add optional field NextToken in the RuleDiscovery that will be present in case the response to list rules has more pages.

Things that I'm not happy about right now in this PR and would like input on:

  • Parameter naming conventions. What is the convention for naming API parameters? I can see that some APIs use camel case (e.g.: scrapePool) for query parameters while others use snake case (e.g.: rule_group[]).
  • What should be the behaviour in case the user passes a request with a next token but no MaxRules?
    • Should I return everything starting from the nextToken -> this makes sense because we can simply say that by default the API will return all results.
    • Use a different default, say 10.

@rapphil
Contributor

rapphil commented Jul 4, 2024

I added a few more unit tests.

@rapphil rapphil force-pushed the paginated-list-rules branch from 45b39d7 to b9c0c84 Compare July 15, 2024 05:43
@rapphil
Contributor

rapphil commented Jul 15, 2024

Hi @bwplotka 👋🏽

I saw that you reviewed a similar PR recently: #10194

Would you be able to help reviewing this one as well?

Signed-off-by: Raphael Silva <rapphil@gmail.com>
@rapphil
Contributor

rapphil commented Jul 15, 2024

I ended up updating the new parameters so that they are consistent with the existing ones, where we use snake case instead of camel case.

@bwplotka
Member

Thanks, I am on PTO but in between cocktails I will review (joking on cocktails, non sober merging is not a great practice 😜).

@bwplotka
Member

bwplotka commented Aug 7, 2024

Parameter naming conventions. What is the convention for naming API parameters? I can see that some APIs use camel case (e.g.: scrapePool) for query parameters while other uses snake case (e.g.: rule_group[]).

Nice, at least rules API is snake case, so let's keep that? https://prometheus.io/docs/prometheus/latest/querying/api/#rules

What should be the behaviour in case user pass a request with next token, but no MaxRules?
Should I return everything starting from the nextToken -> this makes sense because we can simple say that by default the API will return all results.
Use a different default, say 10.

Interesting, I think what you suggested makes sense (return all from the token)

Signed-off-by: Raphael Philipe Mendes da Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
@rapphil rapphil force-pushed the paginated-list-rules branch from badfccf to 3936f2a Compare October 8, 2024 04:40
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
@rapphil rapphil force-pushed the paginated-list-rules branch from de715ac to cd75574 Compare October 8, 2024 16:26
@rapphil
Contributor

rapphil commented Oct 8, 2024

@bwplotka Thanks for your review. The code looks a lot cleaner now and all the corner cases are being handled.

Signed-off-by: Raphael Silva <rapphil@gmail.com>
@@ -764,6 +764,8 @@ URL query parameters:
- `file[]=<string>`: only return rules with the given filepath. If the parameter is repeated, rules with any of the provided filepaths are returned. When the parameter is absent or empty, no filtering is done.
- `exclude_alerts=<bool>`: only return rules, do not return active alerts.
- `match[]=<label_selector>`: only return rules that have configured labels that satisfy the label selectors. If the parameter is repeated, rules that match any of the sets of label selectors are returned. Note that matching is on the labels in the definition of each rule, not on the values after template expansion (for alerting rules). Optional.
- `max_groups=<number>`: The `max_groups` parameter allows you to specify the maximum number of rule groups to return in a single response. If the total number of rule groups exceeds the specified `max_groups` value, the response will include a `nextToken` property. You can use the value of this `nextToken` property in subsequent requests in the `next_token` parameter to paginate over the remaining rule groups. The `nextToken` property will not be present in the final response, indicating that you have retrieved all the available rule groups. Please note that there are no guarantees regarding the consistency of the response if the rule groups are being modified during the pagination process. If a rule group is removed while you are paginating over the rule groups, an error might be raised if the removed rule group coincides with the next token.
- `next_token`: the pagination token that was returned in a previous request when the `max_groups` parameter is set. The pagination token is used to iteratively paginate over a large number of rule groups. To use the `next_token` parameter, the `max_groups` parameter also needs to be present.
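A client consuming the documented pagination would loop roughly as follows (a toy sketch: fetchPage is a stand-in for a GET of /api/v1/rules?max_groups=...&next_token=..., serving two canned pages; parameter and field names as in this revision of the diff):

```go
package main

import "fmt"

type rulesPage struct {
	groups    []string
	nextToken string // empty on the final page
}

// fetchPage stands in for the HTTP call; it serves two canned pages.
func fetchPage(token string) rulesPage {
	if token == "" {
		return rulesPage{groups: []string{"g1", "g2"}, nextToken: "t1"}
	}
	return rulesPage{groups: []string{"g3"}}
}

// listAllGroups follows nextToken until a page without one is reached.
func listAllGroups() []string {
	var all []string
	token := ""
	for {
		p := fetchPage(token)
		all = append(all, p.groups...)
		if p.nextToken == "" {
			return all
		}
		token = p.nextToken
	}
}

func main() {
	fmt.Println(listAllGroups()) // [g1 g2 g3]
}
```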
Member

@bwplotka bwplotka Oct 17, 2024

What will happen if such a token no longer exists or is wrong? (A 400 will be returned, looking at the code, so should we document that?)

Contributor

Updated

@@ -1,5 +1,5 @@
// Copyright 2016 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
Member

@bwplotka bwplotka Oct 17, 2024

Is this an incidental change?

Contributor

argh. fixing it.

bwplotka
bwplotka previously approved these changes Oct 17, 2024
Member

@bwplotka bwplotka left a comment

Generally LGTM, thanks for improvements and sorry for delays.

I want to double check if we want to rename those params to:

group_limit
group_next_token

This is for consistency with limit params on other APIs. The group prefix is to be clear we paginate per group not rule, and to be future proof if we ever want to add per rule pagination. WDYT? 🤗

Comment on lines 1590 to 1593
if paginationRequest != nil && paginationRequest.NextToken != "" && !foundToken {
err := fmt.Errorf("invalid nextToken '%v'. were rule groups changed?", paginationRequest.NextToken)
return invalidParamError(err, "next_token")
}
Member

Suggested change
if paginationRequest != nil && paginationRequest.NextToken != "" && !foundToken {
err := fmt.Errorf("invalid nextToken '%v'. were rule groups changed?", paginationRequest.NextToken)
return invalidParamError(err, "next_token")
}
if paginationRequest != nil && paginationRequest.NextToken != "" && !foundToken {
return invalidParamError(fmt.Errorf("invalid nextToken %q; did rule groups change?", paginationRequest.NextToken), "next_token")
}

parsedMaxGroups int64 = -1
err error
)
maxGroups := r.URL.Query().Get("max_groups")
Member

What about group_limit naming instead? This might be consistent with other limit parameters in our API.

Member

I guess one counter argument is that limit never supported tokens/pagination, but maybe it does not matter and limit would be better?

Member

Consistency for the win :). Actually, can we say that the APIs that have limit can support pagination in the future?

err error
)
maxGroups := r.URL.Query().Get("max_groups")
nextToken := r.URL.Query().Get("next_token")
Member

next_token is great if we will never paginate over rules (: Are we sure about this? Otherwise maybe group_next_token and group_limit would be a nice, explicit pair?

for _, grp := range ruleGroups {
if paginationRequest != nil && paginationRequest.NextToken != "" && !foundToken {
Member

FYI: Code might actually be simpler if we don't have a separate struct but groupNextToken and groupLimit variables 🙈 parse can return two vars or so (:

Contributor

will try to remove it.

Contributor

done. it saved a few lines :)

* Eliminate need for extra structs to store pagination parameters

Signed-off-by: Raphael Silva <rapphil@gmail.com>
bwplotka
bwplotka previously approved these changes Oct 18, 2024
Member

@bwplotka bwplotka left a comment

LGTM! Thanks!

@bwplotka
Member

Before merging I wanted to have one "last" discussion to double check the hardest thing in programming: naming (:

https://cloud-native.slack.com/archives/C01AUBA4PFE/p1729235985519149

Member

@juliusv juliusv left a comment

👍 Looks good now besides the last comment!

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Raphael Philipe Mendes da Silva <rapphil@gmail.com>
rapphil and others added 2 commits October 21, 2024 15:18
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Raphael Philipe Mendes da Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Member

@bwplotka bwplotka left a comment

Thanks for prompt review changes after feedback; LGTM!

@bwplotka bwplotka merged commit d2802c6 into prometheus:main Oct 21, 2024
26 checks passed
KeyOfSpectator added a commit to AliyunContainerService/prometheus that referenced this pull request Oct 23, 2024
* master: (667 commits)
  NHCB scrape: refactor state handling and speed up scrape test (prometheus#15193)
  feat(tools): add debug printouts to rules unit testing (prometheus#15196)
  docs: add keep_firing_for in alerting rules
  api: Add rule group pagination to list rules api (prometheus#14017)
  Update scrape/scrape.go
  benchmark, rename parser omtext_with_nhcb
  goimports run
  Better docstring on test
  Remove omcounterdata.txt as redundant
  Fix failing benchmarks
  Add unit test to show that current wrapper is sub-optimal
  Rename convert_classic_histograms to convert_classic_histograms_to_nhcb
  More followup to prometheus#15164
  Follow up prometheus#15178
  Followup to prometheus#15164
  test(cmd/prometheus): speed up test execution by t.Parallel() when possible
  feat: normalize "le" and "quantile" labels values upon ingestion
  scrape: provide a fallback format (prometheus#15136)
  Disallowing configure AM with the v1 api (prometheus#13883)
  feat: ProtobufParse.formatOpenMetricsFloat: improve float formatting by using strconv.AppendFloat instead of fmt.Sprint
  ...

# Conflicts:
#	go.mod
#	go.sum
julienduchesne pushed a commit to julienduchesne/prometheus that referenced this pull request Dec 13, 2024
6 participants