TaskQueue per-priority stats #611

stephanos · 2025-07-08T19:09:30Z

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

What changed?

Adding a breakdown of stats per-priority levels to the two APIs that return task queue stats already.

Why?

To give users insights on a per priority level.

Breaking changes

N/A

Server PR

temporalio/temporal#8030

dnr

seems straightforward and reasonable

temporal/api/workflowservice/v1/request_response.proto

cretz · 2025-07-10T13:39:32Z

temporal/api/workflowservice/v1/request_response.proto

    temporal.api.taskqueue.v1.TaskQueueStats stats = 5;

+    // Task queue stats breakdown by priority key. Only contains actively used priority keys.
+    // Only set if `report_stats` is set on the request.


It is expensive enough to do this on top of other stats such that it'd deserve its own opt-in?

No extra cost here, internally the total stats are actually just the aggregation of the per-priority stats.

cretz · 2025-07-10T13:41:37Z

temporal/api/workflowservice/v1/request_response.proto

+    // Only set if `report_stats` is set on the request.
+    // (-- api-linter: core::0140::prepositions=disabled
+    //     aip.dev/not-precedent: "by" is used to clarify the keys and values. --)
+    map<int32, temporal.api.taskqueue.v1.TaskQueueStats> stats_by_priority = 6;


Will this contain all priority keys ever used and/or is there some max count and/or is it only priority keys used recently?

It's difficult to summarize that in a sentence or two; but generally speaking, it will contain all "recently" used priority keys plus the ones from any backlogs. So the first part is fairly transient and wouldn't survive a reboot; but the second part is durable. So it depends. (none of this is different from the regular stats we already have)

I think the best short description I could give: a priority key will have stats if ...

it was active within the last 30s (unless there was a server restart)

it has accumulated a backlog

Not sure I'd want to commit to the exact number there (30s) though as that's an internal detail. Hence the (admittedly) fuzzy "actively used" wording.

Thanks for the info, I do like the new "activity used" wording and it makes sense to me to be vague. My only remaining concern is if we want to add an extreme max (or are we ok returning millions here?). But we don't have to decide that in the API part, all good.

In the current implementation, won't it be all the ones that have ever been used? Or do you filter them out?

(I would like to eventually "GC" subqueue ids that aren't in active use, but probably not soon.)

Oh, you're right. Even priorities that have not accumulated a backlog but where actively used once will be returned, too (until a server reboot).

temporal/api/workflowservice/v1/request_response.proto

cretz · 2025-07-10T13:43:04Z

temporal/api/workflowservice/v1/request_response.proto

+    // Only set if `report_stats` is set on the request.
+    // (-- api-linter: core::0140::prepositions=disabled
+    //     aip.dev/not-precedent: "by" is used to clarify the keys and values. --)
+    map<int32, temporal.api.taskqueue.v1.TaskQueueStats> stats_by_priority = 6;


Is there any way for me as a user to even see the keys that might make up this list (i.e. just see the priority keys in use) without necessarily asking for stats on them? Is it reasonable to want to describe things without asking for their stats?

No, this stats field is really the only way to find the in-use priority keys across the task queue.

Describing just the priority keys has the same cost as asking for the stats (ie non-negligible, requiring opt-in). And at that point, what are you really gaining by adding yet another boolean flag and response field?

If the only way to list priority keys is to also get there stats, there is no need for another field. Just can be a bit confusing to a user that sometimes they need to ask for stats just to check existence of something.

I don't think asking for the just set of priority keys that have been used (ever or recently) is that useful.. what's a use case you're thinking of?

To know what priority keys exist, same as wanting to know what task queues exist (though I understand we struggle to provide that to users too). One such situation may be to describe the configuration of a priority key (assuming we have that which I think I saw). In general if we have a "thing" in Temporal, it is often worth being able to describe it (and bonus for being able to list all of them). Just not sure if "stats" makes sense as a "describe" equivalent. But seeing how priority key configuration is just config on the task queue, I can see it not being a top-level thing.

The priority key is just a number from 1 to 5. We have at most 5 of them, and we don't have per-priority-key configuration. It's not really a "thing".

(We did consider per-priority-key rate limits as configuration but it doesn't make much sense, we can address those needs a better way)

## What changed? More granualar task queue stats tracking by priority level. ## Why? Preparation for temporalio/api#611 where we intend to expose the priority-level stats to users. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [x] covered by existing tests - [ ] added new unit test(s) - [ ] added new functional test(s) --------- Co-authored-by: David Reiss <dnr@dnr.im>

## What changed? Adding a breakdown of stats per-priority levels to the two APIs that return task queue stats already. ## Why? New API is defined here: temporalio/api#611. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [ ] covered by existing tests - [ ] added new unit test(s) - [x] added new functional test(s)

stephanos force-pushed the tq-pri-stats branch 2 times, most recently from c052655 to 37c774d Compare July 9, 2025 15:33

dnr reviewed Jul 9, 2025

View reviewed changes

temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved

temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved

This was referenced Jul 9, 2025

Track task queue stats per priority temporalio/temporal#8027

Merged

TaskQueue per-priority stats in API temporalio/temporal#8030

Merged

stephanos marked this pull request as ready for review July 10, 2025 03:42

stephanos requested review from a team as code owners July 10, 2025 03:42

cretz reviewed Jul 10, 2025

View reviewed changes

cretz approved these changes Jul 10, 2025

View reviewed changes

dnr approved these changes Jul 10, 2025

View reviewed changes

stephanos added 3 commits July 15, 2025 12:19

TaskQueue per-priority stats

8bdb476

update

e33d947

rename to stats_by_priority_key

7d7b6ce

stephanos force-pushed the tq-pri-stats branch from 85a83fe to 7d7b6ce Compare July 15, 2025 19:22

stephanos and others added 2 commits July 15, 2025 12:23

merge

3dc871d

Merge branch 'master' into tq-pri-stats

9fd99f0

stephanos merged commit 2c64a5d into temporalio:master Jul 22, 2025
5 checks passed

stephanos deleted the tq-pri-stats branch July 22, 2025 20:06

TaskQueue per-priority stats #611

TaskQueue per-priority stats #611

Uh oh!

Conversation

stephanos commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnr left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cretz Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

stephanos Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cretz Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

stephanos Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stephanos Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cretz Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dnr Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stephanos Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cretz Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

stephanos Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cretz Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

dnr Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

cretz Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dnr Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

stephanos commented Jul 8, 2025 •

edited

Loading

stephanos Jul 10, 2025 •

edited

Loading

stephanos Jul 10, 2025 •

edited

Loading

stephanos Jul 10, 2025 •

edited

Loading

cretz Jul 10, 2025 •

edited

Loading

dnr Jul 10, 2025 •

edited

Loading

stephanos Jul 10, 2025 •

edited

Loading

cretz Jul 10, 2025 •

edited

Loading