Proposal
Right now, PromQL does not expose first-class access to time series timestamps, so anything you do with timestamps requires using functions and thus subqueries. This makes timestamp queries quite inefficient, even though the data is directly present in the TSDB, directly loaded by Prometheus, and so on.
Suppose that you have a long retention time and a metric that tracks per-something usage; per-user VPN usage, for example. When a user is using the VPN, they have a time series with an appropriate label for their usage; when they're not connected, there is no time series. Such intermittent time series are a very common metric pattern. If you want to query the maximum number of connections that a user has had over a time range (and also determine if they've used the VPN at all), this is very efficient: max_over_time(yourmetric{user="..."}[730d]). Prometheus will evaluate this (for us) in nothing flat. However, if you want to know the earliest or the latest time at which a time series for a user was present, you must use timestamp(), which forces a subquery, which is far less efficient. When you run max_over_time( timestamp(yourmetric{user="..."})[730d:1m] ), Prometheus will sit there grinding away for a significant amount of time.
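To give a rough sense of why the subquery form is so much slower, here is a back-of-the-envelope sketch in Python. It assumes the inner expression is evaluated once per resolution step across the whole range and ignores step alignment at the boundaries; the helper name `subquery_steps` is made up for illustration and is not part of Prometheus.

```python
from datetime import timedelta


def subquery_steps(range_: timedelta, resolution: timedelta) -> int:
    """Approximate number of inner-expression evaluations for [range:resolution].

    Hypothetical helper: assumes one evaluation per resolution step and
    ignores alignment effects at the start and end of the range.
    """
    return range_ // resolution


# The [730d:1m] subquery from the example above.
steps = subquery_steps(timedelta(days=730), timedelta(minutes=1))
print(steps)  # 730 * 24 * 60
```

Under that assumption, the timestamp() query does on the order of a million instant-vector evaluations, whereas the plain max_over_time() query is one scan over the stored samples.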
This inefficiency is unnecessary. The timestamps exist beside the values in the TSDB, and the core time series scan that Prometheus is doing for max_over_time() (and other *_over_time functions) is loading both the values and their timestamps from the TSDB. If there were a way to access the timestamp at that point, our query for maximum or minimum timestamp would be just as efficient as the query for the maximum or minimum value. Only PromQL's limitations are making this inefficient.
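As a toy illustration of this point (not Prometheus's actual code), the following Python sketch models a range of samples as (timestamp, value) pairs and shows that a single pass which already computes the maximum value can track the earliest and latest timestamps at essentially no extra cost. The names `Sample` and `scan_range` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    ts: float     # seconds since the epoch (toy representation)
    value: float  # sample value


def scan_range(samples: List[Sample]) -> Tuple[float, float, float]:
    """Return (max value, earliest timestamp, latest timestamp) in one pass."""
    if not samples:
        raise ValueError("no samples in range")
    max_value = samples[0].value
    first_ts = last_ts = samples[0].ts
    for s in samples[1:]:
        # The value and the timestamp are both in hand for every sample.
        max_value = max(max_value, s.value)
        first_ts = min(first_ts, s.ts)
        last_ts = max(last_ts, s.ts)
    return max_value, first_ts, last_ts


samples = [
    Sample(1700000000.0, 2.0),
    Sample(1700000060.0, 5.0),
    Sample(1700003600.0, 1.0),
]
print(scan_range(samples))
```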
I don't have any particular ideas on how this should be solved. One brute-force solution would be some form of timestamp_over_time function that transformed range vectors from being range vectors of values to range vectors of timestamps. Another solution would be a way to mark metrics as yielding their timestamp instead of their value. @ is already taken as a syntax character; otherwise you might write @yourmetric{user="..."} to mean that you wanted the timestamp of the time series instead of its value (which could be used in both instant vectors and range vectors).
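To make the first idea a bit more concrete, here is a minimal Python sketch of what a timestamp_over_time style transform might do to a range vector. This is purely illustrative: no such function exists in PromQL today, and the toy `max_over_time` and `min_over_time` below are stand-ins written for this example, not Prometheus's implementations.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def timestamp_over_time(range_vector: List[Sample]) -> List[Sample]:
    """Hypothetical transform: replace each sample's value with its timestamp."""
    return [(ts, ts) for ts, _ in range_vector]


def max_over_time(range_vector: List[Sample]) -> float:
    """Toy stand-in: maximum sample value in the range."""
    return max(value for _, value in range_vector)


def min_over_time(range_vector: List[Sample]) -> float:
    """Toy stand-in: minimum sample value in the range."""
    return min(value for _, value in range_vector)


usage = [(1700000000.0, 2.0), (1700000060.0, 5.0), (1700003600.0, 1.0)]

# Existing aggregations compose with the transformed range vector unchanged.
latest_seen = max_over_time(timestamp_over_time(usage))
earliest_seen = min_over_time(timestamp_over_time(usage))
print(earliest_seen, latest_seen)
```

The appeal of this shape is that nothing else has to change: the existing *_over_time functions would work on the transformed range vector as they do on any other.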