Proposal
Created Timestamps (CTs) are a relatively new concept. They generally work with (at least):
- client_golang + Prometheus proto exposition and OM text (opt-in, due to _created line conflicts)
- Java and Python OM text too (tbd if proto too).
- ingesting OTLP.
Nevertheless, it will take time for all clients to adopt CTs and pass them along to Prometheus. Some clients, e.g. old solutions using the Prometheus text exposition format or complex exporters (e.g. cadvisor) where CT correctness needs careful thought, might adopt CTs only on much longer timeframes, or never.
Proposal
This proposal allows "auto-generating" CTs for all metrics with counter semantics, which unblocks accurate reset counting (e.g. with the zero sample injection feature we have) and OTLP/PRW 2.0 CT use cases (once we solve CTs on metadata).
This is not as easy as it seems. Common attempts at faking a CT, e.g. setting it 1ms before the scraped timestamp, can be damaging and are likely to produce inaccurate results, e.g. counter over-adding (assuming resets when there were none). The same applies to using the time at which the scrape loop/service first sees the target. Taking the process start time is not a bad solution, but that information is not always present (apps have to expose it), it is not cheap to find (in the worst case it requires fully parsing the scrape format), and it does not work for exporters/counters that reset mid-process.
Fortunately, a solid algorithm was invented a while back and is actively used (at least) at Google Cloud in the GMP Prometheus fork and in opentelemetry-collector-contrib/googlemanagedprometheusexporter (actual code for this is here).
Algorithm
- If a counter sample has a CT from the instrumentation, use that.
- For the first counter sample, buffer its value and timestamp, but do not append it (let's call those `first.value` and `first.ts`) -- code.
- For the next counter sample for the same series (`next.value` and `next.ts`):
  a. If `next.value < first.value`, a reset happened in between. Append `(next.value, next.ts)` with `created_timestamp = next.ts-1ms`, as we don't know the exact reset time, but we know it is between `first.ts` and `next.ts` -- code.
  b. Otherwise, append `(next.value - first.value, next.ts)` with `created_timestamp = first.ts` -- code.
Consequences
- Valid StartTime from the perspective of the appended counter-like metric.
- Rate/increase/resets functions give mathematically accurate results.
Trade-offs:
- (!) The correct absolute counter value is lost when the collector/scraper/Prometheus restarts (unless we cache things) or when the target is down for a longer time.
- The first observed sample is lost too in the above scenarios (e.g. if a target is scraped only once, you don't get any sample).
- We have to buffer a few floats/ints (3?) for every counter-like metric for the duration of its life (some overhead).
Part of: #14217