
Created Timestamp: Opt-in CT auto-generation globally/per scrape job. #14763

@bwplotka

Proposal

Created Timestamps (CTs) are a relatively new concept. They generally work with (at least):

  • client_golang + Prometheus proto exposition and OM text (opt-in, due to _created line conflicts)
  • Java and Python OM text too (tbd if proto too).
  • ingesting OTLP.

Nevertheless, it will take time for all clients to adopt CTs and pass them along to Prometheus. Some clients, e.g. old solutions using the Prometheus text exposition or complex exporters (e.g. cadvisor) where CT correctness needs careful thought, might never adopt them, or only on an even longer timeframe.

Proposal

This proposal allows "auto-generating" CTs for all metrics with counter semantics, which unblocks accurate reset counting (e.g. with the zero sample injection feature we have) or OTLP/PRW 2.0 CT use cases (once we solve CTs on metadata).

This is generally not as easy as it seems. Common attempts, e.g. faking a CT 1ms before the scrape timestamp, can be damaging and are likely to produce inaccurate results, e.g. counter over-adding (assuming resets when there were none). The same applies if we used the time at which the scrape loop/service sees the target for the first time. Taking the process start time is not too bad a solution, but that information is not always present (apps have to expose it), it's not cheap to find (in the worst case it requires fully parsing the scrape format), and it does not work for exporters/counters that reset mid-process.
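To make the over-adding concern concrete, here is a minimal sketch under a deliberately simplified model. It is not Prometheus's actual rate code: the `sample` type, `increase`, and `buildSamples` are hypothetical names, and the model assumes that a CT newer than the previous sample's timestamp is read as "the counter restarted from zero at CT".

```go
package main

import "fmt"

// sample is a hypothetical (value, timestamp, created timestamp) triple.
// All times are in milliseconds.
type sample struct {
	value float64
	ts    int64
	ct    int64
}

// increase is a simplified model (an assumption, not Prometheus code):
// a CT newer than the previous sample, or a drop in value, is treated
// as a reset, so the whole current value counts as new increase.
func increase(samples []sample) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur.ct > prev.ts || cur.value < prev.value {
			total += cur.value
			continue
		}
		total += cur.value - prev.value
	}
	return total
}

// buildSamples spaces values evenly and assigns each a CT via ctFor.
func buildSamples(values []float64, startTS, intervalMs int64, ctFor func(ts int64) int64) []sample {
	out := make([]sample, 0, len(values))
	for i, v := range values {
		ts := startTS + int64(i)*intervalMs
		out = append(out, sample{value: v, ts: ts, ct: ctFor(ts)})
	}
	return out
}

func main() {
	// A counter that never resets and grows by 10 per scrape.
	values := []float64{1000, 1010, 1020}

	// Faked CT: always 1ms before the scrape timestamp.
	fake := buildSamples(values, 60000, 15000, func(ts int64) int64 { return ts - 1 })
	// Stable CT: the counter was created long before the first scrape.
	stable := buildSamples(values, 60000, 15000, func(int64) int64 { return 1 })

	fmt.Println("increase with faked CT: ", increase(fake))
	fmt.Println("increase with stable CT:", increase(stable))
}
```

In this model the faked CT makes every scrape look like a reset, so the reported increase is the sum of the raw values rather than the true growth of 20.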

Fortunately, one solid algorithm was invented a while back and is actively used (at least) at Google Cloud in the GMP Prometheus fork and opentelemetry-collector-contrib/googlemanagedprometheusexporter (actual code for this is here).

Algorithm

  1. If the counter sample has a CT from the instrumentation, use that.
  2. For the first counter sample, buffer its value and timestamp, but do not append it (let's call those first.value and first.ts) -- code.
  3. For the next counter sample for the same series (next.value and next.ts):
    a. If next.value < first.value, a reset happened in between. Append (next.value, next.ts) with created_timestamp = next.ts-1ms: we don't know the exact reset time, but we know it's between first.ts and next.ts. -- code
    b. Otherwise, append (next.value - first.value, next.ts) with created_timestamp = first.ts -- code.
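The steps above could be sketched roughly as follows. This is an illustrative sketch, not the GMP or Prometheus implementation: `CTGenerator`, `Observe`, `Sample`, and `seriesState` are hypothetical names, and series are keyed by a plain string for simplicity. The steps only describe the first and the next sample, so the handling of later samples here is an assumption: the sketch compares each sample with the previously observed value and, after a reset, rebases to zero.

```go
package main

import "fmt"

// Sample is a hypothetical sample to append; times are Unix milliseconds.
type Sample struct {
	Value float64
	TS    int64
	CT    int64 // created timestamp
}

// seriesState is the per-series buffer (three numbers per counter series).
type seriesState struct {
	resetValue float64 // first.value, or 0 after a detected reset
	resetTS    int64   // generated CT: first.ts, or reset sample ts - 1ms
	prevValue  float64 // last raw value seen (assumption: used for reset detection)
}

// CTGenerator is an illustrative name for the auto-generation state.
type CTGenerator struct {
	state map[string]*seriesState
}

func NewCTGenerator() *CTGenerator {
	return &CTGenerator{state: map[string]*seriesState{}}
}

// Observe returns the sample to append, or ok=false if it was only buffered.
// instrumentedCT == 0 is taken to mean "no CT from instrumentation".
func (g *CTGenerator) Observe(series string, value float64, ts, instrumentedCT int64) (Sample, bool) {
	// Step 1: trust a CT provided by the instrumentation.
	if instrumentedCT != 0 {
		return Sample{Value: value, TS: ts, CT: instrumentedCT}, true
	}
	st, seen := g.state[series]
	if !seen {
		// Step 2: buffer the first sample, do not append it.
		g.state[series] = &seriesState{resetValue: value, resetTS: ts, prevValue: value}
		return Sample{}, false
	}
	if value < st.prevValue {
		// Step 3a: a reset happened; the exact time is unknown, use ts - 1ms.
		st.resetValue = 0
		st.resetTS = ts - 1
	}
	st.prevValue = value
	// Step 3b (and 3a with resetValue == 0): append the adjusted value.
	return Sample{Value: value - st.resetValue, TS: ts, CT: st.resetTS}, true
}

func main() {
	g := NewCTGenerator()
	for _, in := range []struct {
		v  float64
		ts int64
	}{{100, 1000}, {130, 2000}, {5, 3000}, {8, 4000}} {
		s, ok := g.Observe("http_requests_total", in.v, in.ts, 0)
		fmt.Println(s, ok)
	}
}
```

In this sketch the first scrape of a series produces no appended sample, which is the trade-off noted below, and the map of `seriesState` entries is the per-series buffering cost.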

Consequences

  • Valid StartTime from the perspective of the appended counter-like metric.
  • Rate/increase/resets functions give (mathematically) accurate results.

Trade-offs:

  • (!) The correct absolute counter value is lost on a collector/scraper/Prometheus restart (unless we cache things) or when the target is down for a longer time.
  • The first observed sample is lost too in the above scenarios (e.g. if a target is scraped only once, you don't get any sample).
  • We have to buffer a few floats/ints (3?) for every counter-like metric for the duration of its life (some overhead).

Part of: #14217
