prometheus_exporter sink not able to aggregate metrics over time #23519

@PrayagS

Description

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I'm trying to set up the following pipeline:

logs -> log_to_metric -> prometheus_exporter

While the number of logs flowing through the source and transform is accurate, the final metric query ends up being inaccurate. My point of reference is Loki with a sum(count_over_time(<log query>[1m])) query, which I'm comparing against the corresponding sum(increase(<metric>[1m])) Prometheus query.

The metric I'm trying to build is a simple HTTP request counter based on NGINX logs. Even after setting the sink's flush_period_secs to a high value (1 hour), I'm noticing that the time series (labeled by status, downstream service, and pod) disappear too quickly. This causes counter resets, so the Prometheus query never matches the log query.
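The undercount from premature series expiry can be illustrated with a toy simulation (hypothetical scrape values, and a simplified model of PromQL's counter-reset handling that ignores Prometheus's window extrapolation):

```python
def prom_increase(samples):
    """Simplified model of PromQL increase(): sum positive deltas;
    a decrease is treated as a counter reset, so the new value itself
    is counted as the delta. (Real Prometheus also extrapolates to the
    range boundaries, which is ignored here.)"""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# Hypothetical scenario: the counter is scraped at 0, 5, 10; then the
# series expires while 4 more requests arrive (never scraped); the
# recreated series starts from zero and is scraped at 3 and 6.
scrapes = [0, 5, 10, 3, 6]
true_requests = 5 + 5 + 4 + 3 + 3  # 20 actual requests
print(prom_increase(scrapes))       # 16.0 -- the 4 requests in the gap are lost
```

Every expiry/recreation cycle silently drops the increments that occurred between the last scrape and the reset, which is consistent with the Prometheus query persistently trailing the Loki query.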

This was confirmed further when I updated the pipeline as follows:

logs -> log_to_metric -> statsd -> | prometheus/statsd_exporter <- scraped by Prometheus

The statsd_exporter in between aggregates samples as expected, and with it in place my queries match more or less.
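For reference, the intermediate hop is a Vector statsd sink pointing at a statsd_exporter; a minimal sketch (the address, service name, and port here are placeholders, not the actual deployment values):

```yaml
sinks:
  statsd_sink:
    type: statsd
    inputs:
      - log_to_metric
    mode: udp
    # statsd_exporter's default UDP ingest port; adjust to your deployment
    address: statsd-exporter.monitoring.svc:9125
```

Prometheus then scrapes the statsd_exporter's own metrics endpoint instead of Vector's prometheus_exporter, so series lifetime is managed by the exporter rather than by Vector.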

Configuration

acknowledgements:
  enabled: true
api:
  address: 0.0.0.0:8686
  enabled: true
data_dir: /var/lib/vector
sinks:
  vector_metrics_sink:
    address: 0.0.0.0:9598
    inputs:
    - vector_metrics
    - log_to_metric
    type: prometheus_exporter
    flush_period_secs: 3600
sources:
  vector_metrics:
    scrape_interval_secs: 30
    type: internal_metrics
  kafka_source:
    auto_offset_reset: largest
    bootstrap_servers: <REDACTED>
    commit_interval_ms: 1000
    decoding:
      codec: json
    fetch_wait_max_ms: 500
    group_id: <REDACTED>
    librdkafka_options:
      client.id: <REDACTED>
      fetch.max.bytes: "67108864"
      fetch.min.bytes: "10485760"
      max.partition.fetch.bytes: "26214400"
      message.max.bytes: "67108864"
      metadata.max.age.ms: "60000"
      topic.metadata.refresh.interval.ms: "60000"
    metrics:
      topic_lag_metric: true
    topics:
    - <REDACTED>
    type: kafka
transforms:
  log_to_metric_enrichment:
    drop_on_abort: true
    inputs:
    - kafka_source
    source: <REDACTED>
    type: remap
  log_to_metric:
    inputs:
    - log_to_metric_enrichment
    metrics:
    - field: status
      kind: incremental
      name: http_requests_total
      namespace: <REDACTED>
      tags:
        downstream_service: '{{ downstream_service }}'
        exported_pod: '{{ pod }}'
        status: '{{ status }}'
      type: counter
    type: log_to_metric

Version

vector 0.43.1 (aarch64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)

Debug Output

It's not an actual error that produces an error log or panic, so I'm omitting this section. Let me know if specific trace logs are still needed; I tried capturing them, but there's a lot of noise.

Example Data

No response

Additional Context

This Vector deployment is an aggregator on Kubernetes consuming logs from Kafka.

References

No response
