Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
I'm trying to set up the following pipeline:

```
logs -> log_to_metric -> prometheus_exporter
```
While the number of logs flowing through the source and transform is accurate, the final metric query ends up being inaccurate. My point of reference is Loki, using a `sum(count_over_time(<log query>[1m]))` query, which I'm comparing against the corresponding `sum(increase(<metric>[1m]))` Prometheus query.
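For readability, the two queries being compared, side by side (selector and metric name elided as above):

```promql
# Loki (reference): log lines per minute
sum(count_over_time(<log query>[1m]))

# Prometheus (expected to match, but undercounts once series expire)
sum(increase(<metric>[1m]))
```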
The metric I'm trying to build is a simple HTTP request counter based on NGINX logs. Even after setting `flush_period_secs` to a high value (1 hour) on the sink, I'm noticing that the time series (labeled by status, downstream service, and pod) disappear too quickly. This leads to counter resets, so the Prometheus query never matches the log query.
This was confirmed when I updated the pipeline as follows:

```
logs -> log_to_metric -> statsd -> | prometheus/statsd_exporter <- scraped by Prometheus
```

With statsd-exporter in between aggregating samples as expected, my queries now match more or less.
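For reference, a minimal sketch of the statsd sink used in the workaround pipeline, assuming Vector's socket-style `mode`/`address` options for this sink type (the sink name and exporter address are placeholders, not my actual values):

```yaml
sinks:
  statsd_sink:
    type: statsd
    inputs:
      - log_to_metric
    mode: udp
    # Placeholder: point this at the statsd_exporter ingest port
    address: statsd-exporter:9125
```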
Configuration
```yaml
acknowledgements:
  enabled: true
api:
  address: 0.0.0.0:8686
  enabled: true
data_dir: /var/lib/vector
sinks:
  vector_metrics_sink:
    address: 0.0.0.0:9598
    inputs:
      - vector_metrics
      - log_to_metric
    type: prometheus_exporter
    flush_period_secs: 3600
sources:
  vector_metrics:
    scrape_interval_secs: 30
    type: internal_metrics
  kafka_source:
    auto_offset_reset: largest
    bootstrap_servers: <REDACTED>
    commit_interval_ms: 1000
    decoding:
      codec: json
    fetch_wait_max_ms: 500
    group_id: <REDACTED>
    librdkafka_options:
      client.id: <REDACTED>
      fetch.max.bytes: "67108864"
      fetch.min.bytes: "10485760"
      max.partition.fetch.bytes: "26214400"
      message.max.bytes: "67108864"
      metadata.max.age.ms: "60000"
      topic.metadata.refresh.interval.ms: "60000"
    metrics:
      topic_lag_metric: true
    topics:
      - <REDACTED>
    type: kafka
transforms:
  log_to_metric_enrichment:
    drop_on_abort: true
    inputs:
      - kafka_source
    source: <REDACTED>
    type: remap
  log_to_metric:
    inputs:
      - log_to_metric_enrichment
    metrics:
      - field: status
        kind: incremental
        name: http_requests_total
        namespace: <REDACTED>
        tags:
          downstream_service: '{{ downstream_service }}'
          exported_pod: '{{ pod }}'
          status: '{{ status }}'
        type: counter
    type: log_to_metric
```
Version
vector 0.43.1 (aarch64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)
Debug Output
This isn't an actual error that produces an error log or panic, so I'm omitting this section. Let me know if specific trace logs are still needed; I tried capturing them, but there's a lot of noise.
Example Data
No response
Additional Context
This vector deployment is an aggregator deployment on Kubernetes consuming logs from Kafka.
References
No response