Closed
Description
I'm running Prometheus Operator 0.71.2 with Prometheus 2.49.1 on EKS.
I have metric endpoints protected by a TLS cert and key. Teleport Tbot rotates the cert and key every n hours and writes them to a secret. A Probe resource refers to that secret. Prometheus Operator loads the Probe into a Prometheus instance and rewrites the secret for that instance. Prometheus uses the rewritten secret to access the endpoint.
What I'm seeing is that:
- After a cert rotation, Prometheus fails to reload the cert and key and gets a 403 Forbidden, either for a couple of hours or indefinitely
- Triggering a config reload does not reload the cert and key
- Sending a SIGHUP to the Prometheus process does not reload the cert and key
- Sending a SIGTERM to the Prometheus process does reload the cert and key by restarting that pod
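For anyone reproducing the config reload step, here is a minimal sketch of triggering it through Prometheus's `/-/reload` lifecycle endpoint. It assumes the lifecycle API is enabled and that the server is reachable at `http://localhost:9090` (for example through a port-forward); that address and the helper names are assumptions for illustration, not taken from the setup above.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build a POST request for Prometheus's /-/reload lifecycle endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def trigger_reload(base_url: str = "http://localhost:9090", timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code.

    The default base_url is an assumption (e.g. a local port-forward).
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

A 200 response only says the configuration file was re-read; it does not by itself show that the scrape client picked up the new cert and key.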
While the issue is occurring, the secrets on the Prometheus pod filesystem look up to date.
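One way to back up the "files are up to date" observation is to fingerprint the mounted files before and after a rotation. The following is a rough sketch, assuming Python is available somewhere the mounted files are readable; the function names are hypothetical.

```python
import hashlib
import os


def snapshot(paths):
    """Return {path: (sha256 hex digest, mtime)} for each given file."""
    result = {}
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        result[path] = (digest, os.stat(path).st_mtime)
    return result


def changed_files(before, after):
    """List paths that are new or whose content digest differs between snapshots."""
    return sorted(
        p for p in after if p not in before or before[p][0] != after[p][0]
    )
```

Taking one snapshot before a rotation and one after, `changed_files` should list both the cert and the key file if the rewritten secret actually reached the pod.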
Probe definition:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: probe-foo
  namespace: monitoring
spec:
  interval: 30s
  jobName: probe-foo
  prober:
    path: /metrics
    scheme: https
    url: foo-exporter.access-proxy.example.com
  scrapeTimeout: 20s
  targets:
    staticConfig:
      static:
      - foo:443
  tlsConfig:
    cert:
      secret:
        key: tlscert
        name: tbot-prometheus-foo
    insecureSkipVerify: true
    keySecret:
      key: key
      name: tbot-prometheus-foo
Generated config:
- job_name: probe/monitoring/probe-foo
  honor_timestamps: true
  track_timestamps_staleness: false
  scrape_interval: 30s
  scrape_timeout: 20s
  scrape_protocols:
  - OpenMetricsText1.0.0
  - OpenMetricsText0.0.1
  - PrometheusText0.0.4
  metrics_path: /metrics
  scheme: https
  enable_compression: true
  tls_config:
    cert_file: /etc/prometheus/certs/secret_monitoring_tbot-prometheus-foo_tlscert
    key_file: /etc/prometheus/certs/secret_monitoring_tbot-prometheus-foo_key
    insecure_skip_verify: true
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: office-metrics-foo
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: foo-exporter.access-proxy.example.com
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  static_configs:
  - targets:
    - foo:443
    labels:
      namespace: monitoring
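To make the link between the Probe's secret references and the generated `cert_file` / `key_file` paths explicit, here is a small sketch of the naming pattern as it appears in this config. The pattern is inferred only from the two paths shown above, so it may not match the operator's real naming logic in every case; the function name is hypothetical.

```python
def mounted_secret_path(namespace: str, secret: str, key: str,
                        base_dir: str = "/etc/prometheus/certs") -> str:
    """Guess the mounted file path for a secret key.

    Pattern inferred from the generated config in this issue:
    <base_dir>/secret_<namespace>_<secret name>_<key>
    """
    return f"{base_dir}/secret_{namespace}_{secret}_{key}"


cert_file = mounted_secret_path("monitoring", "tbot-prometheus-foo", "tlscert")
key_file = mounted_secret_path("monitoring", "tbot-prometheus-foo", "key")
```

These are the two files whose contents would need to change after Tbot rotates the secret for Prometheus to have a chance of presenting the new cert.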
This sounds similar to #345, but it is still happening today.