[GEP-19] Migrate monitoring stack to prometheus-operator #9065

@rfranzke

How to categorize this issue?

/area dev-productivity monitoring
/kind enhancement

What would you like to be added:
The monitoring stack should be migrated from the current custom-built Helm charts to the prometheus-operator as proposed in GEP-19.

Why is this needed:
GEP-19 was accepted and merged a long while ago, so we should complete its implementation. In addition, the garden cluster (managed via gardener-operator, ref #7016) does not have a monitoring stack yet. Finally, the migration improves development productivity by cleaning up technical debt and improving the code.

Tasks:


General notes for the migration (taken from #6319):

  • Add temporary migration code for the persistent volume to ensure that no data is lost:
    1. Find the "old" PVC and its PV, and set persistentVolumeReclaimPolicy=Retain on the PV.
    2. Delete the "old" PVC.
    3. Create a Prometheus object with a volumeClaimTemplate that references the PV via volumeName=<existing-pv>.
    4. Migrate the data using an init container.
    5. Remove the migration code after 1-2 releases.
  • Add all existing Prometheus configuration to an additionalScrapeConfig. This allows us to switch to the prometheus-operator without creating PodMonitors and ServiceMonitors for each component up front; that migration can then happen step by step.
  • Add all extension Prometheus configuration to the same additionalScrapeConfig. This gives extensions time to migrate as well.
  • Existing rules should be replaced with PrometheusRules.
  • Once all of these steps are completed, most of the configuration in the additionalScrapeConfig can be migrated to PodMonitors and ServiceMonitors.
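
To make the migration notes above more concrete, here is a minimal sketch in Python of the two objects involved: the patch that sets the reclaim policy on the existing PV (step 1) and a Prometheus object whose volumeClaimTemplate pins that PV via volumeName (step 3), with the legacy scrape configuration referenced from a Secret. It only builds plain dictionaries and does not talk to a cluster. The helper names, the object names ("shoot", "shoot--foo--bar", "pv-123", "additional-scrape-configs") and the Secret key are hypothetical and chosen for illustration only; in the prometheus-operator API the field is spelled `additionalScrapeConfigs`. The data migration via an init container (step 4) is not covered.

```python
import json


def pv_retain_patch():
    """Merge patch for step 1: keep the PV (and its data) when the PVC is deleted."""
    return {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}


def prometheus_manifest(name, namespace, pv_name, storage_size,
                        scrape_secret_name, scrape_secret_key="prometheus.yaml"):
    """Build a Prometheus object (step 3) that reuses an existing PV.

    All argument values are supplied by the caller; nothing here is taken
    from the actual Gardener implementation.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storage": {
                "volumeClaimTemplate": {
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        # Bind the generated PVC to the retained PV.
                        "volumeName": pv_name,
                        "resources": {"requests": {"storage": storage_size}},
                    }
                }
            },
            # Secret key selector pointing at the legacy scrape configuration,
            # so no PodMonitors/ServiceMonitors are needed up front.
            "additionalScrapeConfigs": {
                "name": scrape_secret_name,
                "key": scrape_secret_key,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example values.
    manifest = prometheus_manifest(
        "shoot", "shoot--foo--bar", "pv-123", "20Gi", "additional-scrape-configs"
    )
    print(json.dumps(pv_retain_patch(), indent=2))
    print(json.dumps(manifest, indent=2))
```

Note that a retained PV usually stays in the Released phase after its PVC is deleted until its claimRef is cleared, so a real migration may need an additional step that is not shown in this sketch.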

Labels

  • area/dev-productivity: Developer productivity related (how to improve development)
  • area/ipcei: IPCEI (Important Project of Common European Interest)
  • area/monitoring: Monitoring (including availability monitoring and alerting) related
  • kind/enhancement: Enhancement, improvement, extension
  • kind/epic: Large multi-story topic