[GEP-19] Initial Shoot Monitoring migration #6319

@wyb1

How to categorize this issue?

/area monitoring
/kind enhancement

What would you like to be added:

Requires #6318 as a prerequisite.

Goal: Replace the Prometheus instance in each shoot control plane with a Prometheus object managed by the prometheus-operator.

  • Replace the Prometheus artifacts in the seed-monitoring charts with a component.
  • Add temporary migration code for the persistent volume so that no existing metrics data is lost:
    1. Find the "old" PVC and its PV, and set persistentVolumeReclaimPolicy=Retain on the PV.
    2. Delete the "old" PVC. The PV then moves to the Released phase and still carries its claimRef, which has to be cleared before a new claim can bind to it.
    3. Create a Prometheus object whose volumeClaimTemplate references the existing PV via volumeName=<existing-pv>.
    4. Migrate the data using an init container.
    5. Remove the migration code after one or two releases.
  • Add all existing Prometheus scrape configuration to additionalScrapeConfigs. This lets us switch to the prometheus-operator without first creating PodMonitors and ServiceMonitors for every component; that migration can then happen step by step.
  • Add all Prometheus configuration contributed by extensions to the same additionalScrapeConfigs, giving extensions time to migrate as well.
  • Replace the existing rules with PrometheusRule resources.
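A minimal sketch of migration steps 1 to 3, written in Go against hypothetical, heavily simplified in-memory stand-ins for the Kubernetes PersistentVolume and PersistentVolumeClaim objects rather than the real client-go API. It illustrates why the order matters: under the Delete reclaim policy that dynamically provisioned volumes usually get, deleting the old claim would delete the volume too, and even after switching to Retain the released volume must have its claimRef cleared before the claim created from the Prometheus object's volumeClaimTemplate can bind to it. The type names, the fakeCluster model and the claim name prometheus-db-prometheus-0 are made up for illustration; steps 4 and 5 are not modeled.

```go
package main

import "fmt"

// Hypothetical, heavily simplified stand-ins for PersistentVolume and
// PersistentVolumeClaim; a real controller would use k8s.io/api/core/v1 types
// and a Kubernetes client instead of these in-memory maps.
type PersistentVolume struct {
	Name          string
	ReclaimPolicy string // "Delete" or "Retain"
	Phase         string // "Bound", "Released" or "Available"
	ClaimRef      string // claim the volume is reserved for, "" if none
}

type PersistentVolumeClaim struct {
	Name       string
	VolumeName string
}

// fakeCluster is a hypothetical in-memory model of the API server state.
type fakeCluster struct {
	pvs  map[string]*PersistentVolume
	pvcs map[string]*PersistentVolumeClaim
}

// deleteClaim mimics what happens when a bound PVC is deleted: the PV is
// removed under "Delete" and kept as "Released" under "Retain".
func (c *fakeCluster) deleteClaim(name string) {
	pvc, ok := c.pvcs[name]
	if !ok {
		return
	}
	delete(c.pvcs, name)
	pv, ok := c.pvs[pvc.VolumeName]
	if !ok {
		return
	}
	if pv.ReclaimPolicy == "Delete" {
		delete(c.pvs, pv.Name) // the metrics data would be gone at this point
		return
	}
	pv.Phase = "Released" // ClaimRef still points at the deleted claim
}

// migrateVolume sketches steps 1-3 and returns the volumeName that the
// Prometheus object's volumeClaimTemplate should reference.
func (c *fakeCluster) migrateVolume(oldClaim string) (string, error) {
	pvc, ok := c.pvcs[oldClaim]
	if !ok {
		return "", fmt.Errorf("claim %q not found", oldClaim)
	}
	pv, ok := c.pvs[pvc.VolumeName]
	if !ok {
		return "", fmt.Errorf("volume %q not found", pvc.VolumeName)
	}

	// Step 1: keep the volume when its claim goes away.
	pv.ReclaimPolicy = "Retain"

	// Step 2: delete the old claim; the volume becomes Released.
	c.deleteClaim(oldClaim)

	// A Released volume is not bound again until its claimRef is cleared.
	pv.ClaimRef = ""
	pv.Phase = "Available"

	// Step 3: the new claim template pins this volume via volumeName.
	return pv.Name, nil
}

func main() {
	c := &fakeCluster{
		pvs: map[string]*PersistentVolume{
			"pv-1": {Name: "pv-1", ReclaimPolicy: "Delete", Phase: "Bound", ClaimRef: "prometheus-db-prometheus-0"},
		},
		pvcs: map[string]*PersistentVolumeClaim{
			"prometheus-db-prometheus-0": {Name: "prometheus-db-prometheus-0", VolumeName: "pv-1"},
		},
	}
	volumeName, err := c.migrateVolume("prometheus-db-prometheus-0")
	if err != nil {
		panic(err)
	}
	fmt.Println(volumeName, c.pvs[volumeName].Phase) // pv-1 Available
}
```

Instead of clearing claimRef, a real controller could also set it to the namespace and name of the claim the prometheus-operator will create, which reserves the volume for exactly that claim.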

Once all of these steps are complete, most of the scrape configuration in additionalScrapeConfigs can be migrated to PodMonitors and ServiceMonitors.
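The shared additionalScrapeConfigs step could look roughly like the following Go sketch. It assumes, as the prometheus-operator expects, that the Secret key referenced by additionalScrapeConfigs holds a YAML list of Prometheus scrape_config entries, and that each contributor (the core shoot monitoring code or an extension) supplies its jobs as a top-level YAML list. The helper buildAdditionalScrapeConfigs and the contributor and job names are hypothetical; this is not Gardener's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildAdditionalScrapeConfigs (hypothetical helper) merges scrape-config
// fragments from several contributors into the single YAML list that the
// Secret referenced by additionalScrapeConfigs is expected to contain.
// Each fragment must itself be a top-level YAML list ("- job_name: ..."),
// so concatenating the fragments yields one valid list.
func buildAdditionalScrapeConfigs(fragments map[string]string) (string, error) {
	names := make([]string, 0, len(fragments))
	for name := range fragments {
		names = append(names, name)
	}
	// Sort by contributor name so the rendered output is stable across
	// reconciliations (Go map iteration order is random).
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fragment := strings.Trim(fragments[name], "\n")
		if fragment == "" {
			continue // contributor has no jobs
		}
		if !strings.HasPrefix(fragment, "- ") {
			return "", fmt.Errorf("fragment %q is not a top-level YAML list", name)
		}
		// A YAML comment records where each block of jobs came from.
		fmt.Fprintf(&b, "# source: %s\n%s\n", name, fragment)
	}
	return b.String(), nil
}

func main() {
	out, err := buildAdditionalScrapeConfigs(map[string]string{
		"shoot-core":    "- job_name: kube-apiserver\n  scheme: https\n",
		"extension-foo": "- job_name: foo-exporter\n",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

As individual jobs are moved to PodMonitors or ServiceMonitors, their fragments would simply be removed from the input map.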

Labels: area/monitoring, kind/enhancement, lifecycle/rotten
