
@sthaha (Collaborator) commented May 28, 2025

The new approach simplifies metric naming by replacing the dynamically generated, zone-based metrics with static, CPU (device)-specific metrics that carry zone labels.

Pros:

  • This reduces metric-name cardinality, improves queryability, and simplifies the implementation.
  • Dashboards no longer need label_replace hacks to query selected zones, since zone is now a label.
  • Simplifies queries, e.g.:
sum by(container_id, zone) (kepler_process_cpu_watts{job="host", container_id!=""}) - 
sum by(container_id, zone) (kepler_container_cpu_watts{job="host"})

This validates, for each available zone, that the summed power of the processes in a container equals the container's own power.

  • Processors (ARM?) that do not support RAPL no longer need to be shoehorned into kepler_node_cpu_core|package_ metrics.
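The validation query in the Pros list can be mirrored outside Prometheus to see what it computes. The Go sketch below is a minimal illustration that assumes the samples have already been fetched into memory; the sample and key types, the residuals function and all values are hypothetical and not part of Kepler. It drops processes with an empty container_id (the container_id!="" matcher), sums process watts per (container_id, zone) like sum by(container_id, zone), and subtracts the container's value for the same pair.

```go
package main

import (
	"fmt"
	"sort"
)

// key mirrors the label set kept by `sum by(container_id, zone)`.
type key struct {
	ContainerID string
	Zone        string
}

// sample is a hypothetical in-memory stand-in for one Prometheus series value.
type sample struct {
	ContainerID string
	Zone        string
	Watts       float64
}

// residuals returns sum(process watts) - container watts for every
// (container_id, zone) pair present on both sides. PromQL's one-to-one
// binary "-" likewise keeps only label sets that match on both sides.
func residuals(procs, containers []sample) map[key]float64 {
	procSum := make(map[key]float64)
	for _, p := range procs {
		if p.ContainerID == "" { // equivalent of the container_id!="" matcher
			continue
		}
		procSum[key{p.ContainerID, p.Zone}] += p.Watts
	}
	ctrSum := make(map[key]float64)
	for _, c := range containers {
		ctrSum[key{c.ContainerID, c.Zone}] += c.Watts
	}
	out := make(map[key]float64)
	for k, ps := range procSum {
		if cs, ok := ctrSum[k]; ok {
			out[k] = ps - cs
		}
	}
	return out
}

func main() {
	procs := []sample{
		{"c1", "package", 2.5},
		{"c1", "package", 1.5},
		{"c1", "dram", 0.5},
		{"", "package", 9.0}, // process outside any container: ignored
	}
	containers := []sample{
		{"c1", "package", 4.0},
		{"c1", "dram", 0.5},
	}
	res := residuals(procs, containers)

	// Sort keys so the output order is deterministic.
	keys := make([]key, 0, len(res))
	for k := range res {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].ContainerID != keys[j].ContainerID {
			return keys[i].ContainerID < keys[j].ContainerID
		}
		return keys[i].Zone < keys[j].Zone
	})
	for _, k := range keys {
		fmt.Printf("%s %s %.2f\n", k.ContainerID, k.Zone, res[k])
	}
}
```

With these made-up values it prints "c1 dram 0.00" and "c1 package 0.00"; a non-zero residual for a pair would point at a mismatch between process and container attribution in that zone.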

Changes:

  • Replaced dynamic zone-based metric descriptors with static CPU (device) specific descriptors. E.g. kepler_node_package_joules_total -> kepler_node_cpu_joules_total{zone="package"}

  • Consolidated metrics under kepler_<level>_cpu_<unit> with zone labels

  • Removed kepler_node_energy_zone metric as it is no longer required.

  • Added helper functions for descriptor creation

  • Used consistent label names ("zone", "container_id") to make queries easier to write.
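As a rough illustration of the new naming scheme, the Go sketch below derives the consolidated kepler_<level>_cpu_<unit> name and the zone label from an old zone-specific name. The helpers cpuMetricName and migrate are hypothetical (they are not the descriptor helpers added in this PR), and the parser assumes old names have the form kepler_<level>_<zone>_<unit> with level and zone being single tokens without underscores.

```go
package main

import (
	"fmt"
	"strings"
)

// cpuMetricName builds a name following the kepler_<level>_cpu_<unit> scheme,
// e.g. ("node", "joules_total") -> "kepler_node_cpu_joules_total".
func cpuMetricName(level, unit string) string {
	return "kepler_" + level + "_cpu_" + unit
}

// migrate maps an old name of the assumed form kepler_<level>_<zone>_<unit>
// to the consolidated name plus the value of its zone label.
func migrate(old string) (name, zone string, err error) {
	if !strings.HasPrefix(old, "kepler_") {
		return "", "", fmt.Errorf("not a kepler metric: %q", old)
	}
	rest := strings.TrimPrefix(old, "kepler_")
	parts := strings.SplitN(rest, "_", 3) // level, zone, unit
	if len(parts) != 3 {
		return "", "", fmt.Errorf("unexpected metric name: %q", old)
	}
	return cpuMetricName(parts[0], parts[2]), parts[1], nil
}

func main() {
	for _, old := range []string{
		"kepler_node_package_joules_total",
		"kepler_node_package_watts",
		"kepler_node_dram_joules_total",
		"kepler_node_dram_watts",
	} {
		name, zone, err := migrate(old)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s{zone=%q}\n", old, name, zone)
	}
}
```

Running it prints one line per old name, e.g. kepler_node_package_joules_total -> kepler_node_cpu_joules_total{zone="package"}, matching the example in the Changes list.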

@github-actions github-actions bot added the refactor Code refactoring without changing functionality label May 28, 2025
@sthaha sthaha force-pushed the refactor-metric-name-change branch from cfcc7c4 to 8141311 May 28, 2025 03:16

codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.10%. Comparing base (5f89a3e) to head (a6a1822).
Report is 3 commits behind head on reboot.

Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2105      +/-   ##
==========================================
- Coverage   92.22%   92.10%   -0.13%     
==========================================
  Files          30       30              
  Lines        2226     2140      -86     
==========================================
- Hits         2053     1971      -82     
- Misses        131      134       +3     
+ Partials       42       35       -7     


@sthaha sthaha marked this pull request as ready for review May 28, 2025 03:19
…trics

The new approach simplifies metric naming by replacing the dynamically generated
zone-based metrics with static, CPU (device)-specific ones that carry zone labels.
This reduces metric-name cardinality, improves queryability, and minimizes runtime complexity.

Changes:
- Replaced dynamic zone-based metric descriptors with static CPU (device)-specific descriptors.
  E.g. `kepler_node_package_joules_total` -> `kepler_node_cpu_joules_total{zone="package"}`

- Consolidated metrics under `kepler_<level>_cpu_<unit>` with zone labels
- Removed `kepler_node_energy_zone` metric as it is no longer required.
- Added helper functions for descriptor creation
- Used consistent label names ("zone", "container_id") to make queries
  easier to write.

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@sthaha sthaha force-pushed the refactor-metric-name-change branch from 8141311 to 817785f May 28, 2025 03:24
@sthaha sthaha changed the title refactor(metrics): simplify power metrics by consolidating zone-based… to feat!(metrics): simplify power metrics by consolidating zone-based metrics May 28, 2025
@github-actions github-actions bot removed the refactor Code refactoring without changing functionality label May 28, 2025
@sthaha sthaha added the feat A new feature or enhancement label May 28, 2025
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@github-actions github-actions bot added chore Routine tasks or maintenance and removed feat A new feature or enhancement labels May 28, 2025
@sthaha sthaha requested review from vimalk78 and vprashar2929 May 28, 2025 05:29
@vprashar2929 vprashar2929 added feat A new feature or enhancement and removed chore Routine tasks or maintenance labels May 28, 2025
@vprashar2929 (Collaborator)

Tested against OpenShift:

Comparison between reboot v0.0.6 and dev:

[Screenshot: Metrics-05-28-2025_01_30_PM]

"instant": false,
"legendFormat": "Kepler Reboot Node (${zones})",
"legendFormat": "Kepler Reboot Node (${zone}) (ΔJ/s)",

Greek letter 'delta' (Δ)?

"kepler_node_package_joules_total",
"kepler_node_package_watts",
"kepler_node_dram_joules_total",
"kepler_node_dram_watts",
"kepler_node_energy_zone",
"kepler_node_cpu_joules_total",
"kepler_node_cpu_watts",

Do we need more tests asserting the zone names?

@sthaha sthaha merged commit 3d01253 into sustainable-computing-io:reboot May 28, 2025
10 of 11 checks passed
@vprashar2929 vprashar2929 added breaking-change Indicates changes that break backward compatibility and removed feat A new feature or enhancement labels May 28, 2025
Labels
breaking-change Indicates changes that break backward compatibility
3 participants