sthaha
Collaborator

@sthaha sthaha commented May 12, 2025

This commit extends power monitoring to include process and container-level metrics alongside node-level metrics.

Changes include:

  • Add a new resource informer that tracks processes and containers and calculates their power consumption based on CPU time ratios.
  • Implement container detection for the Docker, containerd, CRI-O, Podman, and Kubernetes runtimes, with tests for container detection and resource tracking.
  • Add Prometheus metrics for energy (joules) and power (watts) per RAPL zone, with labels for process and container metadata.
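
The CPU-time-ratio attribution in the first item can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `attributePower`, the PID-keyed map, and the zero-activity handling are all assumptions made for the example.

```go
package main

import "fmt"

// attributePower is a hypothetical helper (not from this PR). It splits
// nodeWatts across processes in proportion to the CPU time each process
// consumed during the sampling interval.
// cpuTimeDelta maps a PID to the CPU seconds it used in that interval.
func attributePower(nodeWatts float64, cpuTimeDelta map[int]float64) map[int]float64 {
	total := 0.0
	for _, d := range cpuTimeDelta {
		total += d
	}

	out := make(map[int]float64, len(cpuTimeDelta))
	for pid, d := range cpuTimeDelta {
		if total == 0 {
			// Assumption: with no CPU activity observed, attribute nothing
			// rather than divide by zero.
			out[pid] = 0
			continue
		}
		out[pid] = nodeWatts * d / total
	}
	return out
}

func main() {
	// Two processes sharing 40 W with a 3:1 CPU time ratio.
	watts := attributePower(40.0, map[int]float64{101: 1.5, 202: 0.5})
	fmt.Printf("pid 101: %.1f W, pid 202: %.1f W\n", watts[101], watts[202])
}
```

Under this scheme the per-process values always sum to the node value, so container-level figures can be obtained by summing the processes that belong to each container.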

@github-actions github-actions bot added the feat A new feature or enhancement label May 12, 2025
@sthaha sthaha force-pushed the feat-process-power-attribution branch from adaef48 to 94e1513 Compare May 13, 2025 04:33
@github-actions github-actions bot added the chore Routine tasks or maintenance label May 13, 2025
@sthaha sthaha force-pushed the feat-process-power-attribution branch from 94e1513 to 5031cdd Compare May 13, 2025 11:48
@sthaha sthaha requested a review from vimalk78 May 14, 2025 07:07
@vprashar2929
Collaborator

node-container diff doesn't look correct 🤔

[Image: Screenshot 2025-05-14 at 4 36 54 PM]

@sthaha sthaha force-pushed the feat-process-power-attribution branch 2 times, most recently from 1ab11d9 to 0082d73 Compare May 14, 2025 11:57
@sthaha
Collaborator Author

sthaha commented May 14, 2025

> node-container diff doesn't look correct 🤔

What does node - container diff do? And how is it incorrect 🤔 ?

@vprashar2929
Collaborator

> What does node - container diff do? And how is it incorrect 🤔 ?

abs(
  sum(kepler_node_${zones}_watts{job="${job}"}) 
  - 
  sum(kepler_container_${zones}_watts{job="${job}"})
)

I was expecting the difference to be 0 or close to 0 (that is, no difference, as with node-process), matching what old Kepler used to produce.
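
For reference, the arithmetic behind the query can be sketched in Go as below. The `proc` type and `nodeContainerDiff` function are hypothetical names used only for illustration. The sample data assumes that some processes run outside any container, which is one possible reason for a nonzero difference and is not confirmed anywhere in this thread.

```go
package main

import (
	"fmt"
	"math"
)

// proc is a hypothetical record of a process and the power attributed to it.
type proc struct {
	containerID string  // empty when the process is not in a container
	watts       float64 // power already attributed to this process
}

// nodeContainerDiff mirrors abs(sum(node watts) - sum(container watts)):
// only processes that belong to a container count toward the container sum.
func nodeContainerDiff(nodeWatts float64, procs []proc) float64 {
	containerWatts := 0.0
	for _, p := range procs {
		if p.containerID != "" {
			containerWatts += p.watts
		}
	}
	return math.Abs(nodeWatts - containerWatts)
}

func main() {
	// Assumed sample: 20 W in containers, 20 W in a host process.
	procs := []proc{{"c1", 12}, {"c2", 8}, {"", 20}}
	fmt.Println(nodeContainerDiff(40, procs))
}
```

With this sample the difference equals the power attributed to the non-container process, whereas the node-process difference would be zero because every process is counted.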

@sthaha sthaha force-pushed the feat-process-power-attribution branch from ee93e82 to 198556a Compare May 14, 2025 23:52
@vimalk78
Collaborator

Can we reduce the number of files in testdata? Do we need all of the files there?

autogroup
auxv
cgroup
cmdline
comm
coredump_filter
cpu_resctrl_groups
environ
gid_map
ksm_merging_pages
ksm_stat
latency
mem
mountinfo
mounts
numa_maps
oom_adj
oom_score_adj
patch_state
personality
projid_map
sched
schedstat
setgroups
stack
stat
timens_offsets
timerslack_ns
uid_map

@sthaha sthaha force-pushed the feat-process-power-attribution branch 2 times, most recently from 600317a to cc818f3 Compare May 15, 2025 10:26

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 89.48598% with 90 lines in your changes missing coverage. Please review.

Project coverage is 91.75%. Comparing base (1f9f878) to head (aca34eb).
Report is 5 commits behind head on reboot.

Files with missing lines                                 Patch %   Lines
internal/resource/informer.go                            85.54%    16 Missing and 8 partials ⚠️
internal/monitor/monitor.go                              13.04%    14 Missing and 6 partials ⚠️
internal/monitor/mock_utils.go                           90.55%    12 Missing ⚠️
internal/resource/procfs_reader.go                       73.91%    8 Missing and 4 partials ⚠️
...l/exporter/prometheus/collector/power_collector.go    94.11%    4 Missing and 4 partials ⚠️
internal/monitor/container.go                            92.85%    5 Missing and 2 partials ⚠️
internal/resource/options.go                             81.81%    4 Missing ⚠️
internal/resource/container.go                           95.58%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2061      +/-   ##
==========================================
- Coverage   93.30%   91.75%   -1.56%     
==========================================
  Files          21       29       +8     
  Lines        1240     2086     +846     
==========================================
+ Hits         1157     1914     +757     
- Misses         66      130      +64     
- Partials       17       42      +25     


@sthaha sthaha force-pushed the feat-process-power-attribution branch from e8fa91e to f65aeb8 Compare May 15, 2025 13:30
@sthaha sthaha marked this pull request as ready for review May 15, 2025 13:30
@github-actions github-actions bot removed the chore Routine tasks or maintenance label May 15, 2025
@sthaha sthaha requested a review from vimalk78 May 15, 2025 13:38
@sthaha sthaha force-pushed the feat-process-power-attribution branch from f65aeb8 to 49009e6 Compare May 16, 2025 00:05
sthaha added 4 commits May 16, 2025 04:24
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@sthaha sthaha force-pushed the feat-process-power-attribution branch from 49009e6 to aca34eb Compare May 16, 2025 08:24
@vimalk78 vimalk78 merged commit b02bd54 into sustainable-computing-io:reboot May 16, 2025
9 of 11 checks passed