
Conversation


@vimalk78 vimalk78 commented Jun 3, 2025

feat(pod): added pod power

  • podInformer for fetching and caching pods
  • pod energy/power calculation
  • power collector shows pod energy/power
  • tests
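
Roughly, the idea behind the pod power calculation (all names in this sketch, such as attributePodPower and the simplified Pod and Container types, are illustrative assumptions rather than the actual implementation): each pod's CPU-time delta is summed from its containers, and power is attributed to pods in proportion to their share of the total delta.

package main

import "fmt"

// Hypothetical, simplified types; the real informer caches far more state.
type Container struct {
	PodID        string
	CPUTimeDelta float64 // cpu seconds used since the last refresh
}

type Pod struct {
	ID           string
	CPUTimeDelta float64 // summed from the pod's containers each refresh
	Power        float64 // watts attributed to the pod
}

// attributePodPower splits nodePower across pods proportionally to their
// share of CPU time used since the last refresh (a common attribution model;
// the PR's actual formula may differ).
func attributePodPower(containers []Container, nodePower float64) map[string]*Pod {
	pods := map[string]*Pod{}
	total := 0.0
	for _, c := range containers {
		p, ok := pods[c.PodID]
		if !ok {
			p = &Pod{ID: c.PodID}
			pods[c.PodID] = p
		}
		p.CPUTimeDelta += c.CPUTimeDelta
		total += c.CPUTimeDelta
	}
	if total == 0 {
		return pods
	}
	for _, p := range pods {
		p.Power = nodePower * (p.CPUTimeDelta / total)
	}
	return pods
}

func main() {
	pods := attributePodPower([]Container{
		{PodID: "pod-a", CPUTimeDelta: 3},
		{PodID: "pod-a", CPUTimeDelta: 1},
		{PodID: "pod-b", CPUTimeDelta: 4},
	}, 10 /* watts */)
	fmt.Printf("pod-a: %.1f W, pod-b: %.1f W\n", pods["pod-a"].Power, pods["pod-b"].Power)
	// pod-a: 5.0 W, pod-b: 5.0 W
}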


codecov bot commented Jun 6, 2025

Codecov Report

Attention: Patch coverage is 47.07233% with 461 lines in your changes missing coverage. Please review.

Project coverage is 81.66%. Comparing base (2d33e25) to head (b0c0f99).
Report is 2 commits behind head on reboot.

Files with missing lines Patch % Lines
internal/k8s/pod/mock_utils.go 14.58% 360 Missing and 9 partials ⚠️
internal/resource/informer.go 28.78% 45 Missing and 2 partials ⚠️
internal/k8s/pod/pod.go 93.28% 7 Missing and 3 partials ⚠️
config/config.go 68.96% 6 Missing and 3 partials ⚠️
internal/resource/types.go 0.00% 9 Missing ⚠️
internal/monitor/pod.go 92.63% 5 Missing and 2 partials ⚠️
internal/monitor/monitor.go 50.00% 4 Missing and 2 partials ⚠️
internal/resource/options.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           reboot    #2127       +/-   ##
===========================================
- Coverage   91.74%   81.66%   -10.08%     
===========================================
  Files          34       37        +3     
  Lines        2991     3861      +870     
===========================================
+ Hits         2744     3153      +409     
- Misses        193      633      +440     
- Partials       54       75       +21     



@sthaha sthaha left a comment


Most of the implementation looks good to me.
A few comments, though.

@vimalk78 vimalk78 force-pushed the pod-power branch 2 times, most recently from 279a24a to 6f4e06f Compare June 9, 2025 22:39
@vimalk78 vimalk78 requested a review from sthaha June 9, 2025 22:39
@vimalk78 vimalk78 marked this pull request as ready for review June 9, 2025 22:39
@vimalk78 vimalk78 changed the title from WIP: pod power to feat(pod): added pod power Jun 9, 2025
@vimalk78 vimalk78 force-pushed the pod-power branch 2 times, most recently from c3293bc to 96c29f7 Compare June 9, 2025 22:49
@@ -95,6 +99,7 @@ func NewPowerCollector(monitor PowerDataProvider, logger *slog.Logger) *PowerCol
zone = "zone"
cntrID = "container_id"
vmID = "vm_id"
podID = "pod_id"

Just noticed that we need pod_id in kepler_container_.* as well
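
For context, that would presumably mean declaring the new pod_id label on the container metric descriptors as well; a rough sketch (the metric and label names here are assumptions, only cntrID, podID, and zone come from the diff above):

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical container descriptor; the collector's real metric and label
// names may differ. The point is that pod_id is declared next to container_id.
var containerCPUJoulesDesc = prometheus.NewDesc(
	prometheus.BuildFQName("kepler", "container", "cpu_joules_total"),
	"CPU energy attributed to the container",
	[]string{cntrID, "container_name", podID, zone},
	nil,
)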

@github-actions github-actions bot added the feat A new feature or enhancement label Jun 10, 2025
   - podInformer for fetching and caching pods
   - pod energy/power calculation
   - power collector shows pod energy/power
   - tests

Signed-off-by: Vimal Kumar <vimal78@gmail.com>
@@ -195,6 +207,7 @@ func (c *PowerCollector) Collect(ch chan<- prometheus.Metric) {
c.collectProcessMetrics(ch, snapshot.Processes)
c.collectContainerMetrics(ch, snapshot.Containers)

Could you please add pod_id to the containers so that we can "join" containers and pods?
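
Something along these lines at collection time would do (hedged: container.Pod comes from this PR's informer changes, while the descriptor and the other field names are assumptions), so that kepler_container_* and kepler_pod_* series can be joined on pod_id:

// Inside collectContainerMetrics, roughly:
podIDValue := ""
if container.Pod != nil {
	podIDValue = container.Pod.ID
}
ch <- prometheus.MustNewConstMetric(
	containerCPUJoulesDesc, // descriptor that now declares the pod_id label
	prometheus.CounterValue,
	container.CPUTotalEnergy, // assumed field name
	container.ID, container.Name, podIDValue, zoneName, // assumed label values
)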

pi.manager = mgr

opts := zap.Options{
	Development: true, // enables DebugLevel by default

nit: can we set this to true only if the current pi.logger log level == Debug?
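
For example, a sketch (assuming pi.logger is a *slog.Logger; zapOptionsFor is a hypothetical helper):

import (
	"context"
	"log/slog"

	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

// zapOptionsFor enables zap's Development mode (which defaults to DebugLevel)
// only when the given slog logger already has Debug enabled.
func zapOptionsFor(logger *slog.Logger) zap.Options {
	return zap.Options{
		Development: logger.Enabled(context.Background(), slog.LevelDebug),
	}
}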

@@ -311,6 +318,11 @@ func (pm *PowerMonitor) firstReading(newSnapshot *Snapshot) error {
return fmt.Errorf(vmPowerError, err)
}

// First read for pods
if err := pm.firstPodRead(newSnapshot); err != nil {
return fmt.Errorf(containerPowerError, err)

Suggested change
return fmt.Errorf(containerPowerError, err)
return fmt.Errorf(podPowerError, err)

@@ -30,6 +30,7 @@ func TestCollectionLoop(t *testing.T) {
resourceInformer := &MockResourceInformer{}
resourceInformer.SetExpectations(t, tr)
resourceInformer.On("Refresh").Return(nil)
// resourceInformer.On("Pods").Return(nil)

remove?


containersNoPod := []string{}
if ri.podInformer != nil {
	for _, container := range containersRunning {

Missing test? In a follow-up PR, could you please add a test for this functionality?

Comment on lines +226 to +237
	pod := &Pod{
		ID:        podInfo.ID,
		Name:      podInfo.Name,
		Namespace: podInfo.Namespace,
	}
	container.Pod = pod
	_, seen := podsRunning[pod.ID]
	// reset CPU Time of the pod if it is getting added to the running list for the first time
	// in the subsequent iteration, the CPUTimeDelta should be incremented by container's CPUTimeDelta
	resetCPUTime := !seen
	podsRunning[pod.ID] = ri.updatePodCache(container, resetCPUTime)
}

Missing tests. Could you please add tests in a follow-up PR?
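
For reference, what such a test would need to exercise, sketched (the cache map and field names below are assumptions, not the actual code): the first container of a pod seen in a refresh resets the pod's delta, and every container of that pod then adds its own delta.

// Assumed shape; the real method tracks more per-pod state.
func (ri *resourceInformer) updatePodCache(container *Container, resetCPUTime bool) *Pod {
	cached, ok := ri.podCache[container.Pod.ID] // hypothetical cache keyed by pod ID
	if !ok {
		cached = container.Pod
		ri.podCache[container.Pod.ID] = cached
	}
	if resetCPUTime {
		// Pod appears in this refresh's running set for the first time.
		cached.CPUTimeDelta = 0
	}
	// Each of the pod's containers contributes its delta during this refresh.
	cached.CPUTimeDelta += container.CPUTimeDelta
	cached.CPUTotalTime += container.CPUTimeDelta
	return cached
}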

@@ -237,6 +289,22 @@ func (ri *resourceInformer) Refresh() error {
ri.vms.Running = vmsRunning
ri.vms.Terminated = vmsTerminated

// Find terminated pods
totalPodsDelta := float64(0)

we don't need to find totalPodsDelta, do we?


// Resource usage tracking
CPUTotalTime float64 // total cpu time used by the VM so far
CPUTimeDelta float64 // cpu time used by the VM since last refresh

Suggested change
CPUTimeDelta float64 // cpu time used by the VM since last refresh
CPUTimeDelta float64 // cpu time used by the Pod since last refresh

Namespace string

// Resource usage tracking
CPUTotalTime float64 // total cpu time used by the VM so far

Suggested change
CPUTotalTime float64 // total cpu time used by the VM so far
CPUTotalTime float64 // total cpu time used by the Pod so far

@sthaha sthaha merged commit 2e3afdf into sustainable-computing-io:reboot Jun 11, 2025
15 of 17 checks passed