@sthaha sthaha commented May 26, 2025

Introduce support for monitoring the power consumption of virtual machines (VMs) and expose it as the Prometheus metrics kepler_vm_<rapl-zone>_{watts|joules_total}.

Key changes include:

  • Add a process type to the Process struct.
  • Add VirtualMachine struct and VirtualMachines map to track VM metadata.
  • Detect VM processes using regex patterns for QEMU, KVM, and Libvirt hypervisors.
  • Enhance Process struct with Type and VirtualMachineID fields to associate processes with VMs.
  • Add VirtualMachines() method to the Informer interface.
  • Include mock support for VM testing in mock_utils.go.
  • Add VM power computation (in monitor), just like Container.
  • Export VM Prometheus metrics.

NOTE: Only QEMU-based VM detection is supported. VirtualBox, VMware, and Xen hypervisor support is pending.
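
As a rough illustration of the detection step listed above, the Go sketch below classifies a process from its command line. The regex patterns, the `classify` helper, and the `ProcessType` constants are assumptions made for illustration and are not taken from this PR; the only external fact relied on is that QEMU accepts a `-uuid` option.

```go
package main

import (
	"fmt"
	"regexp"
)

// ProcessType mirrors the idea of the Type field added to Process.
// The names are illustrative, not Kepler's actual identifiers.
type ProcessType string

const (
	RegularProcess ProcessType = "regular"
	VMProcess      ProcessType = "vm"
)

// Hypothetical patterns: a QEMU binary name and the shape of a UUID value.
var (
	qemuBinary = regexp.MustCompile(`(^|/)qemu-(system-[\w-]+|kvm)$`)
	uuidValue  = regexp.MustCompile(`^[0-9a-fA-F-]{36}$`)
)

// classify inspects a command line (argv) and returns the process type and,
// for VM processes, the ID taken from the -uuid argument when present.
func classify(argv []string) (ProcessType, string) {
	if len(argv) == 0 || !qemuBinary.MatchString(argv[0]) {
		return RegularProcess, ""
	}
	for i := 0; i+1 < len(argv); i++ {
		if argv[i] == "-uuid" && uuidValue.MatchString(argv[i+1]) {
			return VMProcess, argv[i+1]
		}
	}
	return VMProcess, "" // a VM, but no ID could be extracted
}

func main() {
	typ, id := classify([]string{
		"/usr/bin/qemu-system-x86_64",
		"-name", "guest=demo",
		"-uuid", "df12672f-fedb-4f6f-9d51-0166868835fb",
	})
	fmt.Println(typ, id)
}
```

Keeping the classification separate from the ID extraction would let other hypervisors plug in their own patterns later without changing how processes are associated with VMs.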

@sthaha sthaha requested a review from vimalk78 May 26, 2025 13:44
sthaha commented May 27, 2025

@vprashar2929 could you please help test this?

@github-actions github-actions bot added the chore Routine tasks or maintenance label May 27, 2025

@vimalk78 vimalk78 left a comment

Some initial comments; continuing to review.

@sthaha sthaha force-pushed the feat-vm-power branch 3 times, most recently from 23be1a3 to c0a6f4c Compare May 28, 2025 08:37
@github-actions github-actions bot added the fix A bug fix label May 28, 2025
sthaha added 4 commits May 29, 2025 04:25
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Add support for monitoring Virtual Machines (VMs).
Key changes include:
- Add a process type to the Process struct
- Add `VirtualMachine` struct and `VirtualMachines` map to track VM metadata.
- Detect VM processes using regex pattern for QEMU
- Enhance `Process` struct with `Type` and `VirtualMachineID` fields to associate processes with VMs.
- Add `VirtualMachines()` method to `Informer` interface
- Include mock support for VM testing in `mock_utils.go`.

*NOTE:* Only QEMU-based VMs are supported. VirtualBox, VMware, and Xen
        hypervisor support is pending.

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Add VM power computation just like Container.

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@sthaha sthaha added feat A new feature or enhancement and removed fix A bug fix chore Routine tasks or maintenance labels May 29, 2025

codecov bot commented May 29, 2025

Codecov Report

Attention: Patch coverage is 95.58499% with 20 lines in your changes missing coverage. Please review.

Project coverage is 92.55%. Comparing base (c4aa86f) to head (ca85c1b).
Report is 6 commits behind head on reboot.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/resource/informer.go | 93.45% | 5 Missing and 2 partials ⚠️ |
| internal/monitor/monitor.go | 0.00% | 4 Missing and 2 partials ⚠️ |
| internal/resource/vm.go | 95.23% | 4 Missing ⚠️ |
| internal/monitor/vm.go | 96.87% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2100      +/-   ##
==========================================
+ Coverage   91.97%   92.55%   +0.57%     
==========================================
  Files          30       32       +2     
  Lines        2143     2538     +395     
==========================================
+ Hits         1971     2349     +378     
- Misses        137      150      +13     
- Partials       35       39       +4     

@github-actions github-actions bot added fix A bug fix chore Routine tasks or maintenance and removed feat A new feature or enhancement labels May 29, 2025
@sthaha sthaha marked this pull request as ready for review May 29, 2025 10:29
@sthaha sthaha added feat A new feature or enhancement and removed fix A bug fix chore Routine tasks or maintenance labels May 29, 2025
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
@github-actions github-actions bot added fix A bug fix chore Routine tasks or maintenance and removed feat A new feature or enhancement labels May 29, 2025
@sthaha sthaha added feat A new feature or enhancement and removed fix A bug fix chore Routine tasks or maintenance labels May 29, 2025
}

return nil
}

// Buffered channels prevent goroutine blocking
Collaborator

nit: incorrect code comment

@@ -44,6 +45,11 @@ func newProcess(proc *resource.Process, zones ZoneUsageMap) *Process {
if proc.Container != nil {
process.ContainerID = proc.Container.ID
}

// Add the container ID if available
Collaborator

add VmID


CPUTotalTime float64 // CPU time in seconds

// Replace single Usage with ZoneUsageMap
Collaborator

nit: comment needed?


// Container represents the power consumption of a container
type VirtualMachine struct {
ID string // VM ID
Collaborator

VM ID has a different meaning; better to call it UUID.

Collaborator Author

It need not be a UUID for other hypervisors. Say we support VMware in the future: its unique ID (however it is generated) will still be ID in Kepler. Vendor-specific labels should be published as <vendor>_<label>, e.g. libvirt_id if we ever add it.

func (pm *PowerMonitor) calculateVMPower(prev, newSnapshot *Snapshot) error {
vms := pm.resources.VirtualMachines()

// Skip if no containers
Collaborator

nit: remove/change comment

vmMap := make(VirtualMachines, len(vms.Running))

// For each VM, calculate power for each zone separately
for id, c := range vms.Running {
Collaborator

nit: change c to vm; the type is VirtualMachine

continue
}

cpuTimeRatio := c.CPUTimeDelta / vms.NodeCPUTimeDelta
Collaborator

Wouldn't it be simpler to use the power usage of the VM's process directly?

Collaborator Author

That's assuming that any VM will have only a single process, right?
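
For readers following this thread, here is a minimal sketch of the ratio-based attribution that the `cpuTimeRatio` line implies: each VM receives a share of a zone's node-level power proportional to its CPU time in the interval. The `vmUsage` type and the `attributePower` helper are hypothetical, and treating the attributed quantity as a single per-zone watt value is an assumption; the PR's actual computation may differ.

```go
package main

import "fmt"

// vmUsage holds the CPU time (in seconds) a VM consumed during the interval,
// summed over however many processes belong to that VM.
type vmUsage struct {
	CPUTimeDelta float64
}

// attributePower splits one zone's node-level power among VMs in proportion
// to their CPU time. It returns an empty map when the node recorded no CPU
// time, which also avoids a division by zero.
func attributePower(nodeWatts, nodeCPUTimeDelta float64, vms map[string]vmUsage) map[string]float64 {
	out := make(map[string]float64, len(vms))
	if nodeCPUTimeDelta <= 0 {
		return out
	}
	for id, vm := range vms {
		cpuTimeRatio := vm.CPUTimeDelta / nodeCPUTimeDelta
		out[id] = cpuTimeRatio * nodeWatts
	}
	return out
}

func main() {
	vms := map[string]vmUsage{
		"vm-a": {CPUTimeDelta: 2.5},
		"vm-b": {CPUTimeDelta: 5},
	}
	watts := attributePower(40, 10, vms)
	fmt.Println(watts["vm-a"], watts["vm-b"])
}
```

Because the ratio is computed from the VM's aggregated CPU time, the same code works whether a VM is backed by one process or several.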

Comment on lines +256 to +257
if vm == nil {
panic(fmt.Sprintf("process %d of type %s has is nil virtual machine", proc.PID, proc.Type))
Collaborator

The function panics if proc.VirtualMachine is nil, but the function's usage already invokes proc.VirtualMachine.ID.

Collaborator Author

Yes, you are right, but I explicitly wanted to assert and log the process that puts Kepler in this invalid state.
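
One way to reconcile both points, sketched under assumptions, is to run the assertion before anything dereferences the VM pointer, so the panic message always names the offending process. The `Process` and `VirtualMachine` types below are pared-down stand-ins and `mustVM` is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// Pared-down stand-ins for the types discussed in this review.
type VirtualMachine struct {
	ID string
}

type Process struct {
	PID            int
	Type           string
	VirtualMachine *VirtualMachine
}

// mustVM returns the process's VM, panicking with a descriptive message if it
// is nil. Callers use the returned value instead of touching
// proc.VirtualMachine directly, so the assertion always runs first.
func mustVM(proc *Process) *VirtualMachine {
	if proc.VirtualMachine == nil {
		panic(fmt.Sprintf("process %d of type %s has a nil virtual machine", proc.PID, proc.Type))
	}
	return proc.VirtualMachine
}

func main() {
	proc := &Process{PID: 4242, Type: "vm", VirtualMachine: &VirtualMachine{ID: "demo"}}
	fmt.Println(mustVM(proc).ID)
}
```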

@vimalk78 vimalk78 merged commit c8220d9 into sustainable-computing-io:reboot May 29, 2025
11 checks passed
sthaha added a commit to sthaha/kepler that referenced this pull request May 29, 2025
This commit addresses comments on
sustainable-computing-io#2100

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
sthaha added a commit that referenced this pull request May 30, 2025
cleanup(vm): addresses comments from #2100
Labels
feat A new feature or enhancement