Apokleos
Contributor

This PR introduces two GPU annotations, `default_gpus` and `default_gpu_model`, for Kata VM configurations to improve instance selection on remote hypervisors:

  • `default_gpus`: Specifies the minimum number of GPUs a VM requires. The remote hypervisor then selects an instance with at least that many GPUs, which prevents under-provisioning.
  • `default_gpu_model`: Specifies the GPU model the VM needs. This matters for workloads that depend on a particular GPU architecture or feature set, where a mismatched model would break compatibility or hurt performance.

Together, these fields give the remote hypervisor the information it needs to select an appropriate instance for a given GPU VM.
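
As a rough illustration of the selection rule described above, here is a minimal sketch of how a remote hypervisor backend might filter candidate instances. It is not the code from this PR: `InstanceType`, `select_instance`, and the instance names are hypothetical, and the choice of the smallest matching instance is an assumption made for the example.

```rust
/// Hypothetical description of an instance type offered by a remote backend.
/// This type does not come from the PR; it only exists for the example.
#[derive(Debug, Clone, PartialEq)]
struct InstanceType {
    name: &'static str,
    gpus: u32,
    gpu_model: &'static str,
}

/// Return the candidate with the fewest GPUs that still has at least
/// `min_gpus` GPUs and, when a model is requested, matches that model
/// (compared case-insensitively, which is an assumption of this sketch).
fn select_instance<'a>(
    candidates: &'a [InstanceType],
    min_gpus: u32,
    gpu_model: Option<&str>,
) -> Option<&'a InstanceType> {
    candidates
        .iter()
        // `default_gpus` acts as a lower bound on the GPU count.
        .filter(|c| c.gpus >= min_gpus)
        // `default_gpu_model` narrows the match only when it is set.
        .filter(|c| gpu_model.map_or(true, |m| c.gpu_model.eq_ignore_ascii_case(m)))
        // Prefer the smallest instance that satisfies the request.
        .min_by_key(|c| c.gpus)
}
```

With candidates of 1, 4, and 8 GPUs, a request for at least 2 GPUs of a given model would skip the 1-GPU instance and return the 4-GPU instance if its model matches, or `None` if no instance qualifies.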

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>

Apokleos added 3 commits June 26, 2025 17:27
Introduce two fields, `default_gpus` and `default_gpu_model`, in
`RemoteInfo` so that the remote hypervisor has the information it
needs to select an appropriate instance for a GPU VM, leading to
better resource allocation.

Fixes kata-containers#10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Two GPU annotations, `default_gpus` and `default_gpu_model`, are
introduced for Kata VM configurations to improve instance selection on
remote hypervisors:
(1) `default_gpus`: Allows users to specify the minimum number of GPUs a VM
requires. This ensures that the remote hypervisor selects an instance
with at least that many GPUs, preventing resource under-provisioning.
(2) `default_gpu_model`: Lets users define the specific GPU model needed for
the VM. This is crucial for workloads that depend on particular GPU
architectures or features, ensuring compatibility and optimal performance.
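
A minimal sketch of how the two annotations might be read from a pod's annotation map follows. It assumes the keys use Kata's existing `io.katacontainers.config.hypervisor.` prefix; `GpuRequest` and `parse_gpu_annotations` are hypothetical names for illustration and are not taken from this change.

```rust
use std::collections::HashMap;

// Assumed annotation keys, built from Kata's hypervisor annotation prefix.
const ANNO_DEFAULT_GPUS: &str = "io.katacontainers.config.hypervisor.default_gpus";
const ANNO_DEFAULT_GPU_MODEL: &str = "io.katacontainers.config.hypervisor.default_gpu_model";

/// Hypothetical holder for the parsed GPU request.
#[derive(Debug, Default, PartialEq)]
struct GpuRequest {
    default_gpus: u32,
    default_gpu_model: String,
}

/// Read both annotations if present. A missing annotation keeps the default
/// (0 GPUs, empty model); a non-numeric GPU count is reported as an error.
fn parse_gpu_annotations(annotations: &HashMap<String, String>) -> Result<GpuRequest, String> {
    let mut req = GpuRequest::default();
    if let Some(v) = annotations.get(ANNO_DEFAULT_GPUS) {
        req.default_gpus = v
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("invalid value {:?} for {}: {}", v, ANNO_DEFAULT_GPUS, e))?;
    }
    if let Some(v) = annotations.get(ANNO_DEFAULT_GPU_MODEL) {
        req.default_gpu_model = v.trim().to_string();
    }
    Ok(req)
}
```

Rejecting a malformed count, rather than silently falling back to zero, is a design choice of this sketch: a typo in the annotation would otherwise lead to an instance without GPUs.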

Fixes kata-containers#10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Add GPU-specific annotations used by the remote hypervisor for instance
selection during `prepare_vm`.

Fixes kata-containers#10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
@Apokleos Apokleos force-pushed the remote-annotation branch from e415e0a to 8c294b9 Compare June 26, 2025 09:27
Enable the GPU annotations by adding `default_gpus` and `default_gpu_model`
to `enable_annotations`, the list of valid annotations.
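
The allow-list idea can be sketched as follows. This is a simplified illustration, not the runtime's implementation: it compares names exactly, whereas the real `enable_annotations` handling may match entries differently, and `is_annotation_enabled` is a hypothetical helper.

```rust
// Assumed prefix shared by hypervisor annotations.
const HYPERVISOR_PREFIX: &str = "io.katacontainers.config.hypervisor.";

/// Return true when `key` is a hypervisor annotation whose short name
/// (the part after the prefix) appears in `enable_annotations`.
/// Exact string comparison is a simplification made for this sketch.
fn is_annotation_enabled(key: &str, enable_annotations: &[&str]) -> bool {
    match key.strip_prefix(HYPERVISOR_PREFIX) {
        Some(name) => enable_annotations.iter().any(|allowed| *allowed == name),
        // Keys outside the hypervisor namespace are not covered by this list.
        None => false,
    }
}
```

Under this sketch, an annotation that is not listed is rejected even if its value is well-formed, which is why the two new names have to be added to the list before they take effect.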

Fixes kata-containers#10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Member

@stevenhorsman stevenhorsman left a comment

Looks okay to me. Thanks @Apokleos

Contributor

@bpradipt bpradipt left a comment

/lgtm

@Apokleos Apokleos marked this pull request as ready for review June 30, 2025 06:04
@Apokleos Apokleos merged commit e66baf5 into kata-containers:main Jun 30, 2025
507 of 540 checks passed
4 participants