
Incorrect GPU resource request when preferred nodes are used #1335

@chewong

Description


Describe the bug

I have two preferred nodes of instance type Standard_NC80adis_H100_v5, each with 2 H100 GPUs (4 GPUs in total). However, when I deployed the following workspace, the two pods generated by the managed StatefulSet each request 4 GPUs:

apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-llama-3-3-70b-instruct
resource:
  count: 2
  instanceType: "Standard_NC80adis_H100_v5"
  labelSelector:
    matchLabels:
      node.kubernetes.io/instance-type: Standard_NC80adis_H100_v5
inference:
  preset:
    name: llama-3.3-70b-instruct
    presetOptions:
      modelAccessSecret: hf-token
  config: "llama-inference-params"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: "llama-inference-params"
data:
  inference_config.yaml: |
    vllm:
      cpu-offload-gb: 0
      gpu-memory-utilization: 0.95
      swap-space: 4
      max-model-len: 16384

(Screenshots: the two generated pods, each requesting 4 GPUs.)
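
To spell out the mismatch, here is a small illustrative sketch (not taken from the KAITO source or the generated StatefulSet manifest; the values simply restate the numbers above) of the per-container GPU request produced today versus the request I would expect for this node SKU:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	gpu := corev1.ResourceName("nvidia.com/gpu")

	// Illustrative values only (taken from the numbers in the description,
	// not copied from the generated StatefulSet itself).
	observed := corev1.ResourceList{gpu: resource.MustParse("4")} // what each pod requests today
	expected := corev1.ResourceList{gpu: resource.MustParse("2")} // 2 H100s per Standard_NC80adis_H100_v5 node

	o, e := observed[gpu], expected[gpu]
	fmt.Printf("observed per-pod request: %s, expected: %s\n", o.String(), e.String())
}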

Expected behavior

Each pod should request 2 GPUs, based on the number of GPUs available on the preferred nodes.
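
A minimal sketch of the calculation I would expect, assuming the controller can read the preferred nodes' allocatable nvidia.com/gpu (illustrative only, not the actual KAITO implementation; expectedGPUsPerPod is a hypothetical helper):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// expectedGPUsPerPod is an illustrative helper (not the KAITO code): sum the
// allocatable "nvidia.com/gpu" across the preferred nodes and divide by the
// workspace's resource.count (the number of inference pods).
func expectedGPUsPerPod(nodes []corev1.Node, podCount int64) int64 {
	if podCount == 0 {
		return 0
	}
	var total int64
	for _, n := range nodes {
		if q, ok := n.Status.Allocatable[corev1.ResourceName("nvidia.com/gpu")]; ok {
			total += q.Value()
		}
	}
	return total / podCount
}

func main() {
	// Two Standard_NC80adis_H100_v5 nodes, each with 2 allocatable H100s.
	node := corev1.Node{Status: corev1.NodeStatus{
		Allocatable: corev1.ResourceList{
			corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("2"),
		},
	}}
	fmt.Println(expectedGPUsPerPod([]corev1.Node{node, node}, 2)) // prints 2, not 4
}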

Logs

I was debugging this and noticed that the field selector here is not working:

(Screenshot: the field selector in the node lookup code.)

causing this function to fall back to the default number of GPUs required for the model.
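
For context, the lookup is along these lines (a simplified client-go sketch, not the actual KAITO code; listNodeByName and the node name are placeholders): when the field selector matches nothing, the list comes back empty and the GPU count falls back to the preset default.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// listNodeByName sketches the kind of lookup involved: list a node by name via
// a metadata.name field selector. If the selector matches nothing, the returned
// list is empty and the caller ends up using the preset's default GPU count
// instead of the node's actual capacity.
func listNodeByName(ctx context.Context, cs kubernetes.Interface, name string) ([]string, error) {
	sel := fields.OneTermEqualSelector("metadata.name", name).String()
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{FieldSelector: sel})
	if err != nil {
		return nil, err
	}
	found := make([]string, 0, len(nodes.Items))
	for _, n := range nodes.Items {
		found = append(found, n.Name)
	}
	return found, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	// "aks-h100-node-0" is a placeholder node name, not one from my cluster.
	names, err := listNodeByName(context.Background(), cs, "aks-h100-node-0")
	fmt.Println(names, err)
}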

Environment

  • Kubernetes version (use kubectl version):
  • OS (e.g: cat /etc/os-release):
  • Install tools:
  • Others:

Additional context

Metadata

Labels: bug (Something isn't working)
Status: Done
Assignees: none
Milestone: none