Describe the bug
I have two preferred nodes with instance type Standard_NC80adis_H100_v5, each of which has 2 H100 GPUs (4 GPUs in total). However, when I deployed the following workspace, the two pods generated by the managed StatefulSet each request 4 GPUs:
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-llama-3-3-70b-instruct
resource:
  count: 2
  instanceType: "Standard_NC80adis_H100_v5"
  labelSelector:
    matchLabels:
      node.kubernetes.io/instance-type: Standard_NC80adis_H100_v5
inference:
  preset:
    name: llama-3.3-70b-instruct
    presetOptions:
      modelAccessSecret: hf-token
  config: "llama-inference-params"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: "llama-inference-params"
data:
  inference_config.yaml: |
    vllm:
      cpu-offload-gb: 0
      gpu-memory-utilization: 0.95
      swap-space: 4
      max-model-len: 16384
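For reference, both generated pods end up with a GPU request like the fragment below (only the relevant resources section is shown; the resource name assumes the standard NVIDIA device plugin):

```yaml
# Fragment of the pod spec observed on both StatefulSet pods
resources:
  requests:
    nvidia.com/gpu: "4"
  limits:
    nvidia.com/gpu: "4"
```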
Expected behavior
Each pod should request 2 GPUs, based on the number of GPUs available on each of the preferred nodes.
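In other words, each pod's container should end up with something along these lines (illustrative fragment, same assumptions as above):

```yaml
resources:
  requests:
    nvidia.com/gpu: "2"
  limits:
    nvidia.com/gpu: "2"
```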
Logs
While debugging, I noticed that the field selector here is not working, which causes this function to fall back to the default number of GPUs required for the model.
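For context, each of the preferred nodes advertises 2 GPUs in its status, which is what I would expect the lookup to use instead of the preset default (illustrative fragment, assuming the standard NVIDIA device plugin resource name):

```yaml
# Node status fragment for a Standard_NC80adis_H100_v5 node
status:
  capacity:
    nvidia.com/gpu: "2"
  allocatable:
    nvidia.com/gpu: "2"
```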
Environment
- Kubernetes version (use `kubectl version`):
- OS (e.g: `cat /etc/os-release`):
- Install tools:
- Others:
Additional context