
The guarantee configured on a newly created queue exceeds the cluster's available resources, causing a scheduler error. #4076

@kjingz

Description

The guarantee configured on the submitted queue exceeds the cluster's available resources, which causes the volcano-scheduler to panic and stop scheduling.

Steps to reproduce the issue

1. Create queue test-1:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test-1
spec:
  # deserved only takes effect when using the capacity plugin
  deserved:
    cpu: "10"
    memory: 8Gi
  guarantee:
    resource:
      cpu: 1
      memory: 4Gi
  priority: 100
  reclaimable: true

2. The scheduler runs normally.

3. Create queue test-2:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test-2
  namespace: test
spec:
  deserved:
    cpu: "10"
    memory: 8Gi
  guarantee:
    resource:
      cpu: 10000
      memory: 4Gi
  priority: 50
  reclaimable: true

4. The volcano-scheduler pod goes into Error status and keeps restarting:

volcano-scheduler-7f666cf77c-gl8mb     0/1     Error     237 (65s ago)   5d6h

5. Error logs:

I0310 09:20:53.573981       1 event_handlers.go:391] Added pod <wx-test/example-df479d598-n7mqj> into cache.
I0310 09:20:53.573991       1 event_handlers.go:391] Added pod <wx-test/example-pod> into cache.
I0310 09:20:53.574003       1 event_handlers.go:391] Added pod <xx-test/example-78764466f5-6c6cp> into cache.
I0310 09:20:53.574012       1 event_handlers.go:391] Added pod <xx-test/example1-78764466f5-h9mxs> into cache.
I0310 09:20:53.574022       1 event_handlers.go:391] Added pod <xx-test/httpbin-868566b7c-krkqq> into cache.
I0310 09:20:53.574036       1 event_handlers.go:391] Added pod <xx-test/nacos-0> into cache.
I0310 09:20:53.574047       1 event_handlers.go:391] Added pod <xx-test/nacos-1> into cache.
I0310 09:20:53.574062       1 event_handlers.go:391] Added pod <xx-test/nacos-2> into cache.
I0310 09:20:53.574071       1 event_handlers.go:391] Added pod <xx-test/nginx-netshoot-1-55cf9b9948-cgxkd> into cache.
I0310 09:20:53.574097       1 event_handlers.go:391] Added pod <xx-test/sae-app-green-example-556d65f9dd-z9ctv> into cache.
I0310 09:20:53.600875       1 cache.go:792] Start metrics collection, metricsConf is map[]
I0310 09:20:53.600892       1 cache.go:797] The interval for querying metrics data is 30s
I0310 09:20:53.600902       1 scheduler.go:90] Scheduler completes Initialization and start to run
W0310 09:20:53.600993       1 node_info.go:231] the argument node is null.
I0310 09:20:53.601051       1 cache.go:1517] The metrics type is not set in the volcano scheduler configmap file. As a result, the CPU and memory load information of the node is not collected.
W0310 09:20:53.601253       1 node_info.go:231] the argument node is null.
W0310 09:20:53.601607       1 node_info.go:231] the argument node is null.
W0310 09:20:53.602044       1 node_info.go:231] the argument node is null.
W0310 09:20:53.602316       1 node_info.go:231] the argument node is null.
W0310 09:20:53.602686       1 node_info.go:231] the argument node is null.
I0310 09:20:53.604091       1 cache.go:1383] There are <1> Jobs, <3> Queues and <6> Nodes in total for scheduling.
I0310 09:20:53.604121       1 session.go:190] Open Session 96d48683-361f-489a-aa1d-0ec61524873d with <1> Job and <3> Queues
E0310 09:20:53.605657       1 runtime.go:77] Observed a panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00>
goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dfaee0, 0xc00255a450})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c39180?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x1dfaee0?, 0xc00255a450?})
/usr/local/go/src/runtime/panic.go:770 +0x132
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x9a?, {0xc0009af200?, 0xc0005615b8?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x165
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x230299a?, 0x7f7db6cf7a68?}, {0xc0005615b8?, 0xc0000c4808?, 0xc0025571e0?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x4a
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc002558640, 0xc0025584e0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:260 +0x90
volcano.sh/volcano/pkg/scheduler/plugins/capacity.(*capacityPlugin).OnSessionOpen(0xc002558500, 0xc002192000)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/capacity/capacity.go:119 +0x8e5
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x25e0fb0?, 0xc00020f608?}, {0xc0004b5b90, 0x2, 0x2}, {0x0, 0x0, 0x0})
/go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x312
volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc000528180)
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:117 +0x29d
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000ffe6e0, {0x25b4340, 0xc002180000}, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000ffe6e0, 0x3b9aca00, 0x0, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run in goroutine 139
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:91 +0x1a7
panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00> [recovered]
panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00>

goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c39180?})
	/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x1dfaee0?, 0xc00255a450?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x9a?, {0xc0009af200?, 0xc0005615b8?})
	/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x165
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x230299a?, 0x7f7db6cf7a68?}, {0xc0005615b8?, 0xc0000c4808?, 0xc0025571e0?})
	/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x4a
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc002558640, 0xc0025584e0)
	/go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:260 +0x90
volcano.sh/volcano/pkg/scheduler/plugins/capacity.(*capacityPlugin).OnSessionOpen(0xc002558500, 0xc002192000)
	/go/src/volcano.sh/volcano/pkg/scheduler/plugins/capacity/capacity.go:119 +0x8e5
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x25e0fb0?, 0xc00020f608?}, {0xc0004b5b90, 0x2, 0x2}, {0x0, 0x0, 0x0})
	/go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x312
volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc000528180)
	/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:117 +0x29d
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000ffe6e0, {0x25b4340, 0xc002180000}, 0x1, 0xc00022d920)
	/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000ffe6e0, 0x3b9aca00, 0x0, 0x1, 0xc00022d920)
	/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run in goroutine 139
	/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:91 +0x1a7
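The numbers in the panic can be checked by hand. Reading the trace bottom-up, the capacity plugin's OnSessionOpen (capacity.go:119) calls (*Resource).Sub, and the assertion inside Sub fires because the right-hand operand does not fit within the left-hand one. The right-hand operand, cpu 10001000 and memory 8589934592, equals the sum of both queues' guarantees expressed in millicores and bytes (1 + 10000 cores, 4Gi + 4Gi), while the left-hand operand holds only 164000 millicores (164 cores) of CPU. The Go sketch below reproduces that arithmetic. `resource`, `parseGuarantee` and `checkedSub` are hypothetical names, not Volcano's API: the real Resource type also tracks scalar resources such as GPUs and pods, and `checkedSub` returns an error where the scheduler panics.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resource is a hypothetical, trimmed-down stand-in for the scheduler's
// Resource type: CPU in millicores, memory in bytes. Scalar resources
// (GPUs, pods, ephemeral-storage, ...) are omitted.
type resource struct {
	MilliCPU int64
	Memory   int64
}

// parseGuarantee converts a whole-core CPU string (e.g. "10000") and a
// "Gi"-suffixed memory string (e.g. "4Gi") into millicores and bytes.
// Simplification: forms such as "500m" or "512Mi" are rejected.
func parseGuarantee(cpu, memory string) (resource, error) {
	cores, err := strconv.ParseInt(cpu, 10, 64)
	if err != nil {
		return resource{}, fmt.Errorf("cpu %q: %w", cpu, err)
	}
	if !strings.HasSuffix(memory, "Gi") {
		return resource{}, fmt.Errorf("unsupported memory quantity %q", memory)
	}
	gi, err := strconv.ParseInt(strings.TrimSuffix(memory, "Gi"), 10, 64)
	if err != nil {
		return resource{}, fmt.Errorf("memory %q: %w", memory, err)
	}
	// 1 core = 1000 millicores; 1Gi = 2^30 bytes.
	return resource{MilliCPU: cores * 1000, Memory: gi << 30}, nil
}

// lessEqual reports whether every dimension of r fits within rr.
func (r resource) lessEqual(rr resource) bool {
	return r.MilliCPU <= rr.MilliCPU && r.Memory <= rr.Memory
}

// checkedSub applies the same feasibility test that triggered the panic
// (the subtrahend must fit within the minuend), but returns an error
// instead of panicking.
func checkedSub(r, rr resource) (resource, error) {
	if !rr.lessEqual(r) {
		return resource{}, fmt.Errorf("resource is not sufficient to do operation: <%+v> sub <%+v>", r, rr)
	}
	return resource{MilliCPU: r.MilliCPU - rr.MilliCPU, Memory: r.Memory - rr.Memory}, nil
}

func main() {
	// spec.guarantee.resource of queues test-1 and test-2 from the steps above.
	var total resource
	for _, g := range [][2]string{{"1", "4Gi"}, {"10000", "4Gi"}} {
		r, err := parseGuarantee(g[0], g[1])
		if err != nil {
			panic(err)
		}
		total.MilliCPU += r.MilliCPU
		total.Memory += r.Memory
	}
	// Prints: total guarantee: cpu 10001000, memory 8589934592
	fmt.Printf("total guarantee: cpu %d, memory %d\n", total.MilliCPU, total.Memory)

	// Left-hand operand copied from the panic message (cpu and memory only).
	cluster := resource{MilliCPU: 164000, Memory: 750343524352}
	if _, err := checkedSub(cluster, total); err != nil {
		fmt.Println("error:", err)
	}
}
```

Running it prints the summed guarantee followed by the error: the CPU guarantee exceeds the 164000 millicores in the panic message by 9837000 millicores, while the 8Gi of memory alone would fit.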

Describe the results you received and expected

null

What version of Volcano are you using?

master

Any other relevant information

No response

Metadata

Assignees: No one assigned
Labels: kind/bug (Categorizes issue or PR as related to a bug.)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
