Closed
Labels
kind/bug: Categorizes issue or PR as related to a bug.
Description
When the guarantee configured on a submitted queue exceeds the cluster's available resources, the Volcano scheduler panics and fails to schedule.
Steps to reproduce the issue
1. Create queue test-1
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test-1
spec:
  # deserved only takes effect when using the capacity plugin
  deserved:
    cpu: "10"
    memory: 8Gi
  guarantee:
    resource:
      cpu: 1
      memory: 4Gi
  priority: 100
  reclaimable: true
2. The scheduler runs normally
3. Create queue test-2
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test-2
  namespace: test
spec:
  deserved:
    cpu: "10"
    memory: 8Gi
  guarantee:
    resource:
      cpu: 10000
      memory: 4Gi
  priority: 50
  reclaimable: true
4. The scheduler pod goes into Error status
volcano-scheduler-7f666cf77c-gl8mb 0/1 Error 237 (65s ago) 5d6h
5. Error logs
I0310 09:20:53.573981 1 event_handlers.go:391] Added pod <wx-test/example-df479d598-n7mqj> into cache.
I0310 09:20:53.573991 1 event_handlers.go:391] Added pod <wx-test/example-pod> into cache.
I0310 09:20:53.574003 1 event_handlers.go:391] Added pod <xx-test/example-78764466f5-6c6cp> into cache.
I0310 09:20:53.574012 1 event_handlers.go:391] Added pod <xx-test/example1-78764466f5-h9mxs> into cache.
I0310 09:20:53.574022 1 event_handlers.go:391] Added pod <xx-test/httpbin-868566b7c-krkqq> into cache.
I0310 09:20:53.574036 1 event_handlers.go:391] Added pod <xx-test/nacos-0> into cache.
I0310 09:20:53.574047 1 event_handlers.go:391] Added pod <xx-test/nacos-1> into cache.
I0310 09:20:53.574062 1 event_handlers.go:391] Added pod <xx-test/nacos-2> into cache.
I0310 09:20:53.574071 1 event_handlers.go:391] Added pod <xx-test/nginx-netshoot-1-55cf9b9948-cgxkd> into cache.
I0310 09:20:53.574097 1 event_handlers.go:391] Added pod <xx-test/sae-app-green-example-556d65f9dd-z9ctv> into cache.
I0310 09:20:53.600875 1 cache.go:792] Start metrics collection, metricsConf is map[]
I0310 09:20:53.600892 1 cache.go:797] The interval for querying metrics data is 30s
I0310 09:20:53.600902 1 scheduler.go:90] Scheduler completes Initialization and start to run
W0310 09:20:53.600993 1 node_info.go:231] the argument node is null.
I0310 09:20:53.601051 1 cache.go:1517] The metrics type is not set in the volcano scheduler configmap file. As a result, the CPU and memory load information of the node is not collected.
W0310 09:20:53.601253 1 node_info.go:231] the argument node is null.
W0310 09:20:53.601607 1 node_info.go:231] the argument node is null.
W0310 09:20:53.602044 1 node_info.go:231] the argument node is null.
W0310 09:20:53.602316 1 node_info.go:231] the argument node is null.
W0310 09:20:53.602686 1 node_info.go:231] the argument node is null.
I0310 09:20:53.604091 1 cache.go:1383] There are <1> Jobs, <3> Queues and <6> Nodes in total for scheduling.
I0310 09:20:53.604121 1 session.go:190] Open Session 96d48683-361f-489a-aa1d-0ec61524873d with <1> Job and <3> Queues
E0310 09:20:53.605657 1 runtime.go:77] Observed a panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00>
goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dfaee0, 0xc00255a450})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c39180?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x1dfaee0?, 0xc00255a450?})
/usr/local/go/src/runtime/panic.go:770 +0x132
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x9a?, {0xc0009af200?, 0xc0005615b8?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x165
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x230299a?, 0x7f7db6cf7a68?}, {0xc0005615b8?, 0xc0000c4808?, 0xc0025571e0?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x4a
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc002558640, 0xc0025584e0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:260 +0x90
volcano.sh/volcano/pkg/scheduler/plugins/capacity.(*capacityPlugin).OnSessionOpen(0xc002558500, 0xc002192000)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/capacity/capacity.go:119 +0x8e5
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x25e0fb0?, 0xc00020f608?}, {0xc0004b5b90, 0x2, 0x2}, {0x0, 0x0, 0x0})
/go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x312
volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc000528180)
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:117 +0x29d
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000ffe6e0, {0x25b4340, 0xc002180000}, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000ffe6e0, 0x3b9aca00, 0x0, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run in goroutine 139
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:91 +0x1a7
panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00> [recovered]
panic: resource is not sufficient to do operation: <cpu 164000.00, memory 750343524352.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 3000.00, pods 660.00, attachable-volumes-csi-csi-clusterfileplugin 12884901882.00, attachable-volumes-csi-csi.tigera.io 12884901882.00, ephemeral-storage 1648769005709000.00> sub <cpu 10001000.00, memory 8589934592.00>
goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c39180?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x1dfaee0?, 0xc00255a450?})
/usr/local/go/src/runtime/panic.go:770 +0x132
volcano.sh/volcano/pkg/scheduler/util/assert.Assert(0x9a?, {0xc0009af200?, 0xc0005615b8?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:33 +0x165
volcano.sh/volcano/pkg/scheduler/util/assert.Assertf(0x0, {0x230299a?, 0x7f7db6cf7a68?}, {0xc0005615b8?, 0xc0000c4808?, 0xc0025571e0?})
/go/src/volcano.sh/volcano/pkg/scheduler/util/assert/assert.go:43 +0x4a
volcano.sh/volcano/pkg/scheduler/api.(*Resource).Sub(0xc002558640, 0xc0025584e0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/resource_info.go:260 +0x90
volcano.sh/volcano/pkg/scheduler/plugins/capacity.(*capacityPlugin).OnSessionOpen(0xc002558500, 0xc002192000)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/capacity/capacity.go:119 +0x8e5
volcano.sh/volcano/pkg/scheduler/framework.OpenSession({0x25e0fb0?, 0xc00020f608?}, {0xc0004b5b90, 0x2, 0x2}, {0x0, 0x0, 0x0})
/go/src/volcano.sh/volcano/pkg/scheduler/framework/framework.go:45 +0x312
volcano.sh/volcano/pkg/scheduler.(*Scheduler).runOnce(0xc000528180)
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:117 +0x29d
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000ffe6e0, {0x25b4340, 0xc002180000}, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000ffe6e0, 0x3b9aca00, 0x0, 0x1, 0xc00022d920)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by volcano.sh/volcano/pkg/scheduler.(*Scheduler).Run in goroutine 139
/go/src/volcano.sh/volcano/pkg/scheduler/scheduler.go:91 +0x1a7
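The numbers in the panic message can be reproduced by hand. The subtracted value, cpu 10001000 (millicores) and memory 8589934592 (bytes), equals the sum of the guarantees of test-1 and test-2, while the cluster total is only cpu 164000. The trace shows the assertion firing in Resource.Sub, called from capacityPlugin.OnSessionOpen. The Go sketch below only illustrates that arithmetic and one possible guard. The Resource type and the SafeSub helper are hypothetical stand-ins written for this example, not Volcano's actual API or the fix that was applied upstream.

```go
package main

import (
	"fmt"
)

// Resource is a simplified, hypothetical stand-in for the scheduler's
// resource vector. CPU is in millicores and memory in bytes, matching the
// units printed in the panic message.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// Add returns the dimension-wise sum of r and o.
func (r Resource) Add(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU + o.MilliCPU, Memory: r.Memory + o.Memory}
}

// LessEqual reports whether every dimension of r fits within o.
func (r Resource) LessEqual(o Resource) bool {
	return r.MilliCPU <= o.MilliCPU && r.Memory <= o.Memory
}

// SafeSub returns r - o, or an error (instead of panicking) when o does not
// fit inside r. Hypothetical helper, assumed for illustration only.
func (r Resource) SafeSub(o Resource) (Resource, error) {
	if !o.LessEqual(r) {
		return Resource{}, fmt.Errorf("resource is not sufficient: <%+v> sub <%+v>", r, o)
	}
	return Resource{MilliCPU: r.MilliCPU - o.MilliCPU, Memory: r.Memory - o.Memory}, nil
}

const gi = 1 << 30 // bytes in 1Gi

func main() {
	// Cluster total (cpu and memory only), copied from the panic message.
	cluster := Resource{MilliCPU: 164000, Memory: 750343524352}

	// Guarantees from the two queue manifests in the reproduction steps.
	guarantees := []Resource{
		{MilliCPU: 1 * 1000, Memory: 4 * gi},     // test-1: cpu 1, memory 4Gi
		{MilliCPU: 10000 * 1000, Memory: 4 * gi}, // test-2: cpu 10000, memory 4Gi
	}

	var total Resource
	for _, g := range guarantees {
		total = total.Add(g)
	}
	fmt.Printf("total guarantee: cpu %.2f, memory %.2f\n", total.MilliCPU, total.Memory)

	// A guarded subtraction reports the misconfiguration instead of crashing.
	if _, err := cluster.SafeSub(total); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Under these assumptions, the misconfigured queue would surface as an error that can be logged or rejected at admission time, rather than a panic that restarts the scheduler on every session.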
Describe the results you received and expected
null
What version of Volcano are you using?
master
Any other relevant information
No response