Status: Open
Labels: bug (Something isn't working)
Description
What is the issue?
I am running gemma3:27b on a multi-GPU setup on Linux (Debian, 4 cards).
After the model receives some requests, it panics and stops responding.
The following parameters are part of the model launch:
--ctx-size 98304
--batch-size 512
--n-gpu-layers 63
--threads 32
--flash-attn
--parallel 6
Right now I am on 0.6.4, but the same was happening on 0.6.3 and 0.6.2 as well.
Relevant log output
ollama[3482769]: panic: failed to decode batch: could not find a kv cache slot (length: 6656)
ollama[3482769]: goroutine 90 [running]:
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0001e4b40, {0x55b8ae2cf380, 0xc0003ff630})
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner/runner.go:366 +0x65
ollama[3482769]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner/runner.go:861 +0xb37
OS
Linux
GPU
Nvidia
CPU
No response
Ollama version
0.6.4