panic: failed to decode batch: could not find a kv cache slot (length: 6656) #10127

@rossbg

What is the issue?

I am running gemma3:27b on a multi-GPU Linux machine (Debian, 4 cards).

After serving some requests, the model runner panics and stops responding.

The following parameters are part of the model launch:

--ctx-size 98304
--batch-size 512
--n-gpu-layers 63
--threads 32
--flash-attn
--parallel 6

I am currently on 0.6.4, but the same panic also occurred on 0.6.3 and 0.6.2.

Relevant log output

ollama[3482769]: panic: failed to decode batch: could not find a kv cache slot (length: 6656)
ollama[3482769]: goroutine 90 [running]:
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0001e4b40, {0x55b8ae2cf380, 0xc0003ff630})
ollama[3482769]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:366 +0x65
ollama[3482769]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
ollama[3482769]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:861 +0xb37

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.6.4

Labels

bug