Status: Open
Labels: bug (Something isn't working)
Description
What is the issue?
I am running gemma3:27b on a multi-GPU setup on Linux (Debian, 4 cards).
After the model receives some requests, it panics and stops responding.
The following parameters are part of the model launch:
--ctx-size 98304
--batch-size 512
--n-gpu-layers 63
--threads 32
--flash-attn
--parallel 6
Right now I am on 0.6.4, but the same was happening on 0.6.3 and 0.6.2 as well.
Relevant log output
ollama[3482769]: panic: failed to decode batch: could not find a kv cache slot (length: 6656)
ollama[3482769]: goroutine 90 [running]:
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0001e4b40, {0x55b8ae2cf380, 0xc0003ff630})
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner/runner.go:366 +0x65
ollama[3482769]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
ollama[3482769]: github.com/ollama/ollama/runner/ollamarunner/runner.go:861 +0xb37
OS
Linux
GPU
Nvidia
CPU
No response
Ollama version
0.6.4