### Name and Version

Version (release): B5215, Windows, Vulkan, x64

### Operating systems

Windows

### Which llama.cpp modules do you know to be affected?

_No response_

### Command line

```shell
echo Running Qwen3 30B MoE server, 12 layers, 12288 context

llama-server.exe ^
--model "D:\LLMs\Qwen3-30B-A3B-Q4_K_M.gguf" ^
--gpu-layers 12 ^
--ctx-size 12288 ^
--samplers top_k;dry;min_p;temperature;typ_p;xtc ^
--top-k 40 ^
--dry-multiplier 0.5 ^
--min-p 0.00 ^
--temp 0.6 ^
--top-p 0.95 ^
--repeat-penalty 1.1
```

### Problem description & steps to reproduce

Edit: the GGUF was downloaded from ggml-org's HF repository (https://huggingface.co/ggml-org/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_M.gguf).

The model loads and everything seems fine, but as soon as I request inference through llama.cpp's web UI, I get this error.

### First Bad Commit

_No response_

### Relevant log output

```shell
```
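
For reference, the failure can presumably also be triggered without the web UI by sending a request straight to the server. A minimal sketch, assuming the server is listening on the default 127.0.0.1:8080 and that curl is available on the system:

```shell
REM Hypothetical reproduction step: send one chat completion request directly
REM to llama-server's OpenAI-compatible endpoint instead of using the web UI.
curl http://127.0.0.1:8080/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}],\"max_tokens\":32}"
```

If the crash also occurs with this request, that would suggest the problem is in the inference path (Vulkan backend / MoE offload) rather than in the web UI itself.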