
Conversation

@mxyng (Contributor) commented Aug 20, 2025

this change enables flash attention by default for gpt-oss
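For context, a minimal Go sketch of the "on by default, still user-overridable" pattern; the `Options` type and `resolveFlashAttention` helper here are illustrative stand-ins, not Ollama's actual API.

```go
package main

import "fmt"

// Options models a runtime option set with a tri-state flash-attention
// flag: nil means "unset, fall back to the model family's default".
type Options struct {
	FlashAttention *bool
}

// resolveFlashAttention picks the effective setting for a model family.
// An explicit user choice always wins over the per-family default.
func resolveFlashAttention(opts Options, family string) bool {
	if opts.FlashAttention != nil {
		return *opts.FlashAttention
	}
	switch family {
	case "gptoss":
		return true // default on for gpt-oss, as in this PR
	default:
		return false
	}
}

func main() {
	off := false
	fmt.Println(resolveFlashAttention(Options{}, "gptoss"))                     // true (default)
	fmt.Println(resolveFlashAttention(Options{FlashAttention: &off}, "gptoss")) // false (override)
}
```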

@jessegross (Contributor) commented:

I think the PR title is misleading, as it doesn't change anything about KV cache quantization.

@mxyng changed the title from "gptoss: enable flash attention, disable kv cache quantization" to "gptoss: enable flash attention" on Aug 20, 2025
@mxyng (Contributor, Author) commented Aug 20, 2025

Updated the PR title and commit message. The old title was an artifact of the original commit, as are some of the changes.

@mxyng changed the title from "gptoss: enable flash attention" to "gptoss: enable flash attention by default" on Aug 20, 2025
@mxyng force-pushed the mxyng/gpt-oss branch 2 times, most recently from 3272208 to 7c80562 on August 20, 2025 at 22:33
@jessegross previously approved these changes Aug 20, 2025
@jessegross (Contributor) commented:

Actually, the estimate for gpt-oss needs to be conditional on flash attention if it might be disabled on some hardware.

@jessegross dismissed their stale review on August 20, 2025 at 22:56, citing the comment above.
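To make the concern concrete, here is a hedged Go sketch of a memory estimate that branches on flash attention; all constants and the `graphEstimate` helper are made-up illustration, not gpt-oss's real hyperparameters or Ollama's estimator. The non-flash path must reserve the materialized attention-scores buffer, so the two paths genuinely need different estimates.

```go
package main

import "fmt"

// Illustrative shapes only, not gpt-oss's actual configuration.
const (
	numLayers = 24
	numHeads  = 64
	headDim   = 64
	ctxLen    = 8192
	batch     = 512
	bytesF16  = 2
)

// graphEstimate returns a rough attention memory estimate in bytes.
func graphEstimate(flashAttention bool) uint64 {
	// The KV cache (K and V tensors per layer) is needed either way.
	kv := uint64(2 * numLayers * numHeads * headDim * ctxLen * bytesF16)
	if flashAttention {
		// Flash attention computes scores in tiles and never
		// materializes the full batch x context scores matrix.
		return kv
	}
	// Without flash attention, reserve the scores matrix as well.
	scores := uint64(numHeads * batch * ctxLen * bytesF16)
	return kv + scores
}

func main() {
	fmt.Printf("with flash attention:    %d MiB\n", graphEstimate(true)>>20)
	fmt.Printf("without flash attention: %d MiB\n", graphEstimate(false)>>20)
}
```

If the runtime can fall back to the non-flash path on some hardware, a single unconditional estimate either under-reserves there or over-reserves everywhere, which is the tension behind the revert discussed below.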

@mxyng (Contributor, Author) commented Aug 22, 2025

There isn't currently a mechanism to use different estimates with and without flash attention, and I'm reluctant to add one given the new estimates, so I'm going to revert that portion of the changes.

@mxyng merged commit 85ccf73 into main on Aug 26, 2025
8 checks passed
@mxyng deleted the mxyng/gpt-oss branch on August 26, 2025 at 20:34