
Conversation

@mxyng (Contributor) commented Aug 20, 2025

this change enables flash attention by default for gpt-oss
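For context, a minimal Go sketch of the "on by default, still user-overridable" pattern; the `Options` type and `resolveFlashAttention` helper here are illustrative stand-ins, not Ollama's actual API.

```go
package main

import "fmt"

// Options models a runtime option set with a tri-state flash-attention
// flag: nil means "unset, fall back to the model family's default".
type Options struct {
	FlashAttention *bool
}

// resolveFlashAttention picks the effective setting for a model family.
// An explicit user choice always wins over the per-family default.
func resolveFlashAttention(opts Options, family string) bool {
	if opts.FlashAttention != nil {
		return *opts.FlashAttention
	}
	switch family {
	case "gptoss":
		return true // default on for gpt-oss, as in this PR
	default:
		return false
	}
}

func main() {
	off := false
	fmt.Println(resolveFlashAttention(Options{}, "gptoss"))                     // true (default)
	fmt.Println(resolveFlashAttention(Options{FlashAttention: &off}, "gptoss")) // false (override)
}
```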

@jessegross (Contributor) commented:

I think the PR title is misleading, as it doesn't change anything about KV cache quantization.

@mxyng changed the title from "gptoss: enable flash attention, disable kv cache quantization" to "gptoss: enable flash attention" on Aug 20, 2025
@mxyng (Contributor, Author) commented Aug 20, 2025

Updated the PR title and commit message. The old title was an artifact of the original commit, as are some of the changes.

@mxyng changed the title from "gptoss: enable flash attention" to "gptoss: enable flash attention by default" on Aug 20, 2025
@mxyng force-pushed the mxyng/gpt-oss branch 2 times, most recently from 3272208 to 7c80562 on August 20, 2025 at 22:33
@jessegross previously approved these changes Aug 20, 2025
@jessegross (Contributor) commented:

Actually, the estimate for gpt-oss needs to be conditional on flash attention if it might be disabled on some hardware.

@jessegross dismissed their stale review on August 20, 2025 at 22:56, citing the comment above.
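To make the concern concrete, here is a hedged Go sketch of a memory estimate that branches on flash attention; all constants and the `graphEstimate` helper are made-up illustration, not gpt-oss's real hyperparameters or Ollama's estimator. The non-flash path must reserve the materialized attention-scores buffer, so the two paths genuinely need different estimates.

```go
package main

import "fmt"

// Illustrative shapes only, not gpt-oss's actual configuration.
const (
	numLayers = 24
	numHeads  = 64
	headDim   = 64
	ctxLen    = 8192
	batch     = 512
	bytesF16  = 2
)

// graphEstimate returns a rough attention memory estimate in bytes.
func graphEstimate(flashAttention bool) uint64 {
	// The KV cache (K and V tensors per layer) is needed either way.
	kv := uint64(2 * numLayers * numHeads * headDim * ctxLen * bytesF16)
	if flashAttention {
		// Flash attention computes scores in tiles and never
		// materializes the full batch x context scores matrix.
		return kv
	}
	// Without flash attention, reserve the scores matrix as well.
	scores := uint64(numHeads * batch * ctxLen * bytesF16)
	return kv + scores
}

func main() {
	fmt.Printf("with flash attention:    %d MiB\n", graphEstimate(true)>>20)
	fmt.Printf("without flash attention: %d MiB\n", graphEstimate(false)>>20)
}
```

If the runtime can fall back to the non-flash path on some hardware, a single unconditional estimate either under-reserves there or over-reserves everywhere, which is the tension behind the revert discussed below.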

@mxyng (Contributor, Author) commented Aug 22, 2025

There isn't currently a mechanism to use different estimates with and without flash attention, and I'm reluctant to add one given the new estimates, so I'm going to revert that portion of the changes.

@mxyng merged commit 85ccf73 into main on Aug 26, 2025
8 checks passed
@mxyng deleted the mxyng/gpt-oss branch on August 26, 2025 at 20:34