Faster small batch qmv #1861

awni · 2025-02-13T04:26:57Z

Speeds up small-batch qvm and qmv by swapping the batch and block dimensions in the kernel.

Speculative generation benchmark on M2 Ultra:

mlx_lm.generate --model mlx-community/Qwen2.5-32B-Instruct-4bit --prompt "Write a quicksort algorithm" --draft-model mlx-community/Qwen2.5-0.5B-Instruct-4bit -m 1000 --temp 0

Pre: Generation: 390 tokens, 31.786 tokens-per-sec
Post: Generation: 390 tokens, 37.843 tokens-per-sec
No draft model: Generation: 390 tokens, 31.765 tokens-per-sec
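To illustrate why the order of the batch and block dimensions can matter for a small-batch quantized matrix-vector product, here is a minimal pure-Python sketch. It is not the Metal kernel from this PR, and it is only one plausible reading of "swapping batch and block dimensions". The group size, the affine quantization scheme, and every function name below are hypothetical and chosen for illustration. The sketch counts how many times a weight group is dequantized when the batch loop is outermost versus when the block loop is outermost.

```python
import random

GROUP = 4  # hypothetical group size, kept small for illustration


def quantize(w, bits=4):
    """Affine-quantize each row of w in groups of GROUP values.

    Returns, per row, a list of (codes, scale, bias) tuples.
    This is an illustrative scheme, not MLX's storage format.
    """
    levels = (1 << bits) - 1
    packed = []
    for row in w:
        groups = []
        for g in range(0, len(row), GROUP):
            vals = row[g:g + GROUP]
            lo, hi = min(vals), max(vals)
            scale = (hi - lo) / levels or 1.0  # avoid a zero scale
            codes = [round((v - lo) / scale) for v in vals]
            groups.append((codes, scale, lo))
        packed.append(groups)
    return packed


def dequantize(group):
    codes, scale, bias = group
    return [c * scale + bias for c in codes]


def qmv_batch_outer(x, packed):
    """Batch loop outermost: every batch row re-dequantizes every group."""
    count = 0
    out = [[0.0] * len(packed) for _ in x]
    for b, xb in enumerate(x):
        for n, groups in enumerate(packed):
            for g, group in enumerate(groups):
                wv = dequantize(group)
                count += 1
                xs = xb[g * GROUP:(g + 1) * GROUP]
                out[b][n] += sum(a * c for a, c in zip(xs, wv))
    return out, count


def qmv_block_outer(x, packed):
    """Block loop outermost: each group is dequantized once and reused
    for all batch rows."""
    count = 0
    out = [[0.0] * len(packed) for _ in x]
    for n, groups in enumerate(packed):
        for g, group in enumerate(groups):
            wv = dequantize(group)
            count += 1
            for b, xb in enumerate(x):
                xs = xb[g * GROUP:(g + 1) * GROUP]
                out[b][n] += sum(a * c for a, c in zip(xs, wv))
    return out, count


if __name__ == "__main__":
    random.seed(0)
    w = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    x = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
    packed = quantize(w)
    _, n_batch_outer = qmv_batch_outer(x, packed)
    _, n_block_outer = qmv_block_outer(x, packed)
    # 3 batch rows * 5 output rows * 2 groups = 30, versus 5 * 2 = 10
    print(n_batch_outer, n_block_outer)
```

Both orderings produce the same outputs. In this toy model the block-outer order touches each weight group once regardless of batch size, which is the kind of reuse that may help when a small batch, such as a handful of draft tokens in speculative generation, shares the same quantized weights.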

angeloskath

Let's go! 🚀🚀🚀

barronalex · 2025-02-13T05:43:52Z

Nice!! 🚀

awni · 2025-02-13T06:02:29Z

Helps with ml-explore/mlx-examples#1281

awni added 2 commits February 12, 2025 19:38

faster small batch qmv

bbe24a6

swap batch and block dims for qvm and qmv regular

e2549b7

angeloskath approved these changes Feb 13, 2025


awni merged commit e425dc0 into main Feb 13, 2025
5 checks passed

awni deleted the faster_small_batch_qmv branch February 13, 2025 06:02

BrewTestBot mentioned this pull request Feb 14, 2025

mlx 0.23.0 Homebrew/homebrew-core#207747

Merged

