SDPA support for small batch (over sequence) queries #1922

awni · 2025-03-04T16:05:33Z

Benchmarks on an M4 Max with the following config:

L = 16384
Lq = 4
H = 32
H_k = H // 4
D = 128
V = 128

Timing sdpa ... 4.70581 msec
Timing attention ... 16.99894 msec
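For readers who want to see what is being computed, here is a minimal NumPy sketch of unfused grouped-query attention using the same head layout as the config above. It is not the benchmark script and not the MLX kernel; `reference_attention` is a hypothetical helper, L is shortened so the sketch runs quickly, and it assumes consecutive query heads share one key/value head. Taken at face value, the two timings above differ by roughly 3.6x.

```python
import numpy as np

def reference_attention(q, k, v, scale):
    """Unfused GQA attention (hypothetical reference, not the MLX kernel).

    q: [B, H, Lq, D], k: [B, H_k, L, D], v: [B, H_k, L, V]
    """
    B, H, Lq, D = q.shape
    H_k = k.shape[1]
    n_rep = H // H_k
    # Assumption: query head h reads key/value head h // n_rep.
    qg = q.reshape(B, H_k, n_rep, Lq, D)
    scores = scale * np.einsum("bgrqd,bgkd->bgrqk", qg, k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("bgrqk,bgkv->bgrqv", weights, v)
    return out.reshape(B, H, Lq, v.shape[-1])

# Same head layout as the benchmark config, with a much shorter L.
L, Lq, H, D, V = 256, 4, 32, 128, 128
H_k = H // 4
rng = np.random.default_rng(0)
q = rng.standard_normal((1, H, Lq, D)).astype(np.float32)
k = rng.standard_normal((1, H_k, L, D)).astype(np.float32)
v = rng.standard_normal((1, H_k, L, V)).astype(np.float32)

out = reference_attention(q, k, v, scale=D ** -0.5)
print(out.shape)  # (1, 32, 4, 128)
```

The point of the shapes is that Lq is tiny relative to L: each of the few query positions attends over the whole key/value sequence, which is the regime this PR targets.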

Also updated RoPE to route to a faster path for shapes like [1, H, L, D].

The intention here is mostly to speed up speculative decoding (spec dec). Will share some benchmarks here: ml-explore/mlx-examples#1319

angeloskath

Awesome!

The choice for query_transposed vs passing strides and so on was for performance reasons?

awni · 2025-03-04T18:57:33Z

> The choice for query_transposed vs passing strides and so on was for performance reasons?

Good question. I didn't try passing the query strides in, so I can't say whether there is much perf difference; probably minor. The diff seemed simpler with the function constant. On the other hand, it could be more general to allow arbitrary strides in the sequence and head dimensions.
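To make the trade-off concrete, here is a small NumPy sketch of the two indexing schemes being discussed. It is an illustration only, not the Metal kernel: `query_offset` and `query_offset_strided` are hypothetical helpers, and the layouts are assumed to be a contiguous [B, H, L, D] query versus a [B, L, H, D] buffer viewed as [B, H, L, D].

```python
import numpy as np

B, L, H, D = 2, 4, 32, 128

# Queries as they often come out of a projection: contiguous [B, L, H, D].
q_blhd = np.zeros((B, L, H, D), dtype=np.float32)
# Viewed as [B, H, L, D] without copying; only the strides change.
q_bhld = q_blhd.transpose(0, 2, 1, 3)

def element_strides(a):
    """Strides in elements rather than bytes."""
    return tuple(s // a.itemsize for s in a.strides)

print(element_strides(q_blhd))  # (16384, 4096, 128, 1)
print(element_strides(q_bhld))  # (16384, 128, 4096, 1)

def query_offset(h, l, transposed, H, L, D):
    """Boolean-flag scheme (hypothetical): two hard-coded layouts."""
    return (l * H + h) * D if transposed else (h * L + l) * D

def query_offset_strided(h, l, head_stride, seq_stride):
    """Stride scheme (hypothetical): any head/sequence layout."""
    return h * head_stride + l * seq_stride

_, head_stride, seq_stride, _ = element_strides(q_bhld)
# Both schemes agree on the transposed layout.
print(query_offset(3, 2, True, H, L, D) ==
      query_offset_strided(3, 2, head_stride, seq_stride))  # True
```

The flag covers exactly two layouts and can be resolved ahead of time, while explicit strides cover any layout in the head and sequence dimensions at the cost of extra kernel arguments.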

awni added 2 commits March 4, 2025 08:06

batch query sdpa

f39dfa5

batch sdpa for query

6aeb67b

awni force-pushed the batch_query_sdpa branch from bbb1e77 to 6aeb67b March 4, 2025 16:06

awni requested review from angeloskath and barronalex March 4, 2025 16:15

awni mentioned this pull request Mar 4, 2025

Use a bool mask for attention ml-explore/mlx-examples#1319

Merged

angeloskath approved these changes Mar 4, 2025


awni merged commit e613d0e into main Mar 4, 2025
5 checks passed

awni deleted the batch_query_sdpa branch March 4, 2025 18:59
