Custom logsumexp #2028
Conversation
Looks great! The memory savings look even greater :-)
Left one nitpick...
#include "mlx/backend/metal/kernels/utils.h" | ||
#include "mlx/backend/metal/kernels/logsumexp.h" | ||
|
||
#define instantiate_logsumexp(name, itype) \ |
Sorry, I think I forgot to comment on it before (it's a nitpick anyway)... Maybe use `instantiate_kernel` here?
```python
# Large
x = mx.random.uniform(shape=(1025,))
x = mx.broadcast_to(mx.random.uniform(shape=(2, 1, 8)), (2, 2, 8))
self.assertTrue(mx.allclose(mx.logsumexp(x), logsumexp(x)))
```
This assertion tests the same thing as the previous one.
Yeah, looks like a mistake.
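For reference, a guess at what the test was presumably meant to check (not the author's actual fix): assert the large 1-D case and the broadcast case separately, with `logsumexp` being the test file's reference implementation as in the snippet above.

```python
# Large
x = mx.random.uniform(shape=(1025,))
self.assertTrue(mx.allclose(mx.logsumexp(x), logsumexp(x)))

# Broadcast input
x = mx.broadcast_to(mx.random.uniform(shape=(2, 1, 8)), (2, 2, 8))
self.assertTrue(mx.allclose(mx.logsumexp(x), logsumexp(x)))
```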
Add this to speed up and reduce memory use when computing `logsumexp`, e.g. during LoRA fine-tuning. The large vocabulary (200k+ in some cases) makes this optimization important. Also, since the `logsumexp` is computed in high precision, this lets us avoid upcasting the logits and instead only upcast the per-token loss prior to the reduction.

QLoRA fine-tuning Gemma 3 1B:
Pre:
Iter 10: Train loss 3.595, Learning Rate 1.000e-05, It/sec 0.227, Tokens/sec 812.842, Trained Tokens 35845, Peak mem 66.335 GB
Post:
Iter 10: Train loss 3.594, Learning Rate 1.000e-05, It/sec 0.345, Tokens/sec 1235.568, Trained Tokens 35845, Peak mem 49.164 GB
And a microbenchmark:
Pre: 1588.345
Post: 257.241
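To make the precision point concrete, here is a minimal sketch (mine, not the PR's code) of a cross-entropy loss that keeps the logits in low precision, relies on the high-precision `logsumexp` described above, and upcasts only the per-token loss before the reduction. The function name, shapes, and dtypes are illustrative.

```python
import mlx.core as mx

def per_token_ce(logits, targets):
    # Logits stay in low precision (e.g. bfloat16); per the PR description,
    # the logsumexp itself accumulates in high precision, so there is no
    # need to upcast the full [tokens, vocab] logits array.
    lse = mx.logsumexp(logits, axis=-1)
    # Pick out the logit of the target token for each position.
    target_logits = mx.take_along_axis(
        logits, mx.expand_dims(targets, -1), axis=-1
    ).squeeze(-1)
    # Upcast only the per-token loss, just before the reduction.
    loss = (lse - target_logits).astype(mx.float32)
    return mx.mean(loss)

# Hypothetical usage with a large vocabulary.
logits = mx.random.normal(shape=(8, 200_000)).astype(mx.bfloat16)
targets = mx.random.randint(0, 200_000, shape=(8,))
print(per_token_ce(logits, targets))
```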