Conversation

ikawrakow (Owner) commented:

Alternative to #460

One wouldn't really want to use this on a NEON CPU as it is much too slow. But for the sake of completeness, here it is.

Sweep-bench results for LLaMA-3.1-8B-Instruct with BLAS on an M2-Max CPU (PP performance is much lower without BLAS). In the tables below, PP and TG are the prompt and generated tokens per step, N_KV is the number of tokens in the KV cache at the start of the step, T_PP/S_PP are the prompt processing time and speed, and T_TG/S_TG are the token generation time and speed.

IQ2_KT

|    PP |    TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|------:|------:|-------:|---------:|---------:|---------:|---------:|
|   512 |   128 |      0 |    5.364 |    95.44 |   11.527 |    11.10 |
|   512 |   128 |    512 |    4.644 |   110.25 |   11.739 |    10.90 |
|   512 |   128 |   1024 |    4.870 |   105.14 |   12.270 |    10.43 |
|   512 |   128 |   1536 |    5.055 |   101.29 |   12.644 |    10.12 |
|   512 |   128 |   2048 |    5.289 |    96.81 |   12.732 |    10.05 |

IQ3_KT

|    PP |    TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|------:|------:|-------:|---------:|---------:|---------:|---------:|
|   512 |   128 |      0 |    7.470 |    68.54 |   16.866 |     7.59 |
|   512 |   128 |    512 |    6.764 |    75.70 |   16.985 |     7.54 |
|   512 |   128 |   1024 |    6.987 |    73.28 |   17.157 |     7.46 |
|   512 |   128 |   1536 |    7.180 |    71.31 |   17.459 |     7.33 |
|   512 |   128 |   2048 |    7.401 |    69.18 |   17.453 |     7.33 |

IQ4_KT

|    PP |    TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|------:|------:|-------:|---------:|---------:|---------:|---------:|
|   512 |   128 |      0 |    5.443 |    94.07 |   22.327 |     5.73 |
|   512 |   128 |    512 |    4.658 |   109.91 |   22.432 |     5.71 |
|   512 |   128 |   1024 |    4.889 |   104.73 |   22.937 |     5.58 |
|   512 |   128 |   1536 |    5.069 |   101.01 |   22.843 |     5.60 |
|   512 |   128 |   2048 |    5.295 |    96.70 |   22.816 |     5.61 |

This is nevertheless quite a bit faster than #460, so I'll go with this PR.

Of note: I couldn't get IQ4_KT to work with fp16 arithmetic. I'm not sure whether there really is an fp16 range overflow, or whether there is a bug in the fp16 implementation that I simply cannot see.
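One way to tell a genuine fp16 range overflow apart from an implementation bug is to run the same dot product with fp32 accumulation and track the largest intermediate magnitude: if it exceeds the fp16 maximum of 65504, an fp16 accumulator would have saturated to infinity no matter how the kernel is written. The snippet below is only a minimal sketch of that diagnostic; the buffer contents, sizes, and the function name `dot_would_overflow_fp16` are placeholders rather than anything from the actual IQ4_KT kernels.

```cpp
// Sketch: check whether a dot product would overflow an fp16 accumulator.
// Placeholder inputs; in the real kernel these would be the dequantized
// IQ4_KT weights and the activation row.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Largest finite value representable in IEEE fp16.
constexpr float kFp16Max = 65504.0f;

// Accumulate in fp32 and record the largest intermediate magnitude.
// If that magnitude exceeds kFp16Max, an fp16 accumulator would have
// saturated at some point, i.e. a genuine range overflow.
static bool dot_would_overflow_fp16(const float * x, const float * y, int n, float * max_abs) {
    float sum = 0.0f, peak = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum  = sum + x[i] * y[i];
        peak = std::max(peak, std::fabs(sum));
    }
    *max_abs = peak;
    return peak > kFp16Max;
}

int main() {
    const int n = 4096;
    std::vector<float> x(n), y(n);
    for (int i = 0; i < n; ++i) {
        // Synthetic data; substitute real weights/activations here.
        x[i] = 0.5f * std::sin(0.01f * i);
        y[i] = 2.0f * std::cos(0.02f * i);
    }
    float peak = 0.0f;
    const bool overflow = dot_would_overflow_fp16(x.data(), y.data(), n, &peak);
    std::printf("peak |partial sum| = %g -> %s\n", peak,
                overflow ? "fp16 would overflow" : "fp16 range is sufficient");
    return 0;
}
```

If the peak stays well below 65504 on real data, the problem is more likely a bug in the fp16 path than a true range limitation.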

ikawrakow merged commit 1eac9e8 into main on May 29, 2025.
ikawrakow mentioned this pull request on Jun 1, 2025.