Support Triton FP8 Gemm can handle hidden_dim not divisible by 16 #9093

hebiao064 · 2025-08-12T04:31:28Z

Motivation

Allow the Triton FP8 GEMM to handle a hidden_dim that is not divisible by 16.

Without this change, we can only serve GLM Air with TP 4, not TP 8.
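To make the TP 4 vs. TP 8 difference concrete, here is a minimal sketch of the divisibility arithmetic. The dimension 10944 is an illustrative assumption, not a value taken from this PR; it was picked only because its per-rank shard is a multiple of 16 at TP 4 but not at TP 8. The helper names are hypothetical.

```python
# Illustrative only: 10944 is an assumed dimension, not a value stated in this PR.
ILLUSTRATIVE_DIM = 10944
ALIGNMENT = 16  # the divisibility requirement mentioned in the PR title


def shard_width(dim: int, tp: int) -> int:
    """Width of one tensor-parallel shard when `dim` is split evenly across `tp` ranks."""
    if dim % tp != 0:
        raise ValueError(f"{dim} cannot be split evenly across {tp} ranks")
    return dim // tp


def is_aligned(width: int, alignment: int = ALIGNMENT) -> bool:
    """True if `width` is a whole multiple of `alignment`."""
    return width % alignment == 0


for tp in (4, 8):
    width = shard_width(ILLUSTRATIVE_DIM, tp)
    print(f"TP {tp}: shard width {width}, multiple of {ALIGNMENT}: {is_aligned(width)}")
```

With this assumed dimension, the TP 4 shard satisfies the multiple-of-16 requirement and the TP 8 shard does not, which is the kind of case the PR title describes.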

Modifications

Accuracy Tests

python -m sglang.launch_server --model-path zai-org/GLM-4.5-Air-FP8/ --tp 8


python3 bench_sglang.py --num-questions 200


Accuracy: 0.930
Invalid: 0.000
Latency: 13.808 s
Output throughput: 1456.212 token/s

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".

gemini-code-assist · 2025-08-12T04:31:30Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

BBuf · 2025-08-12T04:40:00Z

Do we need a unit test for the kernel?

hebiao064 · 2025-08-12T04:40:04Z

@BBuf @zhyncs fyi

hebiao064 · 2025-08-12T05:55:15Z

> Do we need a unit test for the kernel?

Added.
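For readers who want a feel for what such a test can check, below is a rough pure-Python sketch, not the test added in this PR. It assumes one common way of handling an unaligned reduction dimension, zero-padding it up to the next multiple of 16, and verifies that the padded product matches an unpadded reference. The PR's actual kernel change may work differently (for example by masking loads), and every name here (`round_up`, `padded_matmul`, `check`) is hypothetical.

```python
import random

BLOCK = 16  # assumed alignment granularity, taken from the PR title


def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return (n + multiple - 1) // multiple * multiple


def matmul(a, b):
    """Reference product of a (M x K) and b (K x N), both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def padded_matmul(a, b, block: int = BLOCK):
    """Hypothetical stand-in for a kernel that requires K to be a multiple of `block`:
    zero-pad K on both operands, then multiply. Zero padding adds nothing to the sums."""
    k = len(b)
    pad = round_up(k, block) - k
    a_pad = [row + [0.0] * pad for row in a]
    b_pad = b + [[0.0] * len(b[0]) for _ in range(pad)]
    return matmul(a_pad, b_pad)


def check(k: int, m: int = 3, n: int = 2, seed: int = 0) -> bool:
    """Compare the padded path against the reference for a given reduction size k."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
    b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
    ref, out = matmul(a, b), padded_matmul(a, b)
    return all(abs(r - o) < 1e-9 for rr, oo in zip(ref, out) for r, o in zip(rr, oo))
```

A real kernel test would compare the Triton FP8 output against a higher-precision reference with a tolerance suited to FP8; the sketch only shows the shape of the check across aligned and unaligned sizes, e.g. `check(16)`, `check(17)` and `check(1368)`.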

Support Triton FP8 Gemm can handle hidden_dim not divisible by 16

952e8e5

hebiao064 requested review from merrymercy, Ying1123, zhyncs, ispobock, HaiShaw, ch-wan, BBuf, kushanam and Edwardf0t1 as code owners August 12, 2025 04:31

fix

ec1b09a

hebiao064 mentioned this pull request Aug 12, 2025

fix per token cuda kernel hidden dim cannot divide by 16 #8543

Merged

6 tasks

add test

ec487ce

fix format

ec3de06

BBuf approved these changes Aug 13, 2025

Merge branch 'main' into support_triton_gemm

2c68fc3

BBuf enabled auto-merge (squash) August 13, 2025 03:18

BBuf disabled auto-merge August 13, 2025 03:18

hebiao064 self-assigned this Aug 13, 2025

hebiao064 added the quant (LLM Quantization) and ready-to-merge (The PR is ready to merge after the CI is green.) labels Aug 13, 2025

hebiao064 merged commit 930fe46 into main Aug 13, 2025
52 of 67 checks passed

hebiao064 deleted the support_triton_gemm branch August 13, 2025 04:21

FlamingoPg mentioned this pull request Aug 13, 2025

Use Tensor Core Decode when gqa group size >= 4 #8624

Merged

6 tasks

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025

Support Triton FP8 Gemm can handle hidden_dim not divisible by 16 (sg…

7036bdd

…l-project#9093) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
