@ispobock commented on Feb 10, 2025

Motivation

Support Eagle2 for the Triton attention backend, achieving a 2.6x decoding speedup at batch size 1 with CUDA graph disabled. (Based on #3317, #3309, #3292.)
CUDA graph support will be added in a follow-up PR.

Baseline (no speculative decoding):

python3 -m sglang.launch_server --model meta-llama/Llama-2-7b-chat-hf --disable-radix --disable-cuda-graph --attention-backend triton

speed: 71.34 token/s

With EAGLE speculative decoding (--speculative-num-steps sets the number of draft steps, --speculative-eagle-topk the number of top-k candidates kept per step, and --speculative-num-draft-tokens the total size of the draft tree verified by the target model):

python3 -m sglang.launch_server --model meta-llama/Llama-2-7b-chat-hf --speculative-algo EAGLE --speculative-draft lmzheng/sglang-EAGLE-llama2-chat-7B --speculative-num-steps 5 --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 --mem-fraction 0.8 --disable-radix --disable-cuda-graph --attention-backend triton

speed: 185.54 token/s (185.54 / 71.34 ≈ 2.6x over the baseline)

The test script is taken from #2150.
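
For reference, a minimal throughput check in the same spirit can be run against the launched server (a sketch, not the actual script from #2150; it assumes the server's default port 30000 and its OpenAI-compatible /v1/completions route, and the prompt and max_tokens below are placeholders):

import time
import requests

URL = "http://localhost:30000/v1/completions"  # default sglang port, OpenAI-compatible route

payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Explain speculative decoding in one paragraph.",  # placeholder prompt
    "max_tokens": 512,
    "temperature": 0,
}

start = time.time()
resp = requests.post(URL, json=payload)
resp.raise_for_status()
elapsed = time.time() - start

# Generated-token count comes from the standard OpenAI usage field.
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.2f} token/s")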

Thanks @aspctu for helping debug!

@zhyncs merged commit 2d61132 into sgl-project:main on Feb 10, 2025 (18 of 19 checks passed)