Conversation

chromecast56
Contributor

Motivation

Add support for EAGLE-3: https://arxiv.org/abs/2503.01840

Modifications

  • Add EAGLE3 speculative method to server args
  • Refactor llama.py, logits_processor.py to support capturing auxiliary hidden states
  • Modify eagle_worker.py to support EAGLE-3 token map + untied LM head
  • Add llama_eagle3.py model
  • Tests and Documentation

Checklist

@zhyncs
Member

zhyncs commented Mar 10, 2025

cc @Liyuhui-12 @hongyanz

@chromecast56
Contributor Author

Benchmarks on MT-Bench (bsz 1):

Autoregressive:


python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000 --cuda-graph-max-bs 1

#questions: 1, Throughput: 147.05 token/s, Acceptance length: 1.00

EAGLE-3:

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft jamesliu1/sglang-EAGLE3-Llama-3.1-Instruct-8B --speculative-num-steps 5 \
    --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 \
    --cuda-graph-max-bs 1 --mem-fraction 0.7 --dtype float16 --port 30000

#questions: 80, Throughput: 336.61 token/s, Acceptance length: 4.29
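As a quick sanity check, the two reported throughputs (copied from the runs above) imply roughly a 2.3x end-to-end speedup for EAGLE-3 at bsz 1:

```python
# Speedup implied by the two MT-Bench runs above (bsz 1).
baseline_tps = 147.05  # autoregressive throughput (tokens/s)
eagle3_tps = 336.61    # EAGLE-3 throughput (tokens/s)

speedup = eagle3_tps / baseline_tps
print(f"EAGLE-3 speedup: {speedup:.2f}x")  # EAGLE-3 speedup: 2.29x
```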

@zhyncs
Member

zhyncs commented Mar 10, 2025

@chromecast56 Can we use --speculative-num-steps 8?

@chromecast56
Contributor Author

chromecast56 commented Mar 11, 2025

Acceptance length is calculated as num_output_tokens / num_verify_ct over the entire dataset, so it should take that into account. @hongyanz Do you normalize acceptance length per prompt, or is it over the entire dataset?

@hongyanz

hongyanz commented Mar 12, 2025

@chromecast56 Ours is at the round level. We calculate the number of accepted tokens in each speculative decoding round, add 1 to it (because the very last token in each round is always accepted, since it comes from the target model), and average the numbers across all rounds.
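To make the two definitions concrete, here is a minimal sketch (the per-round counts are made up for illustration, and the function names are not from the SGLang code). For a single sequence the two metrics coincide, since each verify round emits the accepted draft tokens plus one bonus token from the target model; they only diverge if you average per prompt rather than over all rounds pooled together:

```python
def dataset_level(num_output_tokens, num_verify_ct):
    # Dataset-level: total output tokens divided by total verify rounds.
    return num_output_tokens / num_verify_ct

def round_level(accepted_per_round):
    # Round-level: accepted draft tokens in each round, plus 1 for the
    # bonus token from the target model, averaged across rounds.
    return sum(a + 1 for a in accepted_per_round) / len(accepted_per_round)

rounds = [3, 4, 2, 5]                      # accepted draft tokens per round
total_tokens = sum(a + 1 for a in rounds)  # 18 output tokens over 4 rounds

print(dataset_level(total_tokens, len(rounds)))  # 4.5
print(round_level(rounds))                       # 4.5
```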

Collaborator

@ispobock ispobock left a comment

LGTM

@zhaochenyang20
Collaborator

@chromecast56 could you fix the conflicts in the docs? @zhyncs @merrymercy could we merge it?

@zhyncs
Member

zhyncs commented Mar 13, 2025

@merrymercy @Ying1123 reminder

@zhyncs
Member

zhyncs commented Mar 17, 2025

Hi @chromecast56, could you help fix the conflicts?

Contributor

@merrymercy merrymercy left a comment

LGTM! Thanks for the great work.

@zhyncs zhyncs merged commit 9e0186f into sgl-project:main Mar 18, 2025
34 of 36 checks passed
@zhaochenyang20
Collaborator

cc @simveit hey Simon. EAGLE-3 is merged into sglang now; Yineng @zhyncs will profile it today. Could you help update the docs https://docs.sglang.ai/backend/speculative_decoding.html after Yineng provides the performance numbers? Thanks so much!

@simveit
Contributor

simveit commented Mar 18, 2025

@zhaochenyang20 Yes. Let me read the paper in the next few days.

@finger92
Contributor

should we change
"You can enable EAGLE-3 decoding by setting --speculative_draft_model_path: EAGLE3:"
to
"You can enable EAGLE-3 decoding by setting --speculative-algorithm EAGLE3:"
?

@zhyncs zhyncs mentioned this pull request Mar 22, 2025
@merrymercy
Contributor

@ispobock can you share the exact commands to launch the servers for eagle2 and eagle3?
We do not need to benchmark on MT-Bench; we can just use python3 -m sglang.test.send_one

@ispobock
Collaborator

@merrymercy For bs=1, the launch commands are here:

# EAGLE2
python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE \
    --speculative-draft jamesliu1/sglang-EAGLE-Llama-3.1-Instruct-8B --speculative-num-steps 5 \
    --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 \
    --cuda-graph-max-bs 1 --dtype float16 --port 30000 --tp 1 --disable-radix --mem-frac 0.7

# EAGLE3
python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft jamesliu1/sglang-EAGLE3-Llama-3.1-Instruct-8B --speculative-num-steps 8 \
    --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 \
    --cuda-graph-max-bs 1 --dtype float16 --port 30000 --tp 1 --disable-radix --mem-frac 0.7

MT-Bench is the default benchmark dataset in EAGLE's evaluation code; it's used to stay aligned with the setting in the paper.
python3 -m sglang.test.send_one also works for benchmarking.

@ryang-max
Contributor

I'll also work on the docs for this feature in the coming days.
cc @zhaochenyang20 @simveit
