Skip to content

Conversation

FrankLeeeee
Copy link
Collaborator

@FrankLeeeee FrankLeeeee commented Jun 9, 2025

Motivation

When we run benchmark/mtbench/bench_sglang_eagle.py, this will use the /generate API by default, however, it does not work well for models which require chat APIs such as Llama4, as a result, the acceptance length is extremely low for these models.

Thus, I updated this part of code for two purposes:

  1. enable chat api in SGLang frontend
  2. enable speculative decoding stats for chat api

Modifications

The results seem good.

image

Checklist

@FrankLeeeee FrankLeeeee merged commit 22a52b3 into nv_eagle3 Jun 9, 2025
1 check failed
FrankLeeeee added a commit that referenced this pull request Jun 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant