Skip to content

Conversation

jjjjohnson
Copy link
Contributor

@jjjjohnson jjjjohnson commented Jan 3, 2025

Motivation

After eagel2 target model verify, the accepted tokens might be EOS token, need to check each appended new token. Then free kv cache of the unaccepted tokens.

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy
Copy link
Contributor

@yukavio can you help review this?

@merrymercy merrymercy changed the title fix end check fix end check in eagle Jan 4, 2025
@zhyncs zhyncs mentioned this pull request Jan 5, 2025
3 tasks
Copy link
Contributor

@merrymercy merrymercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Can you address the comments?

  1. How is the performance slowdown due to this change?
  2. Can you provide a unit test to prevent this from happening in the future?
  3. Can you fix the lint error? https://sgl-project.github.io/references/contribution_guide.html#code-formatting-with-pre-commit

@jjjjohnson jjjjohnson changed the title fix end check in eagle [eagle2] fix end check Jan 7, 2025
@jjjjohnson jjjjohnson changed the title [eagle2] fix end check [eagle2] fix end check when verify Jan 7, 2025
@jjjjohnson jjjjohnson changed the title [eagle2] fix end check when verify [eagle2] fix end check when target model verify Jan 7, 2025
@jjjjohnson jjjjohnson force-pushed the eagle2_fix_end_check branch 3 times, most recently from 9676e35 to 7eae79f Compare January 7, 2025 07:27
@jjjjohnson
Copy link
Contributor Author

Nice catch! Can you address the comments?

  1. How is the performance slowdown due to this change?
  2. Can you provide a unit test to prevent this from happening in the future?
  3. Can you fix the lint error? https://sgl-project.github.io/references/contribution_guide.html#code-formatting-with-pre-commit
  1. There is no noticiable performance difference due to this change(EAGLE-llama2-chat-7B, bs=1, tp=1):
  • before: 199.75 tokens/s
  • after: 203.60 tokens/s
  1. Unit test is provided
  2. Lint error fixed

@merrymercy
Copy link
Contributor

merrymercy commented Jan 8, 2025

The newly added test fails. Please fix it and make sure the test can correctly cover the fixed bug.

@jjjjohnson jjjjohnson force-pushed the eagle2_fix_end_check branch from 7eae79f to 93b698f Compare January 8, 2025 03:36
@jjjjohnson
Copy link
Contributor Author

jjjjohnson commented Jan 8, 2025

The newly added test fails. Please fix it and make sure the test can correctly cover the fixed bug.

The newly added test fixed.

Look like there is no good way the surface the bug before and after the bug since EOS token is not shown in the output text...

  • Before change
image
  • After change
Pasted Graphic 4

@jjjjohnson jjjjohnson force-pushed the eagle2_fix_end_check branch from 93b698f to cb0d7d0 Compare January 8, 2025 04:18
@merrymercy
Copy link
Contributor

The current test is too specific. Can you try to compare against the output of non speculative decoding version?

You can also set this to False to print the EOS

# Whether to skip the special tokens during detokenization
skip_special_tokens: bool = True,

@jjjjohnson jjjjohnson force-pushed the eagle2_fix_end_check branch from cb0d7d0 to dfc0e5e Compare January 8, 2025 04:33
@jjjjohnson
Copy link
Contributor Author

The current test is too specific. Can you try to compare against the output of non speculative decoding version?

You can also set this to False to print the EOS

# Whether to skip the special tokens during detokenization
skip_special_tokens: bool = True,

I added the "skip_special_tokens": False and can see the difference before and after the change
image

@merrymercy
Copy link
Contributor

Great! Then please use this in the test case and we can merge this soon!

@jjjjohnson
Copy link
Contributor Author

Great! Then please use this in the test case and we can merge this soon!

Done

@merrymercy merrymercy merged commit 694e419 into sgl-project:main Jan 8, 2025
15 checks passed
@merrymercy
Copy link
Contributor

@jjjjohnson It is merged. Thanks!

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants