[Feature] Get Token IDs with Engine.generate() #2636
Conversation
The commented code (table residue stripped):

```python
if self.skip_tokenizer_init:
    output_ids.append(req.output_ids)
origin_input_ids.append(req.origin_input_ids)
output_ids.append(req.output_ids)
```

Suggested change:

```python
if self.skip_tokenizer_init:
    origin_input_ids.append(req.origin_input_ids)
    output_ids.append(req.output_ids)
```
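A minimal sketch of the grouped appends above, using a stand-in `Req` dataclass (SGLang's real `Req` class has many more fields, and the token ID values here are made up):

```python
from dataclasses import dataclass

# Stand-in for SGLang's request object; only the two fields
# relevant to this change are modeled.
@dataclass
class Req:
    origin_input_ids: list
    output_ids: list

skip_tokenizer_init = True
reqs = [Req(origin_input_ids=[128000, 9906], output_ids=[40, 1097])]

origin_input_ids, output_ids = [], []
for req in reqs:
    if skip_tokenizer_init:
        # Collect both the original prompt IDs and the generated IDs
        # so the tokenizer-free path can still return them.
        origin_input_ids.append(req.origin_input_ids)
        output_ids.append(req.output_ids)
```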
```diff
@@ -657,6 +657,8 @@ async def handle_loop(self):
         out_dict = {
             "text": recv_obj.output_strs[i],
             "meta_info": meta_info,
+            "input_ids": recv_obj.origin_input_ids[i],
+            "output_ids": recv_obj.output_ids[i],
```
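With this change, each per-request result dict carries the token IDs alongside `text` and `meta_info`. A sketch of the resulting shape, with made-up token ID values and a simplified `recv_obj` (the real one is a batch output object, not a dict):

```python
# Simplified stand-in for the batch object received in handle_loop.
recv_obj = {
    "output_strs": ["Paris."],
    "origin_input_ids": [[128000, 791, 6864]],
    "output_ids": [[60704, 13]],
}

results = []
for i, text in enumerate(recv_obj["output_strs"]):
    meta_info = {
        "prompt_tokens": len(recv_obj["origin_input_ids"][i]),
        "completion_tokens": len(recv_obj["output_ids"][i]),
    }
    results.append(
        {
            "text": text,
            "meta_info": meta_info,
            "input_ids": recv_obj["origin_input_ids"][i],
            "output_ids": recv_obj["output_ids"][i],
        }
    )

print(results[0]["meta_info"])  # {'prompt_tokens': 3, 'completion_tokens': 2}
```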
By default, we should not return these, because it introduces extra overhead.
Yes. We can keep them disabled by default to avoid extra overhead, and just add a startup flag to enable returning these IDs when needed (like in OpenRLHF scenarios).
Great job. Fix the current issues and open a new PR for prompt_token_ids.
test/srt/test_engine_token_ids.py
Outdated
1. Whether the input token IDs from the SGLang Engine match those produced by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced by the Hugging Face tokenizer (excluding the start token, which is typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion token count, aligns with the actual lengths of input and output token IDs.
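The three checks can be illustrated with plain lists (the token ID values are made up; the point is the BOS/start-token handling in check 2):

```python
# Check 1: input IDs match exactly (both sides include the BOS/start token).
hf_input_ids = [128000, 9906, 11, 856]
sgl_input_ids = [128000, 9906, 11, 856]
assert sgl_input_ids == hf_input_ids

# Check 2: Hugging Face prepends a BOS when re-encoding the generated text,
# while SGLang's output_ids carry no BOS, so compare after dropping hf[0].
hf_output_ids = [128000, 40, 1097]
sgl_output_ids = [40, 1097]
assert sgl_output_ids == hf_output_ids[1:]

# Check 3: meta info counts equal the actual ID list lengths.
meta_info = {
    "prompt_tokens": len(sgl_input_ids),
    "completion_tokens": len(sgl_output_ids),
}
assert meta_info["prompt_tokens"] == 4
assert meta_info["completion_tokens"] == 2
```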
State where we add the start_token and where we do not.
test/srt/test_engine_token_ids.py
Outdated
```python
def get_test_engine():
    """Returns a test engine with the 'Meta-Llama-3.1-8B-Instruct' model."""
    return sgl.Engine(
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct", return_token_ids=True
    )
```
Merge into test.
Good flag name. 😂
test/srt/test_engine_token_ids.py
Outdated
```python
class TestEngineTokenIds(unittest.TestCase):
    """Tests SGLang's token IDs against Hugging Face tokenizer."""
```
Tests SGLang's return token IDs against Hugging Face tokenizer.
test/srt/test_engine_token_ids.py
Outdated
```python
def setUp(self):
    """Creates engine, tokenizer, and prompts."""
    self.llm = get_test_engine()
    self.tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B-Instruct"
    )
    self.prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    self.sampling_params = {"temperature": 0.8, "top_p": 0.95}

def tearDown(self):
    """Shuts down the engine."""
    self.llm.shutdown()
```
Merge into one function.
test/srt/test_engine_token_ids.py
Outdated
```python
    f"Input token IDs mismatch for: {prompt}",
)

# Remove start token from HuggingFace as SGLang output doesn't include it
```
State for the input about start token.
Explain why the input has a start token but the output does not. Keep it concise.
test/srt/test_engine_token_ids.py
Outdated
```python
"""Test token ID alignment between SGLang and Hugging Face.

This test suite ensures that the token IDs generated by the SGLang Engine
are consistent with those of the Hugging Face tokenizer for a given set of
prompts. Specifically, it checks:
1. Whether the input token IDs from the SGLang Engine match those produced
   by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced
   by the Hugging Face tokenizer (excluding the start token, which is
   typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion
   token count, aligns with the actual lengths of input and output token IDs.
"""
```
Delete this.
```python
self.assertEqual(
    len(output["input_ids"]),
    output["meta_info"]["prompt_tokens"],
```
Make a new PR to change the name to prompt_tokens_length.
In the next PR, change input_ids to prompt_tokens, and the current prompt_tokens to prompt_tokens_length.
test/srt/test_engine_token_ids.py
Outdated
```python
)
self.assertEqual(
    output["meta_info"]["completion_tokens"],
    128,
```
Use len(sgl_output_ids) instead.
python/sglang/srt/server_args.py
Outdated
```python
"--return-token-ids",
action="store_true",
default=ServerArgs.return_token_ids,
help="Whether to return token IDs in the output. Experimental feature.",
```
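A self-contained sketch of the flag wiring with plain `argparse` (the real code lives in python/sglang/srt/server_args.py and defaults to `ServerArgs.return_token_ids`; `False` is used here so the snippet stands alone):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--return-token-ids",
    action="store_true",
    default=False,  # disabled by default to avoid extra overhead
    help="Whether to return token IDs in the output. "
    "Enabling this may introduce additional overhead.",
)

# Off unless the flag is passed at startup:
args = parser.parse_args([])
print(args.return_token_ids)  # False
```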
Whether to return token IDs in the output.
One last change: note in server_args.py that enabling this may introduce additional overhead. Everything else is perfect.
python/sglang/srt/server_args.py
Outdated
```python
help="Whether to return token IDs in the output.",
)
```
This may introduce additional overhead.
This reverts commit 35bdb48.
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Motivation
Related to #2634.
Checklist