
Conversation

shuaills (Collaborator)

Motivation

Related to #2634.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Comment on lines 1261 to 1264
if self.skip_tokenizer_init:
    output_ids.append(req.output_ids)
origin_input_ids.append(req.origin_input_ids)
output_ids.append(req.output_ids)
Contributor

Suggested change
-if self.skip_tokenizer_init:
-    output_ids.append(req.output_ids)
-origin_input_ids.append(req.origin_input_ids)
-output_ids.append(req.output_ids)
+origin_input_ids.append(req.origin_input_ids)
+output_ids.append(req.output_ids)

@@ -657,6 +657,8 @@ async def handle_loop(self):
                 out_dict = {
                     "text": recv_obj.output_strs[i],
                     "meta_info": meta_info,
+                    "input_ids": recv_obj.origin_input_ids[i],
+                    "output_ids": recv_obj.output_ids[i],
Contributor

By default, we should not return this, because it introduces extra overhead.

Collaborator Author

Yes. We can keep them disabled by default to avoid extra overhead, and just add a startup flag to enable returning these IDs when needed (like in OpenRLHF scenarios).
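
A minimal sketch of that gating (a hypothetical helper; return_token_ids mirrors the proposed startup flag, and the field names follow the diff above):

def build_out_dict(recv_obj, i, meta_info, return_token_ids: bool) -> dict:
    """Assemble one response dict; token IDs are opt-in to avoid overhead."""
    out = {
        "text": recv_obj.output_strs[i],
        "meta_info": meta_info,
    }
    if return_token_ids:
        # Only pay the serialization cost when explicitly requested.
        out["input_ids"] = recv_obj.origin_input_ids[i]
        out["output_ids"] = recv_obj.output_ids[i]
    return out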

@zhaochenyang20 (Collaborator) left a comment

Great job. Fix the current PR, and make a new PR for prompt_token_ids.

Comment on lines 6 to 12
1. Whether the input token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer (excluding the start token, which is
typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion
token count, aligns with the actual lengths of input and output token IDs.
Collaborator

State where the start_token is added and where it is not.
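
For reference, a quick illustration of the asymmetry (an illustrative snippet, not part of the PR; it assumes access to the gated Llama model used in the tests):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Encoding a prompt prepends the BOS ("start") token...
prompt_ids = tokenizer("Hello, my name is")["input_ids"]
print(prompt_ids[0] == tokenizer.bos_token_id)  # True

# ...but generated output_ids from the engine carry no BOS, so an
# HF-encoded reference for the output text must drop its first token.
hf_reference = tokenizer("Paris, of course.")["input_ids"][1:]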

Comment on lines 25 to 29
def get_test_engine():
    """Returns a test engine with the 'Meta-Llama-3.1-8B-Instruct' model."""
    return sgl.Engine(
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct", return_token_ids=True
    )
Collaborator

Merge into test.

Collaborator

Good flag name. 😂



class TestEngineTokenIds(unittest.TestCase):
    """Tests SGLang's token IDs against Hugging Face tokenizer."""
Collaborator

Tests SGLang's return token IDs against Hugging Face tokenizer.

Comment on lines 35 to 51
def setUp(self):
    """Creates engine, tokenizer, and prompts."""
    self.llm = get_test_engine()
    self.tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B-Instruct"
    )
    self.prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    self.sampling_params = {"temperature": 0.8, "top_p": 0.95}

def tearDown(self):
    """Shuts down the engine."""
    self.llm.shutdown()
Collaborator

Merge into one function.
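
One way to apply this, folding get_test_engine into setUp (a sketch of the requested refactor, reusing the imports already in the test file):

def setUp(self):
    """Creates the engine, tokenizer, and prompts in one place."""
    self.llm = sgl.Engine(
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct",
        return_token_ids=True,
    )
    self.tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B-Instruct"
    )
    self.prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    self.sampling_params = {"temperature": 0.8, "top_p": 0.95}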

f"Input token IDs mismatch for: {prompt}",
)

# Remove start token from HuggingFace as SGLang output doesn't include it
Collaborator

State the start-token behavior for the input as well.

Collaborator

Explain concisely why the input has a start token but the output does not.

Comment on lines 1 to 12
"""Test token ID alignment between SGLang and Hugging Face.

This test suite ensures that the token IDs generated by the SGLang Engine
are consistent with those of the Hugging Face tokenizer for a given set of
prompts. Specifically, it checks:
1. Whether the input token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer (excluding the start token, which is
typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion
token count, aligns with the actual lengths of input and output token IDs.
Collaborator

Delete this.


self.assertEqual(
    len(output["input_ids"]),
    output["meta_info"]["prompt_tokens"],
Collaborator

Make a new PR to change the name to prompt_tokens_length.

Collaborator

In the next PR, change input_ids to prompt_tokens, and change the current prompt_tokens to prompt_tokens_length.
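
The proposed renames, summarized as a follow-up (names as suggested here, not yet in the code):

# output["input_ids"]                  -> output["prompt_tokens"]
# output["meta_info"]["prompt_tokens"] -> output["meta_info"]["prompt_tokens_length"]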

)
self.assertEqual(
    output["meta_info"]["completion_tokens"],
    128,
Collaborator

len(sgl_output_ids)
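
Applied to the test, the assertion compares against the engine's returned IDs instead of a hardcoded 128 (a sketch; it assumes sgl_output_ids is bound from the response as shown):

sgl_output_ids = output["output_ids"]
self.assertEqual(
    output["meta_info"]["completion_tokens"],
    len(sgl_output_ids),
)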

"--return-token-ids",
action="store_true",
default=ServerArgs.return_token_ids,
help="Whether to return token IDs in the output. Experimental feature.",
Collaborator

Whether to return token IDs in the output.

@zhaochenyang20 (Collaborator) left a comment

One last change: state in server_args.py that enabling this may introduce additional overhead. Everything else is perfect.

Comment on lines 288 to 289
help="Whether to return token IDs in the output.",
)
Collaborator

This may introduce additional overhead.
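
Combining both review comments, the argument definition would read roughly as follows (a sketch against the existing parser setup in server_args.py; the merged wording may differ):

parser.add_argument(
    "--return-token-ids",
    action="store_true",
    default=ServerArgs.return_token_ids,
    help="Whether to return token IDs in the output. "
    "This may introduce additional overhead.",
)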

@zhaochenyang20 merged commit 35bdb48 into sgl-project:main on Dec 29, 2024
1 of 14 checks passed
@shuaills deleted the feature/#2634/tokenID branch on December 30, 2024, 16:47
shuaills added a commit to shuaills/sglang that referenced this pull request Jan 11, 2025
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: Chayenne <zhaochen20@outlook.com>