
Conversation

shuaills (Collaborator)

Motivation

Related to #2634.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Comment on lines 1261 to 1264
if self.skip_tokenizer_init:
    output_ids.append(req.output_ids)
origin_input_ids.append(req.origin_input_ids)
output_ids.append(req.output_ids)
Contributor

Suggested change
-if self.skip_tokenizer_init:
-    output_ids.append(req.output_ids)
-origin_input_ids.append(req.origin_input_ids)
-output_ids.append(req.output_ids)
+origin_input_ids.append(req.origin_input_ids)
+output_ids.append(req.output_ids)

@@ -657,6 +657,8 @@ async def handle_loop(self):
                 out_dict = {
                     "text": recv_obj.output_strs[i],
                     "meta_info": meta_info,
+                    "input_ids": recv_obj.origin_input_ids[i],
+                    "output_ids": recv_obj.output_ids[i],
Contributor

By default, we should not return this, because it introduces extra overhead.

Collaborator Author

Yes. We can keep them disabled by default to avoid extra overhead, and just add a startup flag to enable returning these IDs when needed (like in OpenRLHF scenarios).
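
A minimal sketch of that gating (a hypothetical helper; return_token_ids mirrors the proposed startup flag, and the field names follow the diff above):

def build_out_dict(recv_obj, i, meta_info, return_token_ids: bool) -> dict:
    """Assemble one response dict; token IDs are opt-in to avoid overhead."""
    out = {
        "text": recv_obj.output_strs[i],
        "meta_info": meta_info,
    }
    if return_token_ids:
        # Only pay the serialization cost when explicitly requested.
        out["input_ids"] = recv_obj.origin_input_ids[i]
        out["output_ids"] = recv_obj.output_ids[i]
    return out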

@zhaochenyang20 (Collaborator) left a comment

Great job. Fix the current PR, and make a new PR for prompt_token_ids.

Comment on lines 6 to 12
1. Whether the input token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer (excluding the start token, which is
typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion
token count, aligns with the actual lengths of input and output token IDs.
Collaborator

State where the start_token is added and where it is not.
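
For reference, a quick illustration of the asymmetry (an illustrative snippet, not part of the PR; it assumes access to the gated Llama model used in the tests):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Encoding a prompt prepends the BOS ("start") token...
prompt_ids = tokenizer("Hello, my name is")["input_ids"]
print(prompt_ids[0] == tokenizer.bos_token_id)  # True

# ...but generated output_ids from the engine carry no BOS, so an
# HF-encoded reference for the output text must drop its first token.
hf_reference = tokenizer("Paris, of course.")["input_ids"][1:]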

Comment on lines 25 to 29
def get_test_engine():
    """Returns a test engine with the 'Meta-Llama-3.1-8B-Instruct' model."""
    return sgl.Engine(
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct", return_token_ids=True
    )
Collaborator

Merge into test.

Collaborator

Good flag name. 😂



class TestEngineTokenIds(unittest.TestCase):
    """Tests SGLang's token IDs against Hugging Face tokenizer."""
Collaborator

Tests SGLang's return token IDs against Hugging Face tokenizer.

Comment on lines 35 to 51
def setUp(self):
    """Creates engine, tokenizer, and prompts."""
    self.llm = get_test_engine()
    self.tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B-Instruct"
    )
    self.prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    self.sampling_params = {"temperature": 0.8, "top_p": 0.95}

def tearDown(self):
    """Shuts down the engine."""
    self.llm.shutdown()
Collaborator

Merge into one function.
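
One way to apply this, folding get_test_engine into setUp (a sketch of the requested refactor, reusing the imports already in the test file):

def setUp(self):
    """Creates the engine, tokenizer, and prompts in one place."""
    self.llm = sgl.Engine(
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct",
        return_token_ids=True,
    )
    self.tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3.1-8B-Instruct"
    )
    self.prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    self.sampling_params = {"temperature": 0.8, "top_p": 0.95}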

f"Input token IDs mismatch for: {prompt}",
)

# Remove start token from HuggingFace as SGLang output doesn't include it
Collaborator

State the start-token behavior for the input as well.

Collaborator

Explain concisely why the input has a start token but the output does not.

Comment on lines 1 to 12
"""Test token ID alignment between SGLang and Hugging Face.

This test suite ensures that the token IDs generated by the SGLang Engine
are consistent with those of the Hugging Face tokenizer for a given set of
prompts. Specifically, it checks:
1. Whether the input token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer for the same input string.
2. Whether the output token IDs from the SGLang Engine match those produced
by the Hugging Face tokenizer (excluding the start token, which is
typically added by Hugging Face but not returned by SGLang).
3. Whether the meta information, such as prompt token count and completion
token count, aligns with the actual lengths of input and output token IDs.
Collaborator

Delete this.


self.assertEqual(
    len(output["input_ids"]),
    output["meta_info"]["prompt_tokens"],
Collaborator

Make a new PR to change the name to prompt_tokens_length.

Collaborator

In the next PR, change input_ids to prompt_tokens, and change the current prompt_tokens to prompt_tokens_length.
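
The proposed renames, summarized as a follow-up (names as suggested here, not yet in the code):

# output["input_ids"]                  -> output["prompt_tokens"]
# output["meta_info"]["prompt_tokens"] -> output["meta_info"]["prompt_tokens_length"]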

)
self.assertEqual(
    output["meta_info"]["completion_tokens"],
    128,
Collaborator

len(sgl_output_ids)
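
Applied to the test, the assertion compares against the engine's returned IDs instead of a hardcoded 128 (a sketch; it assumes sgl_output_ids is bound from the response as shown):

sgl_output_ids = output["output_ids"]
self.assertEqual(
    output["meta_info"]["completion_tokens"],
    len(sgl_output_ids),
)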

"--return-token-ids",
action="store_true",
default=ServerArgs.return_token_ids,
help="Whether to return token IDs in the output. Experimental feature.",
Collaborator

Whether to return token IDs in the output.

@zhaochenyang20 (Collaborator) left a comment

One last change: state in server_args.py that enabling this may introduce additional overhead. Everything else is perfect.

Comment on lines 288 to 289
help="Whether to return token IDs in the output.",
)
Collaborator

This may introduce additional overhead.
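
Combining both review comments, the argument definition would read roughly as follows (a sketch against the existing parser setup in server_args.py; the merged wording may differ):

parser.add_argument(
    "--return-token-ids",
    action="store_true",
    default=ServerArgs.return_token_ids,
    help="Whether to return token IDs in the output. "
    "This may introduce additional overhead.",
)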

@zhaochenyang20 merged commit 35bdb48 into sgl-project:main on Dec 29, 2024
1 of 14 checks passed
@shuaills deleted the feature/#2634/tokenID branch on December 30, 2024, 16:47
shuaills added a commit to shuaills/sglang that referenced this pull request Jan 11, 2025
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: Chayenne <zhaochen20@outlook.com>