
Conversation

binarycrayon
Contributor

@binarycrayon commented May 5, 2025

Motivation

I would like to be able to interpret responses from reasoning models such as qwen3 in the frontend language.

Modifications

Added an SglSeparateReasoning expression, used by the separate_reasoning API. It detects and parses the text generated by the SglGen expression and moves the reasoning content to [param]_reasoning_content.

Does not support streaming. Only to be used with sgl.gen.
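
For reference, a minimal sketch of what the separation amounts to, assuming a qwen3-style model that wraps its reasoning in <think>...</think> tags (the real implementation delegates to sglang's ReasoningParser; split_reasoning below is only an illustration):

import re


def split_reasoning(generated_text: str) -> tuple[str, str]:
    # Illustration only: pull the <think>...</think> block out of the generated
    # text and return (reasoning_content, remaining_answer).
    match = re.match(r"\s*<think>(.*?)</think>\s*", generated_text, re.DOTALL)
    if match is None:
        return "", generated_text
    return match.group(1).strip(), generated_text[match.end():]


reasoning, answer = split_reasoning("<think>Paris is the capital of France.</think>Paris")
# reasoning == "Paris is the capital of France."
# answer == "Paris"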

Example:

@function
def basic_qa_separate_reasoning(s, question):
    s += system("You are a helpful assistant that can answer questions.")
    s += user(question)
    s += assistant_begin()
    s += separate_reasoning(gen("answer", max_tokens=512), model_type="qwen3")
    s += assistant_end()


reasoning_state = basic_qa_separate_reasoning("List 3 countries and their capitals.")
print_highlight(reasoning_state.stream_executor.variable_event.keys())
print_highlight(
    f"\nSeparated Reasoning Content:\n{reasoning_state['answer_reasoning_content']}"
)
print_highlight(f"\n\nContent:\n{reasoning_state['answer']}")
print_highlight(f"\n\nMessages:\n{reasoning_state.messages()[-1]}")

Conversation:

@function
def multi_turn_qa(s):
    s += system("You are a helpful assistant that can answer questions.")
    s += user("Please give me a list of 3 countries and their capitals.")
    s += assistant(
        separate_reasoning(gen("first_answer", max_tokens=512), model_type="qwen3")
    )
    s += user("Please give me another list of 3 countries and their capitals.")
    s += assistant(
        separate_reasoning(gen("second_answer", max_tokens=512), model_type="qwen3")
    )
    return s


reasoning_state = multi_turn_qa()
print_highlight(f"\n\nfirst_answer:\n{reasoning_state['first_answer']}")
print_highlight(
    f"\n\nfirst_answer_reasoning_content:\n{reasoning_state['first_answer_reasoning_content']}"
)
print_highlight(f"\n\nsecond_answer:\n{reasoning_state['second_answer']}")
print_highlight(
    f"\n\nsecond_answer_reasoning_content:\n{reasoning_state['second_answer_reasoning_content']}"
)

Checklist

@binarycrayon changed the title from "Lang reasoning support" to "Frontend language separate reasoning support" on May 5, 2025
@binarycrayon
Contributor Author

Looks like unit-test-frontend has a timeout error in setup; not sure how I can fix it.

@binarycrayon
Contributor Author

Again, notebook-test failed due to the flash MLA install; it's unrelated to this PR.

other = expr.expr
if not other:
    return
elif isinstance(other, SglGen) or isinstance(other, SglSelect):
Collaborator

Shall we raise an error or warning if separate_reasoning is used with something other than SglGen or SglSelect?
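
One possible shape for that (a hypothetical sketch of the suggestion, not code from this PR, assuming these classes live in sglang.lang.ir as the file paths in this review indicate):

import warnings

from sglang.lang.ir import SglExpr, SglGen, SglSelect


def warn_if_unsupported(other: SglExpr) -> None:
    # Hypothetical helper: make the silent no-op visible when separate_reasoning
    # wraps something other than SglGen or SglSelect.
    if other is not None and not isinstance(other, (SglGen, SglSelect)):
        warnings.warn(
            f"separate_reasoning has no effect on {type(other).__name__}; "
            "wrap an sgl.gen() or sgl.select() call instead."
        )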

Comment on lines +731 to +733
if self.stream:
    # separate reasoning for stream is not supported
    return
Collaborator

Shall we raise a warning when users use separate_reasoning in stream mode?
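
For example, the quoted branch could emit a warning before returning; this is only a sketch of the suggestion (self.stream and the surrounding method come from the PR's interpreter code):

import warnings

if self.stream:
    # separate reasoning for stream is not supported
    warnings.warn(
        "separate_reasoning is ignored in streaming mode; the generated text "
        "will not be split into reasoning and answer content.",
        stacklevel=2,
    )
    return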

Contributor Author

I debated about it; the idea is to not affect the original behavior while also staying flexible. I have seen this pattern a couple of times in the sglang codebase.

@shanyu-sys
Collaborator

@hnyls2002

Please help confirm whether the newly introduced separate_reasoning API is okay, and please also review this PR before merging it.

@JustinTong0323
Collaborator

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for separating reasoning content from model responses, particularly for models like Qwen3. The core logic for SglSeparateReasoning and its integration into the interpreter seems well-thought-out. The addition of a Jupyter notebook example is also very helpful for users to understand how to use this new API.

I've identified a few areas that could be improved for robustness and clarity, particularly around the handling of the expr argument in the separate_reasoning API and its propagation to the IR and interpreter. The example notebook also has a critical issue in one of its demonstrations.

No explicit style guide was provided, so my review primarily focuses on correctness, maintainability, and clarity, with general Python best practices (like PEP 8) in mind.

Summary of Findings

  • Incorrect Jupyter Notebook Example: The regex generation example in docs/frontend/frontend_reasoning.ipynb produces a date instead of an IP address, despite the prompt and regex aiming for an IP. This is a critical issue as it misrepresents the feature's capability or model adherence.
  • API and IR Type Mismatch for expr: The separate_reasoning API in python/sglang/api.py allows expr to be Optional[SglExpr], potentially None. However, the SglSeparateReasoning IR node in python/sglang/lang/ir.py expects expr: SglExpr (non-optional). This discrepancy can lead to runtime errors if None is passed.
  • Handling of Inner Expression Types in SglSeparateReasoning: In python/sglang/lang/ir.py, SglSeparateReasoning._process_expr only sets self.name if the inner expr is SglGen, SglSelect, or SglExprList. For other SglExpr types, self.name remains None, which can cause issues like variable_event[None] in the interpreter.
  • Local Import in Interpreter (Low Severity): In python/sglang/lang/interpreter.py, ReasoningParser is imported locally within _execute_separate_reasoning. While this might be intentional (e.g., for lazy loading or avoiding circular dependencies), top-level imports are generally preferred for clarity if there's no specific reason for local import. (Not commented inline due to severity settings).
  • Minor Test File Issues (Low Severity): In test/lang/test_separate_reasoning_execution.py, there's a duplicate tearDown method and a test named test_reasoning_parser_integration that primarily tests the mock parser rather than full integration. (Not commented inline due to severity settings).

Merge Readiness

This pull request introduces an important feature and is mostly well-implemented. However, there are a few critical and high-severity issues that need to be addressed before merging:

  1. The regex example in the documentation notebook needs to be corrected.
  2. The handling of the expr argument in the separate_reasoning API and its consistency with the SglSeparateReasoning IR node needs to be resolved to prevent potential runtime errors.
  3. The SglSeparateReasoning IR node should clarify or robustly handle different types of inner expressions to ensure self.name is always appropriately set or unhandled types are explicitly disallowed.

Once these issues are addressed, the PR should be in good shape. I am unable to approve pull requests, so please ensure these changes are reviewed and approved by a maintainer before merging.

Comment on lines +401 to +461
"@function\n",
"def regular_expression_gen(s):\n",
" s += user(\n",
" \"What is the IP address of the Google DNS servers? just provide the answer\"\n",
" )\n",
" s += assistant(\n",
" separate_reasoning(\n",
" gen(\n",
" \"answer\",\n",
" temperature=0,\n",
" regex=r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?).){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n",
" max_tokens=512,\n",
" ),\n",
" model_type=\"qwen3\",\n",
" ),\n",
" )\n",
"\n",
"\n",
"reasoning_state = regular_expression_gen()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2025-05-05 17:54:11] Prefill batch. #new-seq: 1, #new-token: 26, #cached-token: 8, token usage: 0.00, #running-req: 0, #queue-req: 0\n",
"[2025-05-05 17:54:11] Decode batch. #running-req: 1, #token: 68, token usage: 0.00, gen throughput (token/s): 47.33, #queue-req: 0\n",
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 108, token usage: 0.00, gen throughput (token/s): 83.03, #queue-req: 0\n",
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 148, token usage: 0.00, gen throughput (token/s): 82.51, #queue-req: 0\n",
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 188, token usage: 0.00, gen throughput (token/s): 82.06, #queue-req: 0\n",
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 228, token usage: 0.00, gen throughput (token/s): 81.80, #queue-req: 0\n",
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 268, token usage: 0.00, gen throughput (token/s): 81.48, #queue-req: 0\n",
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 308, token usage: 0.00, gen throughput (token/s): 81.14, #queue-req: 0\n",
"[2025-05-05 17:54:15] Decode batch. #running-req: 1, #token: 348, token usage: 0.00, gen throughput (token/s): 80.84, #queue-req: 0\n",
"[2025-05-05 17:54:15] INFO: 127.0.0.1:47290 - \"POST /generate HTTP/1.1\" 200 OK\n",
"Answer:\n",
"2023-10-05\n",
"\n",
"\n",
"Reasoning Content:\n",
"Okay, the user is asking for the IP addresses of Google's DNS servers. Let me recall what I know about DNS servers. Google provides two public DNS servers, right? They're commonly used for their reliability and speed.\n",
"\n",
"I think the primary one is 8.8.8.8. Wait, isn't there another one? Oh yeah, 8.8.4.4. Those are the two main ones. Let me make sure I'm not mixing them up with other providers. For example, Cloudflare uses 1.1.1.1 and 1.0.0.1. But Google's are definitely 8.8.8.8 and 8.8.4.4. \n",
"\n",
"I should check if there are any other IP addresses, but I don't think so. They have two main ones. The user might be looking to set up their DNS settings, so providing both is important. Also, maybe mention that they're both in the same range, which is 8.8.0.0/14. But the user just asked for the IP addresses, so maybe just list them. \n",
"\n",
"Wait, the user said \"just provide the answer,\" so maybe they don't need extra info. But to be thorough, I should confirm that those are the correct ones. Let me think if there's any chance of confusion. No, 8.8.8.8 is the primary, and 8.8.4.4 is the secondary. Yeah, that's right. So the answer is those two IPs.\n",
"\n"
]
}
],
"source": [
"print_highlight(f\"Answer:\\n{reasoning_state['answer']}\")\n",
"print_highlight(\n",
" f\"\\n\\nReasoning Content:\\n{reasoning_state['answer_reasoning_content']}\"\n",
")"
Contributor

critical

The example for regular expression generation seems to have an issue. The user prompt asks for an IP address, and the regex provided is for matching IP addresses. However, the output Answer: is 2023-10-05, which is a date, not an IP address. The Reasoning Content correctly discusses IP addresses (8.8.8.8 and 8.8.4.4).

This discrepancy could be confusing for users and might indicate either:

  1. The model is not correctly adhering to the regex constraint in this specific case.
  2. There's an issue with how the regex constraint is applied or how the example is set up.

Could you please investigate this example and ensure the output aligns with the prompt and the regex constraint? If the model struggles with this specific regex, perhaps a different example or a note about potential model limitations would be appropriate.
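
One contributing factor worth checking during that investigation (a hypothetical diagnostic, not part of the PR): the separator . in the regex is unescaped and therefore matches any character, so a date such as 2023-10-05 actually satisfies the constraint:

import re

# The regex from the notebook, with the '.' separator left unescaped.
pattern = r"((25[0-5]|2[0-4]\d|[01]?\d\d?).){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

print(bool(re.fullmatch(pattern, "8.8.8.8")))     # True -- an IP matches
print(bool(re.fullmatch(pattern, "2023-10-05")))  # True -- but so does the date

# Escaping the separator as r"\." would reject the date while still accepting
# dotted IPv4 addresses.

If that is the root cause, tightening the regex may be enough to make the example behave as intended.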



def separate_reasoning(
    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
Contributor

high

The expr parameter is typed as Optional[SglExpr] = None. However, if expr is None, SglExprList([None, SglSeparateReasoning(model_type, expr=None)]) will be created.

This can lead to two issues:

  1. The SglSeparateReasoning constructor in ir.py expects expr: SglExpr, not Optional[SglExpr]. Passing None would violate this type hint and likely cause an error in its _process_expr method when it tries to operate on expr.
  2. Even if SglSeparateReasoning could handle expr=None, the StreamExecutor._execute method, when processing the SglExprList, would encounter _execute(None). This would likely raise an error as _execute asserts its input is an SglExpr (after a string check).

Given the PR description states "Only to be used with sgl.gen" (and SglSelect is also handled), it seems expr should be mandatory and of a more specific type (e.g., Union[SglGen, SglSelect]).

Could you clarify if expr=None is an intended use case? If not, perhaps make expr a required argument and adjust its type hint accordingly?

Suggested change
-    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
+    expr: SglExpr, model_type: Optional[str] = None



class SglSeparateReasoning(SglExpr):
    def __init__(self, model_type: str, expr: SglExpr):
Contributor

high

The constructor expects expr: SglExpr. However, as noted in the comment for python/sglang/api.py, the separate_reasoning API function currently allows expr to be None. If None is passed from the API, this constructor would receive expr=None, which violates the type hint and would likely lead to errors in _process_expr when it attempts to access attributes of expr.

This inconsistency should be resolved. If expr must always be a valid SglExpr instance, the API layer should enforce this.

Comment on lines +625 to +632
def _process_expr(self, expr):
    if isinstance(expr, SglGen):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglSelect):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
Contributor

medium

The _process_expr method correctly handles SglGen, SglSelect, and SglExprList by setting self.name. However, if the input expr is of another SglExpr type (e.g., SglConstantText), self.name will remain None (its initialized value).

Later, in StreamExecutor._init_var_event, when an SglSeparateReasoning instance is processed, self.variable_event[expr.name] is accessed. If expr.name (which is self.name of the SglSeparateReasoning instance) is None, this would result in self.variable_event[None] = threading.Event(), which is problematic.

Should SglSeparateReasoning only support SglGen, SglSelect, or SglExprList of these as its inner expression? If so, perhaps the type hint for expr in the constructor should be more specific, or an error should be raised here if an unsupported expr type is provided. Alternatively, if other types are permissible but don't have a 'name' to derive the reasoning variable name from, how should that be handled?
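
If only SglGen, SglSelect, and SglExprLists of those are meant to be supported, one option is to fail fast in _process_expr. A hypothetical sketch of that choice (not code from this PR), reusing the method quoted above:

def _process_expr(self, expr):
    if isinstance(expr, (SglGen, SglSelect)):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
    else:
        # Hypothetical: reject unsupported inner expressions early instead of
        # leaving self.name as None and hitting variable_event[None] later.
        raise TypeError(
            f"separate_reasoning only supports SglGen, SglSelect, or a list of "
            f"them; got {type(expr).__name__}"
        )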

@binarycrayon
Contributor Author

binarycrayon commented May 30, 2025

@JustinTong0323 are you taking over the review, or are you just testing out the review bot? I can address some of the reviews (some are correct, some are irrelevant).

@binarycrayon
Contributor Author

[image]
Should be acceptable; the failed cases were timeouts.

Let me update the branch again.

@merrymercy merged commit 14c18d2 into sgl-project:main on Jun 11, 2025
46 of 53 checks passed
almaslof pushed a commit to mpashkovskii/sglang that referenced this pull request Jun 11, 2025
jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025