
Conversation

binarycrayon
Contributor

@binarycrayon commented May 5, 2025

Motivation

I would like to be able to interpret responses from reasoning models such as qwen3 in the frontend language.

Modifications

Added an SglSeparateReasoning expression, used by the separate_reasoning API. It detects and parses the text generated by the SglGen expression and moves the reasoning content to [param]_reasoning_content.

Does not support streaming. Only to be used with sgl.gen.
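
For reference, a minimal sketch of what the separation amounts to, assuming a qwen3-style model that wraps its reasoning in <think>...</think> tags (the real implementation delegates to sglang's ReasoningParser; split_reasoning below is only an illustration):

import re


def split_reasoning(generated_text: str) -> tuple[str, str]:
    # Illustration only: pull the <think>...</think> block out of the generated
    # text and return (reasoning_content, remaining_answer).
    match = re.match(r"\s*<think>(.*?)</think>\s*", generated_text, re.DOTALL)
    if match is None:
        return "", generated_text
    return match.group(1).strip(), generated_text[match.end():]


reasoning, answer = split_reasoning("<think>Paris is the capital of France.</think>Paris")
# reasoning == "Paris is the capital of France."
# answer == "Paris"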

Example:

@function
def basic_qa_separate_reasoning(s, question):
    s += system("You are a helpful assistant that can answer questions.")
    s += user(question)
    s += assistant_begin()
    s += separate_reasoning(gen("answer", max_tokens=512), model_type="qwen3")
    s += assistant_end()


reasoning_state = basic_qa_separate_reasoning("List 3 countries and their capitals.")
print_highlight(reasoning_state.stream_executor.variable_event.keys())
print_highlight(
    f"\nSeparated Reasoning Content:\n{reasoning_state['answer_reasoning_content']}"
)
print_highlight(f"\n\nContent:\n{reasoning_state['answer']}")
print_highlight(f"\n\nMessages:\n{reasoning_state.messages()[-1]}")

Conversation:

@function
def multi_turn_qa(s):
    s += system("You are a helpful assistant that can answer questions.")
    s += user("Please give me a list of 3 countries and their capitals.")
    s += assistant(
        separate_reasoning(gen("first_answer", max_tokens=512), model_type="qwen3")
    )
    s += user("Please give me another list of 3 countries and their capitals.")
    s += assistant(
        separate_reasoning(gen("second_answer", max_tokens=512), model_type="qwen3")
    )
    return s


reasoning_state = multi_turn_qa()
print_highlight(f"\n\nfirst_answer:\n{reasoning_state['first_answer']}")
print_highlight(
    f"\n\nfirst_answer_reasoning_content:\n{reasoning_state['first_answer_reasoning_content']}"
)
print_highlight(f"\n\nsecond_answer:\n{reasoning_state['second_answer']}")
print_highlight(
    f"\n\nsecond_answer_reasoning_content:\n{reasoning_state['second_answer_reasoning_content']}"
)

Checklist

@binarycrayon changed the title from "Lang reasoning support" to "Frontend language separate reasoning support" on May 5, 2025
@binarycrayon
Contributor Author

Looks like unit-test-frontend has a timeout error in setup; not sure how I can fix it.

@binarycrayon
Contributor Author

Again, notebook-test failed due to the flash MLA install; it's unrelated to this PR.

other = expr.expr
if not other:
    return
elif isinstance(other, SglGen) or isinstance(other, SglSelect):
Collaborator

Shall we raise an error or warning if separate_reasoning is used with something other than SglGen or SglSelect?
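
One possible shape for that (a hypothetical sketch of the suggestion, not code from this PR, assuming these classes live in sglang.lang.ir as the file paths in this review indicate):

import warnings

from sglang.lang.ir import SglExpr, SglGen, SglSelect


def warn_if_unsupported(other: SglExpr) -> None:
    # Hypothetical helper: make the silent no-op visible when separate_reasoning
    # wraps something other than SglGen or SglSelect.
    if other is not None and not isinstance(other, (SglGen, SglSelect)):
        warnings.warn(
            f"separate_reasoning has no effect on {type(other).__name__}; "
            "wrap an sgl.gen() or sgl.select() call instead."
        )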

Comment on lines +731 to +733
if self.stream:
    # separate reasoning for stream is not supported
    return
Collaborator

Shall we raise a warning when users use separate_reasoning in stream mode?
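
For example, the quoted branch could emit a warning before returning; this is only a sketch of the suggestion (self.stream and the surrounding method come from the PR's interpreter code):

import warnings

if self.stream:
    # separate reasoning for stream is not supported
    warnings.warn(
        "separate_reasoning is ignored in streaming mode; the generated text "
        "will not be split into reasoning and answer content.",
        stacklevel=2,
    )
    return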

Contributor Author

I debated about it; the idea is to not affect the original behavior while also staying flexible. I have seen this pattern a couple of times in the sglang codebase.

@shanyu-sys
Collaborator

@hnyls2002

Please help confirm whether the newly introduced separate_reasoning API is okay, and please also review this PR before merging it.

@JustinTong0323
Collaborator

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for separating reasoning content from model responses, particularly for models like Qwen3. The core logic for SglSeparateReasoning and its integration into the interpreter seems well-thought-out. The addition of a Jupyter notebook example is also very helpful for users to understand how to use this new API.

I've identified a few areas that could be improved for robustness and clarity, particularly around the handling of the expr argument in the separate_reasoning API and its propagation to the IR and interpreter. The example notebook also has a critical issue in one of its demonstrations.

No explicit style guide was provided, so my review primarily focuses on correctness, maintainability, and clarity, with general Python best practices (like PEP 8) in mind.

Summary of Findings

  • Incorrect Jupyter Notebook Example: The regex generation example in docs/frontend/frontend_reasoning.ipynb produces a date instead of an IP address, despite the prompt and regex aiming for an IP. This is a critical issue as it misrepresents the feature's capability or model adherence.
  • API and IR Type Mismatch for expr: The separate_reasoning API in python/sglang/api.py allows expr to be Optional[SglExpr], potentially None. However, the SglSeparateReasoning IR node in python/sglang/lang/ir.py expects expr: SglExpr (non-optional). This discrepancy can lead to runtime errors if None is passed.
  • Handling of Inner Expression Types in SglSeparateReasoning: In python/sglang/lang/ir.py, SglSeparateReasoning._process_expr only sets self.name if the inner expr is SglGen, SglSelect, or SglExprList. For other SglExpr types, self.name remains None, which can cause issues like variable_event[None] in the interpreter.
  • Local Import in Interpreter (Low Severity): In python/sglang/lang/interpreter.py, ReasoningParser is imported locally within _execute_separate_reasoning. While this might be intentional (e.g., for lazy loading or avoiding circular dependencies), top-level imports are generally preferred for clarity if there's no specific reason for local import. (Not commented inline due to severity settings).
  • Minor Test File Issues (Low Severity): In test/lang/test_separate_reasoning_execution.py, there's a duplicate tearDown method and a test named test_reasoning_parser_integration that primarily tests the mock parser rather than full integration. (Not commented inline due to severity settings).

Merge Readiness

This pull request introduces an important feature and is mostly well-implemented. However, there are a few critical and high-severity issues that need to be addressed before merging:

  1. The regex example in the documentation notebook needs to be corrected.
  2. The handling of the expr argument in the separate_reasoning API and its consistency with the SglSeparateReasoning IR node needs to be resolved to prevent potential runtime errors.
  3. The SglSeparateReasoning IR node should clarify or robustly handle different types of inner expressions to ensure self.name is always appropriately set or unhandled types are explicitly disallowed.

Once these issues are addressed, the PR should be in good shape. I am unable to approve pull requests, so please ensure these changes are reviewed and approved by a maintainer before merging.

Comment on lines +401 to +461
"@function\n",
"def regular_expression_gen(s):\n",
" s += user(\n",
" \"What is the IP address of the Google DNS servers? just provide the answer\"\n",
" )\n",
" s += assistant(\n",
" separate_reasoning(\n",
" gen(\n",
" \"answer\",\n",
" temperature=0,\n",
" regex=r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?).){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n",
" max_tokens=512,\n",
" ),\n",
" model_type=\"qwen3\",\n",
" ),\n",
" )\n",
"\n",
"\n",
"reasoning_state = regular_expression_gen()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2025-05-05 17:54:11] Prefill batch. #new-seq: 1, #new-token: 26, #cached-token: 8, token usage: 0.00, #running-req: 0, #queue-req: 0\n",
"[2025-05-05 17:54:11] Decode batch. #running-req: 1, #token: 68, token usage: 0.00, gen throughput (token/s): 47.33, #queue-req: 0\n",
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 108, token usage: 0.00, gen throughput (token/s): 83.03, #queue-req: 0\n",
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 148, token usage: 0.00, gen throughput (token/s): 82.51, #queue-req: 0\n",
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 188, token usage: 0.00, gen throughput (token/s): 82.06, #queue-req: 0\n",
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 228, token usage: 0.00, gen throughput (token/s): 81.80, #queue-req: 0\n",
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 268, token usage: 0.00, gen throughput (token/s): 81.48, #queue-req: 0\n",
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 308, token usage: 0.00, gen throughput (token/s): 81.14, #queue-req: 0\n",
"[2025-05-05 17:54:15] Decode batch. #running-req: 1, #token: 348, token usage: 0.00, gen throughput (token/s): 80.84, #queue-req: 0\n",
"[2025-05-05 17:54:15] INFO: 127.0.0.1:47290 - \"POST /generate HTTP/1.1\" 200 OK\n",
"Answer:\n",
"2023-10-05\n",
"\n",
"\n",
"Reasoning Content:\n",
"Okay, the user is asking for the IP addresses of Google's DNS servers. Let me recall what I know about DNS servers. Google provides two public DNS servers, right? They're commonly used for their reliability and speed.\n",
"\n",
"I think the primary one is 8.8.8.8. Wait, isn't there another one? Oh yeah, 8.8.4.4. Those are the two main ones. Let me make sure I'm not mixing them up with other providers. For example, Cloudflare uses 1.1.1.1 and 1.0.0.1. But Google's are definitely 8.8.8.8 and 8.8.4.4. \n",
"\n",
"I should check if there are any other IP addresses, but I don't think so. They have two main ones. The user might be looking to set up their DNS settings, so providing both is important. Also, maybe mention that they're both in the same range, which is 8.8.0.0/14. But the user just asked for the IP addresses, so maybe just list them. \n",
"\n",
"Wait, the user said \"just provide the answer,\" so maybe they don't need extra info. But to be thorough, I should confirm that those are the correct ones. Let me think if there's any chance of confusion. No, 8.8.8.8 is the primary, and 8.8.4.4 is the secondary. Yeah, that's right. So the answer is those two IPs.\n",
"\n"
]
}
],
"source": [
"print_highlight(f\"Answer:\\n{reasoning_state['answer']}\")\n",
"print_highlight(\n",
" f\"\\n\\nReasoning Content:\\n{reasoning_state['answer_reasoning_content']}\"\n",
")"
Contributor

critical

The example for regular expression generation seems to have an issue. The user prompt asks for an IP address, and the regex provided is for matching IP addresses. However, the output Answer: is 2023-10-05, which is a date, not an IP address. The Reasoning Content correctly discusses IP addresses (8.8.8.8 and 8.8.4.4).

This discrepancy could be confusing for users and might indicate either:

  1. The model is not correctly adhering to the regex constraint in this specific case.
  2. There's an issue with how the regex constraint is applied or how the example is set up.

Could you please investigate this example and ensure the output aligns with the prompt and the regex constraint? If the model struggles with this specific regex, perhaps a different example or a note about potential model limitations would be appropriate.
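
One contributing factor worth checking during that investigation (a hypothetical diagnostic, not part of the PR): the separator . in the regex is unescaped and therefore matches any character, so a date such as 2023-10-05 actually satisfies the constraint:

import re

# The regex from the notebook, with the '.' separator left unescaped.
pattern = r"((25[0-5]|2[0-4]\d|[01]?\d\d?).){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

print(bool(re.fullmatch(pattern, "8.8.8.8")))     # True -- an IP matches
print(bool(re.fullmatch(pattern, "2023-10-05")))  # True -- but so does the date

# Escaping the separator as r"\." would reject the date while still accepting
# dotted IPv4 addresses.

If that is the root cause, tightening the regex may be enough to make the example behave as intended.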



def separate_reasoning(
    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
Contributor

high

The expr parameter is typed as Optional[SglExpr] = None. However, if expr is None, SglExprList([None, SglSeparateReasoning(model_type, expr=None)]) will be created.

This can lead to two issues:

  1. The SglSeparateReasoning constructor in ir.py expects expr: SglExpr, not Optional[SglExpr]. Passing None would violate this type hint and likely cause an error in its _process_expr method when it tries to operate on expr.
  2. Even if SglSeparateReasoning could handle expr=None, the StreamExecutor._execute method, when processing the SglExprList, would encounter _execute(None). This would likely raise an error as _execute asserts its input is an SglExpr (after a string check).

Given the PR description states "Only to be used with sgl.gen" (and SglSelect is also handled), it seems expr should be mandatory and of a more specific type (e.g., Union[SglGen, SglSelect]).

Could you clarify if expr=None is an intended use case? If not, perhaps make expr a required argument and adjust its type hint accordingly?

Suggested change
-    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
+    expr: SglExpr, model_type: Optional[str] = None



class SglSeparateReasoning(SglExpr):
    def __init__(self, model_type: str, expr: SglExpr):
Contributor

high

The constructor expects expr: SglExpr. However, as noted in the comment for python/sglang/api.py, the separate_reasoning API function currently allows expr to be None. If None is passed from the API, this constructor would receive expr=None, which violates the type hint and would likely lead to errors in _process_expr when it attempts to access attributes of expr.

This inconsistency should be resolved. If expr must always be a valid SglExpr instance, the API layer should enforce this.

Comment on lines +625 to +632
def _process_expr(self, expr):
    if isinstance(expr, SglGen):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglSelect):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
Contributor

medium

The _process_expr method correctly handles SglGen, SglSelect, and SglExprList by setting self.name. However, if the input expr is of another SglExpr type (e.g., SglConstantText), self.name will remain None (its initialized value).

Later, in StreamExecutor._init_var_event, when an SglSeparateReasoning instance is processed, self.variable_event[expr.name] is accessed. If expr.name (which is self.name of the SglSeparateReasoning instance) is None, this would result in self.variable_event[None] = threading.Event(), which is problematic.

Should SglSeparateReasoning only support SglGen, SglSelect, or SglExprList of these as its inner expression? If so, perhaps the type hint for expr in the constructor should be more specific, or an error should be raised here if an unsupported expr type is provided. Alternatively, if other types are permissible but don't have a 'name' to derive the reasoning variable name from, how should that be handled?
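
If only SglGen, SglSelect, and SglExprLists of those are meant to be supported, one option is to fail fast in _process_expr. A hypothetical sketch of that choice (not code from this PR), reusing the method quoted above:

def _process_expr(self, expr):
    if isinstance(expr, (SglGen, SglSelect)):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
    else:
        # Hypothetical: reject unsupported inner expressions early instead of
        # leaving self.name as None and hitting variable_event[None] later.
        raise TypeError(
            f"separate_reasoning only supports SglGen, SglSelect, or a list of "
            f"them; got {type(expr).__name__}"
        )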

@binarycrayon
Contributor Author

binarycrayon commented May 30, 2025

@JustinTong0323 are you taking over the review, or are you just testing out the review bot? I can address some of the reviews (some are correct, some are irrelevant).

@binarycrayon
Contributor Author

[image]
Should be acceptable; the failed cases were timeouts.

Let me update the branch again.

@merrymercy merged commit 14c18d2 into sgl-project:main on Jun 11, 2025
46 of 53 checks passed
almaslof pushed a commit to mpashkovskii/sglang that referenced this pull request Jun 11, 2025
jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025