Frontend language separate reasoning support #6031
Conversation
Looks like
…ng into lang-reasoning-support
Again, notebook-test failed due to the flash MLA install; unrelated to this PR.
```python
other = expr.expr
if not other:
    return
elif isinstance(other, SglGen) or isinstance(other, SglSelect):
```
Shall we raise an error or warning if `separate_reasoning` is used with something other than `SglGen` or `SglSelect`?
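For illustration, a minimal sketch of what that could look like in the branch above (hypothetical; the surrounding handler and names are taken from the snippet, with `SglGen`/`SglSelect` assumed to be imported from `sglang.lang.ir` in that module):

```python
import warnings

other = expr.expr
if not other:
    return
elif isinstance(other, (SglGen, SglSelect)):
    ...  # existing handling of gen/select expressions
else:
    # Hypothetical: surface misuse instead of silently ignoring it.
    warnings.warn(
        f"separate_reasoning only supports SglGen/SglSelect, "
        f"got {type(other).__name__}; reasoning will not be separated."
    )
    return
```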
```python
if self.stream:
    # separate reasoning for stream is not supported
    return
```
Shall we raise a warning when users use `separate_reasoning` in stream mode?
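For example, a hedged sketch of that suggestion (names follow the snippet above; whether a warning is desirable at all is discussed in the reply below):

```python
import warnings

if self.stream:
    # Hypothetical: warn instead of silently skipping reasoning separation.
    warnings.warn(
        "separate_reasoning is not supported in stream mode; "
        "the generated text will not be split into answer and reasoning."
    )
    return
```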
I debated about it; the idea is to not affect the original behavior while staying flexible. I have seen this pattern a couple of times in the sglang code base.
Please help confirm whether the newly introduced
/gemini review
Code Review
This pull request introduces a valuable feature for separating reasoning content from model responses, particularly for models like Qwen3. The core logic for `SglSeparateReasoning` and its integration into the interpreter seems well thought out. The addition of a Jupyter notebook example is also very helpful for users to understand how to use this new API.
I've identified a few areas that could be improved for robustness and clarity, particularly around the handling of the `expr` argument in the `separate_reasoning` API and its propagation to the IR and interpreter. The example notebook also has a critical issue in one of its demonstrations.
No explicit style guide was provided, so my review primarily focuses on correctness, maintainability, and clarity, with general Python best practices (like PEP 8) in mind.
Summary of Findings
- **Incorrect Jupyter Notebook Example**: The regex generation example in `docs/frontend/frontend_reasoning.ipynb` produces a date instead of an IP address, despite the prompt and regex aiming for an IP. This is a critical issue as it misrepresents the feature's capability or model adherence.
- **API and IR Type Mismatch for `expr`**: The `separate_reasoning` API in `python/sglang/api.py` allows `expr` to be `Optional[SglExpr]`, potentially `None`. However, the `SglSeparateReasoning` IR node in `python/sglang/lang/ir.py` expects `expr: SglExpr` (non-optional). This discrepancy can lead to runtime errors if `None` is passed.
- **Handling of Inner Expression Types in `SglSeparateReasoning`**: In `python/sglang/lang/ir.py`, `SglSeparateReasoning._process_expr` only sets `self.name` if the inner `expr` is `SglGen`, `SglSelect`, or `SglExprList`. For other `SglExpr` types, `self.name` remains `None`, which can cause issues like `variable_event[None]` in the interpreter.
- **Local Import in Interpreter (Low Severity)**: In `python/sglang/lang/interpreter.py`, `ReasoningParser` is imported locally within `_execute_separate_reasoning`. While this might be intentional (e.g., for lazy loading or avoiding circular dependencies), top-level imports are generally preferred for clarity if there's no specific reason for a local import. (Not commented inline due to severity settings.)
- **Minor Test File Issues (Low Severity)**: In `test/lang/test_separate_reasoning_execution.py`, there's a duplicate `tearDown` method and a test named `test_reasoning_parser_integration` that primarily tests the mock parser rather than full integration. (Not commented inline due to severity settings.)
Merge Readiness
This pull request introduces an important feature and is mostly well implemented. However, there are a few critical and high-severity issues that need to be addressed before merging:
- The regex example in the documentation notebook needs to be corrected.
- The handling of the `expr` argument in the `separate_reasoning` API and its consistency with the `SglSeparateReasoning` IR node needs to be resolved to prevent potential runtime errors.
- The `SglSeparateReasoning` IR node should clarify or robustly handle different types of inner expressions, so that `self.name` is always appropriately set or unhandled types are explicitly disallowed.
Once these issues are addressed, the PR should be in good shape. I am unable to approve pull requests, so please ensure these changes are reviewed and approved by a maintainer before merging.
"@function\n", | ||
"def regular_expression_gen(s):\n", | ||
" s += user(\n", | ||
" \"What is the IP address of the Google DNS servers? just provide the answer\"\n", | ||
" )\n", | ||
" s += assistant(\n", | ||
" separate_reasoning(\n", | ||
" gen(\n", | ||
" \"answer\",\n", | ||
" temperature=0,\n", | ||
" regex=r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?).){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n", | ||
" max_tokens=512,\n", | ||
" ),\n", | ||
" model_type=\"qwen3\",\n", | ||
" ),\n", | ||
" )\n", | ||
"\n", | ||
"\n", | ||
"reasoning_state = regular_expression_gen()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[2025-05-05 17:54:11] Prefill batch. #new-seq: 1, #new-token: 26, #cached-token: 8, token usage: 0.00, #running-req: 0, #queue-req: 0\n", | ||
"[2025-05-05 17:54:11] Decode batch. #running-req: 1, #token: 68, token usage: 0.00, gen throughput (token/s): 47.33, #queue-req: 0\n", | ||
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 108, token usage: 0.00, gen throughput (token/s): 83.03, #queue-req: 0\n", | ||
"[2025-05-05 17:54:12] Decode batch. #running-req: 1, #token: 148, token usage: 0.00, gen throughput (token/s): 82.51, #queue-req: 0\n", | ||
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 188, token usage: 0.00, gen throughput (token/s): 82.06, #queue-req: 0\n", | ||
"[2025-05-05 17:54:13] Decode batch. #running-req: 1, #token: 228, token usage: 0.00, gen throughput (token/s): 81.80, #queue-req: 0\n", | ||
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 268, token usage: 0.00, gen throughput (token/s): 81.48, #queue-req: 0\n", | ||
"[2025-05-05 17:54:14] Decode batch. #running-req: 1, #token: 308, token usage: 0.00, gen throughput (token/s): 81.14, #queue-req: 0\n", | ||
"[2025-05-05 17:54:15] Decode batch. #running-req: 1, #token: 348, token usage: 0.00, gen throughput (token/s): 80.84, #queue-req: 0\n", | ||
"[2025-05-05 17:54:15] INFO: 127.0.0.1:47290 - \"POST /generate HTTP/1.1\" 200 OK\n", | ||
"Answer:\n", | ||
"2023-10-05\n", | ||
"\n", | ||
"\n", | ||
"Reasoning Content:\n", | ||
"Okay, the user is asking for the IP addresses of Google's DNS servers. Let me recall what I know about DNS servers. Google provides two public DNS servers, right? They're commonly used for their reliability and speed.\n", | ||
"\n", | ||
"I think the primary one is 8.8.8.8. Wait, isn't there another one? Oh yeah, 8.8.4.4. Those are the two main ones. Let me make sure I'm not mixing them up with other providers. For example, Cloudflare uses 1.1.1.1 and 1.0.0.1. But Google's are definitely 8.8.8.8 and 8.8.4.4. \n", | ||
"\n", | ||
"I should check if there are any other IP addresses, but I don't think so. They have two main ones. The user might be looking to set up their DNS settings, so providing both is important. Also, maybe mention that they're both in the same range, which is 8.8.0.0/14. But the user just asked for the IP addresses, so maybe just list them. \n", | ||
"\n", | ||
"Wait, the user said \"just provide the answer,\" so maybe they don't need extra info. But to be thorough, I should confirm that those are the correct ones. Let me think if there's any chance of confusion. No, 8.8.8.8 is the primary, and 8.8.4.4 is the secondary. Yeah, that's right. So the answer is those two IPs.\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print_highlight(f\"Answer:\\n{reasoning_state['answer']}\")\n", | ||
"print_highlight(\n", | ||
" f\"\\n\\nReasoning Content:\\n{reasoning_state['answer_reasoning_content']}\"\n", | ||
")" |
The example for regular expression generation seems to have an issue. The user prompt asks for an IP address, and the regex provided is for matching IP addresses. However, the output `Answer:` is `2023-10-05`, which is a date, not an IP address. The `Reasoning Content` correctly discusses IP addresses (8.8.8.8 and 8.8.4.4).
This discrepancy could be confusing for users and might indicate either:
- The model is not correctly adhering to the regex constraint in this specific case.
- There's an issue with how the regex constraint is applied or how the example is set up.
Could you please investigate this example and ensure the output aligns with the prompt and the regex constraint? If the model struggles with this specific regex, perhaps a different example or a note about potential model limitations would be appropriate.
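One possible explanation, offered as a side note rather than a verified diagnosis: the `.` separating the octets in the regex is unescaped, so it matches any character, and the string `2023-10-05` does in fact fully match the pattern as written. A quick standalone check with Python's `re` module:

```python
import re

# Regex copied from the notebook cell; note the unescaped "." between octets.
ip_regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?).){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

print(bool(re.fullmatch(ip_regex, "8.8.8.8")))     # True
print(bool(re.fullmatch(ip_regex, "2023-10-05")))  # True: "." also matches "-"

# Escaping the dot rules out the date-shaped string.
strict_regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
print(bool(re.fullmatch(strict_regex, "2023-10-05")))  # False
```

If the intent is to constrain the output to dotted-quad IPs only, escaping the dot may also help the example behave as expected.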
```python
def separate_reasoning(
    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
```
The `expr` parameter is typed as `Optional[SglExpr] = None`. However, if `expr` is `None`, `SglExprList([None, SglSeparateReasoning(model_type, expr=None)])` will be created.
This can lead to two issues:
- The `SglSeparateReasoning` constructor in `ir.py` expects `expr: SglExpr`, not `Optional[SglExpr]`. Passing `None` would violate this type hint and likely cause an error in its `_process_expr` method when it tries to operate on `expr`.
- Even if `SglSeparateReasoning` could handle `expr=None`, the `StreamExecutor._execute` method, when processing the `SglExprList`, would encounter `_execute(None)`. This would likely raise an error as `_execute` asserts its input is an `SglExpr` (after a string check).
Given the PR description states "Only to be used with sgl gen" (and `SglSelect` is also handled), it seems `expr` should be mandatory and of a more specific type (e.g., `Union[SglGen, SglSelect]`).
Could you clarify if `expr=None` is an intended use case? If not, perhaps make `expr` a required argument and adjust its type hint accordingly?
Suggested change:
```diff
-    expr: Optional[SglExpr] = None, model_type: Optional[str] = None
+    expr: SglExpr, model_type: Optional[str] = None
```
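To make the intended contract concrete, here is a hypothetical sketch of the required-argument variant with an explicit runtime guard; the `SglExprList` return shape follows the description earlier in this review and is illustrative rather than an exact copy of the source (the IR classes are assumed to already be imported in `api.py`):

```python
from typing import Optional, Union

def separate_reasoning(
    expr: Union[SglGen, SglSelect], model_type: Optional[str] = None
):
    # Hypothetical guard: fail fast at the API boundary instead of deferring
    # the error to the IR node or the interpreter.
    if not isinstance(expr, (SglGen, SglSelect)):
        raise TypeError(
            f"separate_reasoning expects SglGen or SglSelect, got {type(expr).__name__}"
        )
    return SglExprList([expr, SglSeparateReasoning(model_type, expr=expr)])
```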
```python
class SglSeparateReasoning(SglExpr):
    def __init__(self, model_type: str, expr: SglExpr):
```
The constructor expects `expr: SglExpr`. However, as noted in the comment for `python/sglang/api.py`, the `separate_reasoning` API function currently allows `expr` to be `None`. If `None` is passed from the API, this constructor would receive `expr=None`, which violates the type hint and would likely lead to errors in `_process_expr` when it attempts to access attributes of `expr`.
This inconsistency should be resolved. If `expr` must always be a valid `SglExpr` instance, the API layer should enforce this.
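If the API-level fix is not adopted, a defensive check in the constructor could make the failure mode explicit. A minimal sketch (the attribute assignments are assumptions based on this review, not the exact source):

```python
class SglSeparateReasoning(SglExpr):
    def __init__(self, model_type: str, expr: SglExpr):
        super().__init__()
        if expr is None:
            # Hypothetical guard: reject None here instead of failing later
            # inside _process_expr or the interpreter.
            raise ValueError("SglSeparateReasoning requires a non-None inner expression")
        self.model_type = model_type
        self.expr = expr
        self.name = None
        self._process_expr(expr)
```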
```python
def _process_expr(self, expr):
    if isinstance(expr, SglGen):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglSelect):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
```
The `_process_expr` method correctly handles `SglGen`, `SglSelect`, and `SglExprList` by setting `self.name`. However, if the input `expr` is of another `SglExpr` type (e.g., `SglConstantText`), `self.name` will remain `None` (its initialized value).
Later, in `StreamExecutor._init_var_event`, when an `SglSeparateReasoning` instance is processed, `self.variable_event[expr.name]` is accessed. If `expr.name` (which is `self.name` of the `SglSeparateReasoning` instance) is `None`, this would result in `self.variable_event[None] = threading.Event()`, which is problematic.
Should `SglSeparateReasoning` only support `SglGen`, `SglSelect`, or an `SglExprList` of these as its inner expression? If so, perhaps the type hint for `expr` in the constructor should be more specific, or an error should be raised here if an unsupported `expr` type is provided. Alternatively, if other types are permissible but don't have a name to derive the reasoning variable name from, how should that be handled?
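A sketch of one way to make the restriction explicit, extending the method shown above (hypothetical; the error type and message are illustrative):

```python
def _process_expr(self, expr):
    if isinstance(expr, (SglGen, SglSelect)):
        self.name = self.process_name_for_reasoning(expr.name)
    elif isinstance(expr, SglExprList):
        for x in expr.expr_list:
            self._process_expr(x)
    else:
        # Hypothetical: fail loudly so self.name can never stay None
        # when the interpreter later keys variable_event on it.
        raise TypeError(
            f"separate_reasoning does not support inner expressions of type {type(expr).__name__}"
        )
```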
@JustinTong0323 are you taking over the review, or are you just testing out the review bot? I can address some of the review comments (some are correct, some are irrelevant).
Motivation
I would like to be able to interpret responses from reasoning models such as qwen3 and
Modifications
Added a `SglSeparateReasoning` expression, used by the `separate_reasoning` API. It detects and parses the text generated by the `SglGen` expression and moves the reasoning content to `[param]_reasoning_content`. Does not support streaming. Only to be used with `sgl.gen`.
Example:
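A minimal usage sketch, adapted from the notebook added in this PR; the top-level import path and the backend/endpoint setup (e.g. `set_default_backend`) are assumptions and are elided here:

```python
from sglang import assistant, function, gen, separate_reasoning, user

@function
def qa(s):  # hypothetical example function name
    s += user("What is the IP address of the Google DNS servers? Just provide the answer.")
    s += assistant(
        separate_reasoning(
            gen("answer", temperature=0, max_tokens=512),
            model_type="qwen3",
        )
    )

state = qa()
print(state["answer"])                    # final answer with reasoning removed
print(state["answer_reasoning_content"])  # the separated reasoning text
```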
Conversation:
Checklist