Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
1. Overview
While refactoring the OpenAI-compatible server with @CatherineSue, we found that with SGLang v0.4.7.post1, requests that include a custom `rid` field fail under two specific conditions:

- When the parameter `n > 1` is set on the `/v1/chat/completions` endpoint.
- When the input to `/v1/embeddings` is a list of strings.
The issue originates from inconsistencies between the server's adapter and the low-level `TokenizerManager`. Specifically, the adapter currently forwards `rid` as a single string, whereas the internal batching logic expects `rid` to be a list of strings matching the batch size. We temporarily addressed this by disabling `rid` handling in both `serving_base` and client payloads. However, a comprehensive fix is required, involving adjustments in `tokenizer_manager.py` and `io_struct.py`; that is why we opened this issue to track the fix.
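To make the mismatch concrete, here is a minimal Python sketch (hypothetical names, not the actual SGLang source) of what happens when the per-item batching path indexes a scalar `rid` as if it were a list:

```python
from typing import List, Union

def dispatch_batch(rid: Union[str, List[str]], batch_size: int) -> None:
    # Hypothetical stand-in for the per-item indexing done by the
    # batching logic in tokenizer_manager.py.
    for i in range(batch_size):
        # With rid == ["req-0", "req-1"] this yields one ID per item.
        # With rid == "abc123" it silently yields single characters
        # ('a', 'b', ...), or raises IndexError once i exceeds len(rid) - 1.
        print(f"batch item {i} -> rid {rid[i]!r}")

dispatch_batch(["req-0", "req-1"], 2)  # correct: one rid per batch item
dispatch_batch("abc123", 2)            # wrong: scalar treated as a list
```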
Additionally, within `openai_api/protocol.py`, only a single string is accepted for `rid`. This design choice originally aimed to serve enterprise clients who use internal correlation IDs (e.g., `X-Request-ID`) to track requests via logs and metrics. However, during batch processing, `TokenizerManager` inherently expects a `List[str]` for `rid`, leading to runtime crashes when a scalar is mistakenly treated as a list.
2. Expected Behavior
- `rid` should function purely as an opaque correlation ID, analogous to headers such as `X-OpenAI-Request-ID` from the official API, without affecting batch logic.
- Both the `/v1/chat/completions` and `/v1/embeddings` endpoints should function correctly regardless of the value of `n` or the length of the input.
3. Temporary Workaround
Disable `rid` in client payloads. Internal testing confirms that removing this field restores normal endpoint functionality.
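As a client-side illustration (hypothetical payload; field names follow the API discussed above), the workaround is simply to drop the field before sending:

```python
payload = {
    "model": "your-embedding-model",   # placeholder model name
    "input": ["hello", "world"],
    "rid": "my-correlation-id",
}
payload.pop("rid", None)  # temporary workaround: omit rid until the fix lands
```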
4. Potential Fixes
| Approach | Solution | Pros | Cons |
|---|---|---|---|
| A | Adapter broadcasts a scalar `rid` to a list of identical values when `batch_size > 1`. | Easy fix; preserves the external contract. | Loses per-choice ID granularity. |
| B | Adapter accepts both `str` and `List[str]`; validates that the length matches `batch_size`, else returns `400`. | Strict; supports unique per-choice IDs. | Breaking change for callers sending invalid input. |
| C | Remove `rid` from the public schema until the server refactor completes. | Eliminates ambiguity; buys redesign time. | May break existing clients using `rid`. |
| D | Adopt vLLM's `extra_body` pattern: move non-OpenAI params (e.g., `rid`) into a dedicated sub-dict. | Future-proof; avoids OpenAI spec collisions. | Requires client changes and deeper refactoring. |
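As a rough illustration of Approaches A and B, a normalization helper on the adapter side might look like the following sketch (hypothetical function, not actual SGLang code):

```python
from typing import List, Optional, Union

def normalize_rid(rid: Optional[Union[str, List[str]]],
                  batch_size: int) -> Optional[List[str]]:
    """Coerce a request-level rid into a per-item list of correlation IDs."""
    if rid is None:
        return None
    if isinstance(rid, str):
        # Approach A: broadcast the scalar so every batch item shares the ID.
        return [rid] * batch_size
    if len(rid) != batch_size:
        # Approach B: reject mismatched lengths (mapped to HTTP 400 upstream).
        raise ValueError(
            f"rid has {len(rid)} entries but batch size is {batch_size}"
        )
    return rid
```

Approach D would instead keep `rid` out of the top-level schema entirely and read it from a dedicated `extra_body`-style sub-dict, which avoids collisions with future OpenAI parameters.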
Reproduction
Running `test_embedding_openai_server.py` with SGLang v0.4.7.post1 previously reproduced the error described above.
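For reference, here is a minimal reproduction sketch against a locally running server (hypothetical address and model names; the payload shape follows the endpoints discussed above):

```python
import requests

BASE_URL = "http://127.0.0.1:30000"  # hypothetical local SGLang server

# Case 1: /v1/embeddings with a list of strings plus a scalar rid.
resp = requests.post(
    f"{BASE_URL}/v1/embeddings",
    json={
        "model": "your-embedding-model",  # placeholder model name
        "input": ["hello", "world"],      # batched input
        "rid": "my-correlation-id",       # scalar rid triggers the failure
    },
)
print(resp.status_code, resp.text)

# Case 2: /v1/chat/completions with n > 1 plus a scalar rid.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "your-chat-model",       # placeholder model name
        "messages": [{"role": "user", "content": "Hi"}],
        "n": 2,                           # n > 1 creates an internal batch
        "rid": "my-correlation-id",       # scalar rid triggers the failure
    },
)
print(resp.status_code, resp.text)
```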
Environment
Docker image: `lmsysorg/sglang:v0.4.7.post1-cu124`