Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
1. Overview
While refactoring the OpenAI-compatible server with @CatherineSue, we found that with SGLang v0.4.7.post1, requests that include a custom `rid` field fail under two specific conditions:

- When the parameter `n > 1` is set on the `/v1/chat/completions` endpoint.
- When the input to `/v1/embeddings` is a list of strings.
The issue originates from inconsistencies between the server's adapter and the low-level `TokenizerManager`. Specifically, the adapter currently forwards `rid` as a single string, whereas the internal batching logic expects `rid` to be a list of strings matching the batch size. We temporarily addressed this by disabling `rid` handling in both `serving_base` and client payloads. However, a comprehensive fix is required, involving adjustments in `tokenizer_manager.py` and `io_struct.py`; that is why we opened this issue to track the fix.
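To make the mismatch concrete, here is a minimal Python sketch (hypothetical names, not the actual SGLang source) of what happens when the per-item batching path indexes a scalar `rid` as if it were a list:

```python
from typing import List, Union

def dispatch_batch(rid: Union[str, List[str]], batch_size: int) -> None:
    # Hypothetical stand-in for the per-item indexing done by the
    # batching logic in tokenizer_manager.py.
    for i in range(batch_size):
        # With rid == ["req-0", "req-1"] this yields one ID per item.
        # With rid == "abc123" it silently yields single characters
        # ('a', 'b', ...), or raises IndexError once i exceeds len(rid) - 1.
        print(f"batch item {i} -> rid {rid[i]!r}")

dispatch_batch(["req-0", "req-1"], 2)  # correct: one rid per batch item
dispatch_batch("abc123", 2)            # wrong: scalar treated as a list
```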
Additionally, within `openai_api/protocol.py`, only a single string is accepted for `rid`. This design choice originally aimed to serve enterprise clients who use internal correlation IDs (e.g., `X-Request-ID`) to track requests via logs and metrics. However, during batch processing, `TokenizerManager` inherently expects a `List[str]` for `rid`, leading to runtime crashes when a scalar is mistakenly treated as a list.
2. Expected Behavior
- `rid` should function purely as an opaque correlation ID, analogous to headers such as `X-OpenAI-Request-ID` from the official API, without affecting batch logic.
- Both the `/v1/chat/completions` and `/v1/embeddings` endpoints should function correctly regardless of the value of `n` or the length of the input.
3. Temporary Workaround
Disable `rid` in client payloads. Internal testing confirms that removing this field restores normal endpoint functionality.
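As a client-side illustration (hypothetical payload; field names follow the API discussed above), the workaround is simply to drop the field before sending:

```python
payload = {
    "model": "your-embedding-model",   # placeholder model name
    "input": ["hello", "world"],
    "rid": "my-correlation-id",
}
payload.pop("rid", None)  # temporary workaround: omit rid until the fix lands
```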
4. Potential Fixes
| Approach | Solution | Pros | Cons |
|---|---|---|---|
| A | Adapter broadcasts a scalar `rid` to a list of identical values when `batch_size > 1`. | Easy fix; preserves the external contract. | Loses per-choice ID granularity. |
| B | Adapter accepts both `str` and `List[str]`; validates that the length matches `batch_size`, else returns `400`. | Strict; supports unique per-choice IDs. | Breaking change for callers sending invalid input. |
| C | Remove `rid` from the public schema until the server refactor completes. | Eliminates ambiguity; buys redesign time. | May break existing clients using `rid`. |
| D | Adopt vLLM's `extra_body` pattern: move non-OpenAI params (e.g., `rid`) into a dedicated sub-dict. | Future-proof; avoids OpenAI spec collisions. | Requires client changes and deeper refactoring. |
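As a rough illustration of Approaches A and B, a normalization helper on the adapter side might look like the following sketch (hypothetical function, not actual SGLang code):

```python
from typing import List, Optional, Union

def normalize_rid(rid: Optional[Union[str, List[str]]],
                  batch_size: int) -> Optional[List[str]]:
    """Coerce a request-level rid into a per-item list of correlation IDs."""
    if rid is None:
        return None
    if isinstance(rid, str):
        # Approach A: broadcast the scalar so every batch item shares the ID.
        return [rid] * batch_size
    if len(rid) != batch_size:
        # Approach B: reject mismatched lengths (mapped to HTTP 400 upstream).
        raise ValueError(
            f"rid has {len(rid)} entries but batch size is {batch_size}"
        )
    return rid
```

Approach D would instead keep `rid` out of the top-level schema entirely and read it from a dedicated `extra_body`-style sub-dict, which avoids collisions with future OpenAI parameters.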
Reproduction
Running `test_embedding_openai_server.py` with SGLang v0.4.7.post1 previously reproduced the error described above.
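For reference, here is a minimal reproduction sketch against a locally running server (hypothetical address and model names; the payload shape follows the endpoints discussed above):

```python
import requests

BASE_URL = "http://127.0.0.1:30000"  # hypothetical local SGLang server

# Case 1: /v1/embeddings with a list of strings plus a scalar rid.
resp = requests.post(
    f"{BASE_URL}/v1/embeddings",
    json={
        "model": "your-embedding-model",  # placeholder model name
        "input": ["hello", "world"],      # batched input
        "rid": "my-correlation-id",       # scalar rid triggers the failure
    },
)
print(resp.status_code, resp.text)

# Case 2: /v1/chat/completions with n > 1 plus a scalar rid.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "your-chat-model",       # placeholder model name
        "messages": [{"role": "user", "content": "Hi"}],
        "n": 2,                           # n > 1 creates an internal batch
        "rid": "my-correlation-id",       # scalar rid triggers the failure
    },
)
print(resp.status_code, resp.text)
```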
Environment
Docker image: `lmsysorg/sglang:v0.4.7.post1-cu124`