
[Bug] Inconsistent rid handling in OpenAI-Compatible Server #7374

@jhinpan

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

1. Overview

While refactoring the OpenAI-Compatible Server with @CatherineSue, we found that on SGLang v0.4.7.post1, requests that include a custom `rid` field fail under two specific conditions:

  • When parameter n > 1 in the /v1/chat/completions endpoint.

  • When the input to /v1/embeddings is a list of strings.

The issue originates from an inconsistency between the server's adapter and the low-level TokenizerManager: the adapter currently forwards `rid` as a single string, whereas the internal batching logic expects `rid` to be a list of strings matching the batch size.
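For concreteness, here is a minimal sketch of the two failing calls. The port and model name are assumptions (a local server on SGLang's default port 30000 and a placeholder model), not taken from the original report:

```python
import requests

BASE = "http://localhost:30000"  # assumption: local SGLang server, default port

# Case 1: /v1/chat/completions with a scalar rid and n > 1
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "model": "placeholder-model",  # assumption: any model the server is serving
    "messages": [{"role": "user", "content": "Hello"}],
    "n": 2,                        # n > 1 triggers the batch path
    "rid": "my-correlation-id",    # scalar rid -> fails on v0.4.7.post1
})
print(resp.status_code, resp.text)

# Case 2: /v1/embeddings with a list input and a scalar rid
resp = requests.post(f"{BASE}/v1/embeddings", json={
    "model": "placeholder-model",
    "input": ["first string", "second string"],  # list input triggers batching
    "rid": "my-correlation-id",                  # scalar rid -> fails as well
})
print(resp.status_code, resp.text)
```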

We temporarily addressed this by disabling `rid` handling in both `serving_base` and client payloads. However, a comprehensive fix is required, involving adjustments in `tokenizer_manager.py` and `io_struct.py`; we are opening this issue to track that work.

Additionally, `openai_api/protocol.py` accepts only a single string for `rid`. This design choice originally aimed to serve enterprise clients who use internal correlation IDs (e.g., `X-Request-ID`) to track requests through logs and metrics. However, during batch processing the TokenizerManager inherently expects a `List[str]` for `rid`, leading to runtime crashes when a scalar is mistakenly treated as a list.
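To make the mismatch concrete, here is a simplified sketch; the class and field names are illustrative stand-ins, not copied from the SGLang source:

```python
from typing import List, Optional


class ChatCompletionRequest:         # stand-in for openai_api/protocol.py
    rid: Optional[str] = None        # scalar: one correlation ID per request


class GenerateReqInput:              # stand-in for io_struct.py (batch path)
    rid: Optional[List[str]] = None  # list: one ID per item in the batch


# The adapter forwards the scalar unchanged, so batch code that pairs IDs
# with inputs (e.g. zip(rid, prompts)) either iterates the string character
# by character or raises, depending on the code path taken.
```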


2. Expected Behavior

  • `rid` should function purely as an opaque correlation ID, analogous to headers such as `X-OpenAI-Request-ID` in the official API, without affecting batch logic.

  • Both the /v1/chat/completions and /v1/embeddings endpoints should function correctly regardless of the value of `n` or the length of the input.


3. Temporary Workaround

Disable `rid` in client payloads. Internal testing confirms that removing this field restores normal endpoint functionality.
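In client code the workaround amounts to stripping the field before sending; a minimal sketch (payload values are placeholders):

```python
payload = {
    "model": "placeholder-model",  # assumption: any served model name
    "input": ["first string", "second string"],
    "rid": "my-correlation-id",
}
payload.pop("rid", None)  # drop rid until the server-side fix lands
```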


4. Potential Fixes

| Approach | Solution | Pros | Cons |
|---|---|---|---|
| A | Adapter broadcasts a scalar `rid` to a list of identical values when `batch_size > 1`. | Easy fix; preserves the external contract. | Loses per-choice ID granularity. |
| B | Adapter accepts both `str` and `List[str]`; validates that the length matches `batch_size`, otherwise returns 400. | Strict; supports unique per-choice IDs. | Breaking change for callers sending invalid input. |
| C | Remove `rid` from the public schema until the server refactor completes. | Eliminates ambiguity; buys redesign time. | May break existing clients using `rid`. |
| D | Adopt vLLM's `extra_body` pattern: move non-OpenAI params (e.g., `rid`) into a dedicated sub-dict. | Future-proof; avoids OpenAI spec collisions. | Requires client changes and deeper refactoring. |
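As a sketch of how Approaches A and B could be combined in the adapter (the helper name is hypothetical, not an existing SGLang function):

```python
from typing import List, Optional, Union


def normalize_rid(
    rid: Optional[Union[str, List[str]]], batch_size: int
) -> Optional[List[str]]:
    """Normalize a user-supplied rid into the list the batching layer expects.

    Approach A: broadcast a scalar rid to identical values (loses per-choice
    granularity). Approach B: also accept a list, rejecting length mismatches.
    """
    if rid is None:
        return None
    if isinstance(rid, str):
        return [rid] * batch_size  # Approach A: broadcast the scalar
    if len(rid) != batch_size:
        # Approach B: strict validation; the adapter would map this to HTTP 400.
        raise ValueError(
            f"rid has {len(rid)} entries but the batch size is {batch_size}"
        )
    return list(rid)
```

Under Approach D, clients would instead pass `rid` through the OpenAI SDK's `extra_body` argument, e.g. `client.chat.completions.create(..., extra_body={"rid": "my-correlation-id"})`, keeping the top-level request schema aligned with the OpenAI spec.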

Reproduction

Running `test_embedding_openai_server.py` against SGLang v0.4.7.post1 previously resulted in the following error:

(screenshot of the error traceback, omitted here)

Environment

Tested inside the `lmsysorg/sglang:v0.4.7.post1-cu124` Docker image.
