Description
Points: 2-3 days
Description: Unify logprobs handling and `UsageInfo` in `v1/chat/completions` and `v1/completions`. Reduce repetitive code and improve code reuse and structure.
Deliverables:
- Complete Task 1 and Task 2
- Add UTs
Task 1: Token Logprobs Handling
Current logic in `adapter.py#L1327-L1368` (`sglang/python/sglang/srt/openai_api/adapter.py`, lines 1327 to 1368 at `ca92911`):

```python
logprobs = False
if isinstance(request, list) and request[idx].logprobs:
    logprobs = True
elif (not isinstance(request, list)) and request.logprobs:
    logprobs = True
if logprobs:
    logprobs = to_openai_style_logprobs(
        output_token_logprobs=ret_item["meta_info"]["output_token_logprobs"],
        output_top_logprobs=ret_item["meta_info"].get(
            "output_top_logprobs", None
        ),
    )
    token_logprobs = []
    for token_idx, (token, logprob) in enumerate(
        zip(logprobs.tokens, logprobs.token_logprobs)
    ):
        token_bytes = list(token.encode("utf-8"))
        top_logprobs = []
        if logprobs.top_logprobs:
            for top_token, top_logprob in logprobs.top_logprobs[
                token_idx
            ].items():
                top_token_bytes = list(top_token.encode("utf-8"))
                top_logprobs.append(
                    TopLogprob(
                        token=top_token,
                        bytes=top_token_bytes,
                        logprob=top_logprob,
                    )
                )
        token_logprobs.append(
            ChatCompletionTokenLogprob(
                token=token,
                bytes=token_bytes,
                logprob=logprob,
                top_logprobs=top_logprobs,
            )
        )
    choice_logprobs = ChoiceLogprobs(content=token_logprobs)
else:
    choice_logprobs = None
```
New logic in `serving_chat.py` (`sglang/python/sglang/srt/entrypoints/openai/serving_chat.py`, lines 786 to 794 at `70c471a`):

```python
def _process_response_logprobs(self, ret_item: Dict[str, Any]) -> ChoiceLogprobs:
    """Process logprobs for non-streaming response"""
    logprobs = to_openai_style_logprobs(
        output_token_logprobs=ret_item["meta_info"]["output_token_logprobs"],
        output_top_logprobs=ret_item["meta_info"].get("output_top_logprobs", None),
    )
    token_logprobs = self._process_logprobs_tokens(logprobs, use_token_index=True)
    return ChoiceLogprobs(content=token_logprobs)
```
For non-streaming responses, logprob handling first calls `_process_response_logprobs`, which in turn calls `_process_logprobs_tokens`.
Unify Logprobs
The logic itself is fine, but it is convoluted, entangled with the streaming logprobs path and the completions endpoint:
- Inconsistent entry points: chat has two different methods (`_process_response_logprobs` vs `_process_streaming_logprobs`) for similar work
- Duplicated logic: both chat and completions call `to_openai_style_logprobs`
- Mixed responsibilities: some methods do conversion plus processing, others just processing
- Hard to test: complex call chains make unit testing difficult
Design
-
Approach: Create a unified
LogProbsProcessor
using factory pattern to eliminate code duplication and inconsistent APIs. -
New File:
sglang/python/sglang/srt/entrypoints/openai/logprobs_processor.py
-
High Level Design:
serving_chat.py
: Replace_process_streaming_logprobs
and_process_response_logprobs
with factory calls, remove_process_logprobs_tokens
serving_completions.py
: Replace inlineto_openai_style_logprobs
calls with factory methodsutils.py
: Deprecate or removeto_openai_style_logprobs
function
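A minimal sketch of what the unified processor could look like. All names besides `LogProbsProcessor`, `TopLogprob`, and `ChatCompletionTokenLogprob` (e.g. `convert`, `for_chat`, `for_completions`, and their signatures) are assumptions for illustration, not the final API; the real implementation would reuse sglang's existing protocol dataclasses:

```python
# Hypothetical sketch: one conversion path shared by the chat and completions
# endpoints, for both streaming and non-streaming responses.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TopLogprob:  # stand-in for the existing protocol class
    token: str
    bytes: List[int]
    logprob: float


@dataclass
class ChatCompletionTokenLogprob:  # stand-in for the existing protocol class
    token: str
    bytes: List[int]
    logprob: float
    top_logprobs: List[TopLogprob]


class LogProbsProcessor:
    """Single home for OpenAI-style logprob conversion."""

    @staticmethod
    def convert(
        tokens: List[str],
        token_logprobs: List[float],
        top_logprobs: Optional[List[Dict[str, float]]] = None,
    ) -> List[ChatCompletionTokenLogprob]:
        out = []
        for idx, (token, logprob) in enumerate(zip(tokens, token_logprobs)):
            tops = []
            if top_logprobs:
                # Each position carries a {token: logprob} map of alternatives.
                for top_token, top_lp in top_logprobs[idx].items():
                    tops.append(
                        TopLogprob(
                            token=top_token,
                            bytes=list(top_token.encode("utf-8")),
                            logprob=top_lp,
                        )
                    )
            out.append(
                ChatCompletionTokenLogprob(
                    token=token,
                    bytes=list(token.encode("utf-8")),
                    logprob=logprob,
                    top_logprobs=tops,
                )
            )
        return out

    # Factory entry points: both endpoints delegate to the same core
    # conversion, so streaming/non-streaming chat and completions no longer
    # duplicate the token/bytes/top-logprob assembly.
    @classmethod
    def for_chat(cls, tokens, token_logprobs, top_logprobs=None):
        return cls.convert(tokens, token_logprobs, top_logprobs)

    @classmethod
    def for_completions(cls, tokens, token_logprobs, top_logprobs=None):
        return cls.convert(tokens, token_logprobs, top_logprobs)
```

Because the core `convert` is a pure function of parallel lists, it is directly unit-testable without mocking a serving handler, which addresses the "hard to test" point above.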
Task 2: UsageInfo
Current Problem
- Code duplication: `aggregate_token_usage` (utils.py) vs `_calculate_streaming_usage_base` (serving_base.py)
- Different data formats: non-streaming uses response lists, streaming uses token dictionaries
- Similar logic: both calculate total tokens with n_choices handling and cache reporting
Design Recommendation
- Approach: create a unified `UsageProcessor` following the same factory pattern as LogProbs.
- New file: `sglang/python/sglang/srt/entrypoints/openai/usage_processor.py`
- Files to update:
  - `serving_chat.py`: replace `aggregate_token_usage` calls with factory methods
  - `serving_completions.py`: replace `aggregate_token_usage` calls with factory methods
  - `serving_base.py`: replace `_calculate_streaming_usage_base` with factory calls
  - `utils.py`: deprecate the `aggregate_token_usage` function
- Functions to consolidate:
  - `aggregate_token_usage` (from utils.py) → `UsageProcessor.calculate_response_usage`
  - `_calculate_streaming_usage_base` (from serving_base.py) → `UsageProcessor.calculate_streaming_usage`
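The consolidation above could be sketched roughly as follows. The method names `calculate_response_usage` and `calculate_streaming_usage` come from the plan; everything else (the `UsageInfo` stand-in, parameter names, and the exact `meta_info` keys) is an assumption for illustration:

```python
# Hypothetical sketch: both usage calculations in one class, one for
# non-streaming response lists and one for streaming per-index token dicts.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class UsageInfo:  # stand-in for the existing protocol class
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: Optional[Dict[str, int]] = None


class UsageProcessor:
    """Single home for token-usage math."""

    @staticmethod
    def calculate_response_usage(
        responses: List[Dict[str, Any]],
        n_choices: int = 1,
        enable_cache_report: bool = False,
    ) -> UsageInfo:
        # Non-streaming: with n choices per request, only every n-th item
        # should contribute prompt tokens, so they are not counted n times.
        prompt = sum(r["meta_info"]["prompt_tokens"] for r in responses[::n_choices])
        completion = sum(r["meta_info"]["completion_tokens"] for r in responses)
        details = None
        if enable_cache_report:
            cached = sum(
                r["meta_info"].get("cached_tokens", 0) for r in responses[::n_choices]
            )
            details = {"cached_tokens": cached}
        return UsageInfo(prompt, completion, prompt + completion, details)

    @staticmethod
    def calculate_streaming_usage(
        prompt_tokens: Dict[int, int],
        completion_tokens: Dict[int, int],
        cached_tokens: Dict[int, int],
        n_choices: int = 1,
        enable_cache_report: bool = False,
    ) -> UsageInfo:
        # Streaming: counts accumulate per choice index across chunks;
        # prompt tokens are again counted once per request, not per choice.
        prompt = sum(v for i, v in prompt_tokens.items() if i % n_choices == 0)
        completion = sum(completion_tokens.values())
        details = None
        if enable_cache_report:
            details = {"cached_tokens": sum(cached_tokens.values())}
        return UsageInfo(prompt, completion, prompt + completion, details)
```

Keeping both entry points as static methods on one class makes the shared n_choices and cache-report handling obvious, and gives the deliverable UTs two small pure functions to assert against.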