Summary
This RFC proposes removing the existing `/v1/batches` and `/v1/files` endpoints from the main OpenAI-compatible server and replacing them with a standalone offline batch processing service.
Note: As part of the ongoing OpenAI API refactor, batch support has already been removed from the main server. This RFC documents the rationale and formalizes the replacement plan.
Problem
7.1 Fundamental Issues with the Current Batch API (#7068)
The current design for online batch processing is flawed and not production-safe. Key issues include:
- Server Stability Risk: Uploading and processing thousands of requests at once can overwhelm online API servers.
- Timing Constraints: Difficult to enforce `completion_window` in a real-time environment.
- Resource Contention: Batch jobs run alongside latency-sensitive requests without proper isolation.
- Architecture Mismatch: Batch workloads are inherently asynchronous/offline, conflicting with the synchronous nature of standard OpenAI endpoints.
Proposed Solution
1. Simplify Online Endpoints
- Remove logic for handling list-wrapped input in `/v1/chat/completions`, `/v1/embeddings`, etc.
- Accept only a single request per HTTP call (OpenAI spec-compliant).
- Cleaner code and better performance for common-case usage.
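For illustration only, after this change each HTTP call carries exactly one request body; the server address and model name below are placeholders, not part of this RFC.

```python
import requests

# Hypothetical local server address and model name, shown only to illustrate
# the spec-compliant single-request shape.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.json())
# A list-wrapped payload such as json=[{...}, {...}] would be rejected after this change.
```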
2. Split Out Batch Service
Implement batch processing as a separate offline job runner, modeled after how vLLM does it.
This batch runner will:
- Accept batch jobs in OpenAI-compatible `.jsonl` format
- Spawn a new process/container to handle the job
- Stream output to a results file (local or presigned S3 URLs)
- Optionally enforce `completion_window` guarantees in the background
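To make the intended flow concrete, here is a minimal sketch of such a runner, under two assumptions: the input follows the OpenAI batch `.jsonl` line format (`custom_id`, `method`, `url`, `body`), and an OpenAI-compatible server is reachable at a local address. The address, file names, and output schema are illustrative rather than the final interface; process/container spawning, S3 upload, and `completion_window` enforcement are omitted.

```python
"""Minimal offline batch-runner sketch (illustrative, not the final design)."""
import json
import requests

BASE_URL = "http://localhost:30000"  # hypothetical OpenAI-compatible server address


def run_batch(input_path: str, output_path: str) -> None:
    with open(input_path) as fin, open(output_path, "w") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            req = json.loads(line)
            # Forward each .jsonl request to the matching online endpoint.
            resp = requests.post(BASE_URL + req["url"], json=req["body"], timeout=600)
            result = {
                "custom_id": req.get("custom_id"),
                "response": {
                    "status_code": resp.status_code,
                    "body": resp.json() if resp.ok else None,
                },
                "error": None if resp.ok else {"message": resp.text},
            }
            # Stream each result out as soon as it completes.
            fout.write(json.dumps(result) + "\n")


if __name__ == "__main__":
    run_batch("batch_input.jsonl", "batch_output.jsonl")
```

Running this as its own process (or container) keeps batch traffic off the latency-sensitive online server, which is the separation of concerns this RFC argues for.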
3. Remove from Main Server
- Remove `/v1/batches` and `/v1/files` routes from the main OpenAI-compatible HTTP server.
- These should live in a separate service (`batch-runner`) to enforce separation of concerns.
📌 Action Items
- Finalize and approve this RFC
- Implement batch runner
- Deprecate online batch endpoints
- Update docs and integration tests