🚧 RFC: Redesign Batch Processing as an Offline Workflow #7427

@CatherineSue

Summary

This RFC proposes removing the existing /v1/batches and /v1/files endpoints from the main OpenAI-compatible server and replacing them with a standalone offline batch processing service.

Note: As part of the ongoing OpenAI API refactor, batch support has already been removed from the main server. This RFC documents the rationale and formalizes the replacement plan.


Problem

7.1 Fundamental Issues with the Current Batch API (#7068)

The current design for online batch processing is flawed and not production-safe. Key issues include:

  • Server Stability Risk: Uploading and processing thousands of requests at once can overwhelm online API servers.
  • Timing Constraints: completion_window is difficult to enforce in a real-time serving environment.
  • Resource Contention: Batch jobs run alongside latency-sensitive requests without proper isolation.
  • Architecture Mismatch: Batch workloads are inherently asynchronous/offline, conflicting with the synchronous nature of standard OpenAI endpoints.

Proposed Solution

1. Simplify Online Endpoints

  • Remove logic for handling list-wrapped input in /v1/chat/completions, /v1/embeddings, etc.
  • Accept only a single request per HTTP call (OpenAI spec-compliant).
  • This yields cleaner code and better performance for the common case.
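
A minimal sketch of the single-request rule above, assuming the handler sees the already-parsed JSON body. The names require_single_request and InvalidRequestError are hypothetical and not part of the existing server code.

```python
from typing import Any


class InvalidRequestError(ValueError):
    """Hypothetical error type for malformed request bodies."""


def require_single_request(payload: Any) -> dict:
    """Return the payload if it is one JSON object; reject list-wrapped input."""
    if isinstance(payload, list):
        # Previously a list would have been treated as an inline batch.
        raise InvalidRequestError(
            "list-wrapped requests are not accepted; send one request per HTTP call"
        )
    if not isinstance(payload, dict):
        raise InvalidRequestError("request body must be a JSON object")
    return payload
```

An endpoint such as /v1/chat/completions could call this check first and map InvalidRequestError to an HTTP 400 response; that mapping is an assumption, not something this RFC specifies.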

2. Split Out Batch Service

Implement batch processing as a separate offline job runner, modeled after vLLM's approach to offline batch processing.

This batch runner will:

  • Accept batch jobs in OpenAI-compatible .jsonl format
  • Spawn a new process/container to handle the job
  • Stream output to a results file (a local path or a presigned S3 URL)
  • Optionally enforce completion_window guarantees in the background
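
The runner's core loop could look roughly like the sketch below. It assumes the OpenAI batch input format, where each .jsonl line carries custom_id, method, url, and body, and it writes one result line per request as soon as that request finishes. The run_batch function and the handler callback are hypothetical names; the handler stands in for whatever actually executes a request against the engine, and the output record is simplified relative to the full OpenAI output schema.

```python
import json
from typing import Callable, Dict, IO


def run_batch(
    src: IO[str],
    dst: IO[str],
    handler: Callable[[str, dict], dict],
) -> Dict[str, int]:
    """Process an OpenAI-style .jsonl batch, streaming one result line per request."""
    counts = {"completed": 0, "failed": 0}
    for line in src:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input file
        custom_id = None
        try:
            request = json.loads(line)
            custom_id = request.get("custom_id")
            # handler is an assumed callback: (url, body) -> response body
            body = handler(request["url"], request["body"])
            record = {
                "custom_id": custom_id,
                "response": {"status_code": 200, "body": body},
                "error": None,
            }
            counts["completed"] += 1
        except Exception as exc:
            # A bad line fails only that request, not the whole job.
            record = {
                "custom_id": custom_id,
                "response": None,
                "error": {"message": str(exc)},
            }
            counts["failed"] += 1
        dst.write(json.dumps(record) + "\n")
        dst.flush()  # make partial results visible while the job is running
    return counts
```

Running this loop in its own process or container, as proposed above, keeps a large job from competing with the online server. Uploading results to a presigned S3 URL and enforcing completion_window would sit outside this function and are not shown.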

3. Remove from Main Server

  • Remove /v1/batches and /v1/files routes from the main OpenAI-compatible HTTP server.
  • These should live in a separate service (batch-runner) to enforce separation of concerns.

📌 Action Items

  • Finalize and approve this RFC
  • Implement batch runner
  • Deprecate online batch endpoints
  • Update docs and integration tests
