
[Bug] OpenAI-compatible batch inference fails for singleton batches #6200

@markjr-cisco

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

If the batch input file contains only a single request, the batch job raises a TypeError instead of completing. One possible workaround is to duplicate the single request, run the job, and then discard the duplicated response.
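
A minimal sketch of that padding workaround (the helper names pad_singleton_batch / drop_padding and the "-pad" custom_id suffix are illustrative choices, not part of SGLang or the OpenAI client):

import copy

PAD_SUFFIX = "-pad"  # illustrative sentinel suffix for the duplicated request

def pad_singleton_batch(requests):
    """Duplicate the lone request so the batch always contains >= 2 entries."""
    if len(requests) != 1:
        return requests, False
    dup = copy.deepcopy(requests[0])
    dup["custom_id"] = requests[0]["custom_id"] + PAD_SUFFIX
    return requests + [dup], True

def drop_padding(results, padded):
    """Discard the response line that belongs to the duplicated request."""
    if not padded:
        return results
    return [r for r in results if not r["custom_id"].endswith(PAD_SUFFIX)]

The duplicated entry costs one extra completion, but it keeps the batch above a single request until the underlying TypeError is fixed.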

Reproduction

import json
import time
from openai import OpenAI

port = 30000  # assumes an SGLang OpenAI-compatible server is already serving the model on this port
print_highlight = print  # stand-in for a notebook highlighting helper
client = OpenAI(base_url=f"http://127.0.0.1:{port}/v1", api_key="None")

requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "qwen/qwen2.5-0.5b-instruct",
            "messages": [
                {"role": "user", "content": "Tell me a joke about programming"}
            ],
            "max_tokens": 50,
        },
    },
    # Uncomment this second request and the batch completes; with only the
    # single request above, the job fails with a TypeError.
#    {
#        "custom_id": "request-2",
#        "method": "POST",
#        "url": "/chat/completions",
#        "body": {
#            "model": "qwen/qwen2.5-0.5b-instruct",
#            "messages": [{"role": "user", "content": "What is Python?"}],
#            "max_tokens": 50,
#        },
#    },
]

input_file_path = "batch_requests.jsonl"

with open(input_file_path, "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

with open(input_file_path, "rb") as f:
    file_response = client.files.create(file=f, purpose="batch")

batch_response = client.batches.create(
    input_file_id=file_response.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

print_highlight(f"Batch job created with ID: {batch_response.id}")

while batch_response.status not in ["completed", "failed", "cancelled"]:
    print(f"Batch job status: {batch_response.status}...trying again in 3 seconds...")
    time.sleep(3)
    batch_response = client.batches.retrieve(batch_response.id)

if batch_response.status == "completed":
    print("Batch job completed successfully!")
    print(f"Request counts: {batch_response.request_counts}")

    result_file_id = batch_response.output_file_id
    file_response = client.files.content(result_file_id)
    result_content = file_response.read().decode("utf-8")

    results = [
        json.loads(line) for line in result_content.split("\n") if line.strip() != ""
    ]

    for result in results:
        print_highlight(f"Request {result['custom_id']}:")
        print_highlight(f"Response: {result['response']}")

    print_highlight("Cleaning up files...")
    # Only delete the result file ID since file_response is just content
    client.files.delete(result_file_id)
else:
    print_highlight(f"Batch job failed with status: {batch_response.status}")
    if hasattr(batch_response, "errors"):
        print_highlight(f"Errors: {batch_response.errors}")

Environment

Python: 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA L40S
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.9
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
CUDA Driver Version: 535.247.01
PyTorch: 2.6.0+cu124
sglang: 0.4.6.post3
sgl_kernel: 0.1.1
flashinfer_python: 0.2.5
triton: 3.2.0
transformers: 4.51.1
torchao: 0.11.0
numpy: 1.26.4
aiohttp: 3.11.18
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.31.1
interegular: 0.3.3
modelscope: 1.25.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.4
python-multipart: 0.0.20
pyzmq: 26.4.0
uvicorn: 0.34.2
uvloop: 0.21.0
vllm: 0.8.4
xgrammar: 0.1.19
openai: 1.75.0
tiktoken: 0.9.0
anthropic: 0.51.0
litellm: 1.69.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU1    NODE     X      NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU2    NODE    NODE     X      NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU3    NODE    NODE    NODE     X      SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU4    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    48-95,144-191   1               N/A
GPU5    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE    48-95,144-191   1               N/A
GPU6    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    48-95,144-191   1               N/A
GPU7    SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      48-95,144-191   1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Hypervisor vendor: KVM
ulimit soft: 65536
