[Bug] Prefill out of memory when page size is large #7353

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

```
[2025-06-18 18:11:03] Prefill batch. #new-seq: 14, #new-token: 8192, #cached-token: 0, token usage: 0.92, #unbootstrapped-req: 0, #queue-req: 105, #transferring-req: 990, input throughput (token/s): 96875.08 
[2025-06-18 18:11:03] Prefill batch. #new-seq: 18, #new-token: 8192, #cached-token: 0, token usage: 0.94, #unbootstrapped-req: 0, #queue-req: 88, #transferring-req: 1004, input throughput (token/s): 98205.90 
[2025-06-18 18:11:03] Prefill batch. #new-seq: 13, #new-token: 8192, #cached-token: 0, token usage: 0.95, #unbootstrapped-req: 0, #queue-req: 76, #transferring-req: 1015, input throughput (token/s): 85849.56 
[2025-06-18 18:11:03] Prefill batch. #new-seq: 18, #new-token: 8192, #cached-token: 0, token usage: 0.96, #unbootstrapped-req: 0, #queue-req: 59, #transferring-req: 1029, input throughput (token/s): 105483.90 
[2025-06-18 18:11:04] Prefill batch. #new-seq: 18, #new-token: 8192, #cached-token: 0, token usage: 0.98, #unbootstrapped-req: 0, #queue-req: 42, #transferring-req: 1041, input throughput (token/s): 100693.93 
[2025-06-18 18:11:04] Prefill batch. #new-seq: 13, #new-token: 5614, #cached-token: 0, token usage: 0.99, #unbootstrapped-req: 0, #queue-req: 30, #transferring-req: 1056, input throughput (token/s): 86617.45 
[2025-06-18 18:11:04] Prefill out of memory. Try to lower your batch size.
Try to allocate 5614 tokens.
Available tokens: 5952
self.token_to_kv_pool_allocator.available_size()=5952
self.tree_cache.evictable_size()=0

[2025-06-18 18:11:04] Scheduler hit an exception: Traceback (most recent call last):
  File "/root/workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2616, in run_scheduler_process
    scheduler.event_loop_overlap_disagg_prefill()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 313, in event_loop_overlap_disagg_prefill
    batch = self.get_new_batch_prefill()
  File "/root/workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1587, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/root/workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1260, in prepare_for_extend
    out_cache_loc = self.alloc_paged_token_slots_extend(
  File "/root/workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1016, in alloc_paged_token_slots_extend
    raise RuntimeError(error_msg)
RuntimeError: Prefill out of memory. Try to lower your batch size.
Try to allocate 5614 tokens.
Available tokens: 5952
self.token_to_kv_pool_allocator.available_size()=5952
self.tree_cache.evictable_size()=0
```
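At first glance the failure looks contradictory: the allocator reports 5952 free tokens but cannot fit 5614. My guess (not yet confirmed against the sglang source) is page rounding. If the paged allocator gives each sequence whole pages, each of the 13 new sequences can waste up to `page_size - 1` slots, so the real demand can exceed the raw token count. The sketch below illustrates that arithmetic with hypothetical helpers (`ceil_div`, `pages_needed`, `check_fit` are not sglang functions). The page size of 64 and the per-sequence lengths are assumptions. I chose them only because 5952 is exactly 93 pages of 64 tokens and the lengths sum to the 5614 tokens in the log. The actual page size and request lengths are not stated here.

```python
def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division (avoids float rounding).
    return -(-a // b)


def pages_needed(prefix_len: int, extend_len: int, page_size: int) -> int:
    # Pages already held by the cached prefix (the last one may be only partly used).
    held = ceil_div(prefix_len, page_size)
    # Pages required after extending the sequence by extend_len tokens.
    total = ceil_div(prefix_len + extend_len, page_size)
    return total - held


def check_fit(prefix_lens, extend_lens, page_size, available_tokens):
    # Naive check: compares the raw token count only, ignoring page rounding.
    naive_ok = sum(extend_lens) <= available_tokens
    # Page-aware check. Assumption: available_tokens == free_pages * page_size.
    free_pages = available_tokens // page_size
    pages = sum(pages_needed(p, e, page_size) for p, e in zip(prefix_lens, extend_lens))
    return naive_ok, pages <= free_pages, pages * page_size


# Hypothetical batch: 13 new sequences with no cached prefix (#cached-token: 0),
# with lengths chosen to sum to the 5614 tokens from the log.
extend_lens = [449] * 12 + [226]
prefix_lens = [0] * len(extend_lens)
naive_ok, paged_ok, tokens_needed = check_fit(
    prefix_lens, extend_lens, page_size=64, available_tokens=5952
)
print(f"naive_ok={naive_ok} paged_ok={paged_ok} tokens_needed={tokens_needed}")
# prints: naive_ok=True paged_ok=False tokens_needed=6400
```

If this is the cause, the scheduler's admission check would need to be page-aware. For example, it could reserve up to one extra page per new sequence, which for 13 sequences and 64-token pages is at most 13 * 63 = 819 extra slots.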

### Reproduction

Use a 1P1D setup (one prefill instance, one decode instance) with the Llama-3.2-3B model, and send a batch of requests that exceeds the server's capacity. In addition, manually slow down the KV cache transfer to simulate a heavy workload, so that many KV caches are left waiting for transfer.
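Below is a minimal sketch of the load-generation half of the reproduction. It assumes the deployment exposes sglang's native `/generate` HTTP endpoint through a router at `http://127.0.0.1:8000`. The address, request count, prompt length, and concurrency are all assumptions, not the values used to produce the log above. The helper names `build_payload`, `send`, and `flood` are hypothetical. Slowing down the KV cache transfer required a manual change and is not covered here.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: entry point of the 1P1D deployment; adjust to your setup.
URL = "http://127.0.0.1:8000/generate"


def build_payload(i: int, prompt_words: int = 400, max_new_tokens: int = 16) -> dict:
    # Hypothetical prompt: a repeated word gives a roughly fixed prompt length per request.
    text = f"Request {i}: " + "hello " * prompt_words
    return {
        "text": text,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }


def send(payload: dict, url: str = URL, timeout: float = 600.0) -> dict:
    # POST one JSON request and return the decoded JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def flood(num_requests: int = 1200, workers: int = 256, url: str = URL) -> list:
    # Fire many requests concurrently so the server is pushed past its capacity.
    payloads = [build_payload(i) for i in range(num_requests)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: send(p, url), payloads))
```

Call `flood()` once the prefill and decode servers are up. With the transfer throttled, finished prefills should pile up in the transferring state, matching the rising `#transferring-req` and token usage in the log.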

### Environment

-
