
Conversation

@zhyncs zhyncs commented Sep 2, 2024

Motivation

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.


zhyncs commented Sep 2, 2024

# H100 TP 2, latest v0.2.15
python3 -m sglang.launch_server --model neuralmagic/Qwen2-72B-Instruct-FP8 --quantization fp8  --trust-remote-code --tp 2 --kv-cache-dtype fp8_e5m2
python3 -m sglang.bench_serving --backend sglang
[14:47:01 TP0] Exception in ModelTpServer:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 244, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 260, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 507, in forward_prefill_batch
    sample_output, logits_output = self.model_runner.forward(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 584, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 542, in forward_extend
    input_metadata = InputMetadata.from_schedule_batch(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 215, in from_schedule_batch
    ret.init_flashinfer_handlers(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 245, in init_flashinfer_handlers
    update_flashinfer_indices(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 374, in update_flashinfer_indices
    model_runner.flashinfer_prefill_wrapper_paged.begin_forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 832, in plan
    self._wrapper.plan(
RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 599785472 and alignment 16 in AlignedAllocator

  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 896, in run_tp_server
    model_server.exposed_step(recv_reqs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 244, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 260, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 507, in forward_prefill_batch
    sample_output, logits_output = self.model_runner.forward(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 584, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 542, in forward_extend
    input_metadata = InputMetadata.from_schedule_batch(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 215, in from_schedule_batch
    ret.init_flashinfer_handlers(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 245, in init_flashinfer_handlers
    update_flashinfer_indices(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 374, in update_flashinfer_indices
    model_runner.flashinfer_prefill_wrapper_paged.begin_forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 832, in plan
    self._wrapper.plan(
RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 599785472 and alignment 16 in AlignedAllocator

It works fine without --kv-cache-dtype fp8_e5m2. @ispobock @yzh119, could you help take a look?
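
For reference, a minimal sketch of how a flashinfer paged-prefill wrapper is typically constructed around a pre-allocated workspace buffer (the buffer size and wrapper setup below are illustrative assumptions, not the actual sglang code). The RuntimeError above is raised while plan()/begin_forward() tries to carve a ~600 MB temporary (batch_prefill_tmp_v) out of that workspace, which is why the failure only shows up once the fp8_e5m2 KV-cache path needs more scratch space than the buffer provides.

import torch
import flashinfer

# Illustrative sketch, not the sglang implementation: the prefill wrapper
# plans its temporaries (e.g. batch_prefill_tmp_v) inside this workspace
# buffer, so a buffer smaller than the ~600 MB requested in the traceback
# triggers the AlignedAllocator error seen above.
workspace_buffer = torch.empty(
    768 * 1024 * 1024, dtype=torch.uint8, device="cuda"  # size is hypothetical
)
prefill_wrapper = flashinfer.BatchPrefillWithPagedKVCacheWrapper(
    workspace_buffer, kv_layout="NHD"
)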

@zhyncs zhyncs removed the wip label Sep 2, 2024

zhyncs commented Sep 2, 2024

fix #1272

@zhyncs zhyncs self-assigned this Sep 2, 2024
@zhyncs zhyncs enabled auto-merge (squash) September 2, 2024 14:57
@zhyncs zhyncs disabled auto-merge September 2, 2024 15:18
@zhyncs zhyncs merged commit 2561ed0 into main Sep 2, 2024
10 checks passed
@zhyncs zhyncs deleted the night branch September 2, 2024 15:18
@zhyncs zhyncs mentioned this pull request Sep 2, 2024
@ispobock ispobock mentioned this pull request Sep 7, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025