Copy config files for MI300X to support in virtualized environments #3505

yosoyjay · 2025-02-12T00:29:35Z

Motivation

Add support for MI300X in virtualized environments to the ROCm container images. See the discussion in #3219.

Modifications

Add a RUN command to Dockerfile.rocm that finds all MI300X config files and copies each one into the same directory, changing only "MI300X" to "MI300X_VF" in the file name, so that the same tuned configs are found for MI300X devices in virtualized environments.

Checklist

Format your code according to Code Formatting with Pre-Commit.
[-] Add unit tests as outlined in Running Unit Tests.
[-] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results (https://docs.sglang.ai/references/accuracy_evaluation.html).
[-] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

Benchmark on 8 virtualized MI300Xs

Invoked with python -m sglang.bench_one_batch --model deepseek-ai/DeepSeek-R1 --tp 8 --batch 32 --input-len 256 --output-len 32 --trust-remote-code in a container built from Dockerfile.rocm.

Before:

Benchmark ...
Prefill. latency: 0.81376 s, throughput:  10066.82 token/s
Decode.  latency: 0.10565 s, throughput:    302.89 token/s
Decode.  latency: 0.10617 s, throughput:    301.40 token/s
Decode.  latency: 0.10633 s, throughput:    300.96 token/s
Decode.  latency: 0.10639 s, throughput:    300.77 token/s
Decode.  latency: 0.10635 s, throughput:    300.88 token/s
Decode.  median latency: 0.10636 s, median throughput:    300.88 token/s
Total. latency:  4.110 s, throughput:   2242.37 token/s

After:

Benchmark ...
Prefill. latency: 0.82810 s, throughput:   9892.55 token/s
Decode.  latency: 0.04025 s, throughput:    794.98 token/s
Decode.  latency: 0.04025 s, throughput:    794.95 token/s
Decode.  latency: 0.04087 s, throughput:    783.05 token/s
Decode.  latency: 0.04208 s, throughput:    760.47 token/s
Decode.  latency: 0.04239 s, throughput:    754.95 token/s
Decode.  median latency: 0.04493 s, median throughput:    712.22 token/s
Total. latency:  2.201 s, throughput:   4186.69 token/s
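As a quick consistency check on these numbers (batch 32, input length 256, output length 32), the reported throughputs appear to be tokens processed divided by latency, and the speedups follow directly from the latencies:

```latex
\begin{aligned}
\text{prefill (before)} &: \frac{32 \times 256}{0.81376\,\mathrm{s}} \approx 10066.8\ \text{token/s} \\
\text{decode, median (before)} &: \frac{32}{0.10636\,\mathrm{s}} \approx 300.9\ \text{token/s} \\
\text{total (before)} &: \frac{32 \times (256 + 32)}{4.110\,\mathrm{s}} \approx 2242\ \text{token/s} \\
\text{median decode speedup} &: \frac{0.10636\,\mathrm{s}}{0.04493\,\mathrm{s}} \approx 2.37\times \\
\text{end-to-end speedup} &: \frac{4.110\,\mathrm{s}}{2.201\,\mathrm{s}} \approx 1.87\times
\end{aligned}
```

Prefill is essentially unchanged (0.81376 s before, 0.82810 s after), so the end-to-end gain comes from the decode steps.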

tot0 · 2025-02-14T17:10:16Z

docker/Dockerfile.rocm

+# Copy config files to support MI300X in virtualized environments (MI300X_VF).  Symlinks will not be created in image build.
+RUN find /sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/ \
+         /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/ \
+         -type f -name '*MI300X*' | xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}
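The merged step pipes find into xargs and rewrites the whole path with sed. The sketch below is a hypothetical hardening, not what was merged: it rewrites only the file name, skips files already named *MI300X_VF* (the '*MI300X*' pattern would otherwise match them if they exist, for example on a second run, and produce MI300X_VF_VF names), and never overwrites an existing VF config. It assumes bash (for read -d '' and ${var/pattern/replacement}), so a Dockerfile RUN would have to invoke it through bash rather than the default /bin/sh; the function name copy_vf_configs is made up for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical defensive variant of the RUN step above (not the merged code).
set -euo pipefail

# copy_vf_configs DIR...
# For every regular file under DIR... whose name contains "MI300X" but not
# "MI300X_VF", copy it next to itself with the first "MI300X" in the
# *file name* replaced by "MI300X_VF". The directory part is never rewritten.
copy_vf_configs() {
  local src dir base dst
  # -print0 / read -d '' keep names with spaces or brackets intact.
  find "$@" -type f -name '*MI300X*' ! -name '*MI300X_VF*' -print0 |
    while IFS= read -r -d '' src; do
      dir=$(dirname -- "$src")
      base=$(basename -- "$src")
      dst="$dir/${base/MI300X/MI300X_VF}"
      # Leave an existing VF config untouched instead of overwriting it.
      [ -e "$dst" ] || cp -- "$src" "$dst"
    done
}
```

It would be called with the same two config directories as the original step; because already-suffixed files are excluded and existing targets are skipped, running it twice leaves the tree unchanged.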


Tested this on an 8×MI300X machine in Azure: the configs were actually picked up, and single-sequence generation throughput jumped from 17 tok/s to 28 tok/s!
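Before benchmarking, one quick way to confirm that the build step produced a VF twin for every MI300X config is a file-level check like the one below. It is a hypothetical helper (missing_vf_configs is an invented name), assumes bash, and only checks that the files exist; whether they are actually used at runtime still has to be confirmed by a run like the one above.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (not part of the PR): report every MI300X config
# under the given directories that has no MI300X_VF twin next to it.
set -euo pipefail

missing_vf_configs() {
  local src base twin missing=0
  while IFS= read -r -d '' src; do
    base=$(basename -- "$src")
    twin="$(dirname -- "$src")/${base/MI300X/MI300X_VF}"
    if [ ! -f "$twin" ]; then
      printf 'missing VF twin: %s\n' "$twin"
      missing=$((missing + 1))
    fi
  done < <(find "$@" -type f -name '*MI300X*' ! -name '*MI300X_VF*' -print0)
  # Exit status 0 only when every MI300X config has a VF twin.
  [ "$missing" -eq 0 ]
}
```

Pointed at the two config directories from the Dockerfile step inside the built image, a non-zero exit status would flag configs that the copy missed.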

tot0 · 2025-02-14T17:17:04Z

@HaiShaw @zhyncs This fix is very important for users of MI300X on platforms that virtualize the GPUs.

Copy config files for MI300X to support in virtualized environments

1e5e7bf

tot0 approved these changes Feb 14, 2025


zhyncs approved these changes Feb 14, 2025


Merge branch 'main' into feature/add-config-files-mi300x_vf-in-container

093352b

zhyncs merged commit 6ce6eab into sgl-project:main Feb 14, 2025
1 check failed
