Move FP8 to sglang #2366

HaiShaw · 2024-12-05T19:45:08Z

Motivation

Move FP8 layers definition to SGLang

Modifications

The FP8 layer definitions are moved as is.
Kernels come next.

Checklist

[+] Format your code according to the Contributor Guide.
[+] Add unit tests as outlined in the Contributor Guide.
[+] Update documentation as needed, including docstrings or example tutorials.

Co-authored-by: HAI <hixiao@gmail.com>

zhyncs

Except for vllm.model_executor.layers.quantization, LinearBase, and _custom_ops, everything else needs to be removed. Thanks!
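One way to check a file against the allowlist in this review comment is to scan its imports. The following is a minimal sketch, not part of the PR: the function name `disallowed_vllm_imports` is hypothetical, and the full module paths for `LinearBase` and `_custom_ops` are assumptions, since the comment gives only the short names.

```python
import ast

# Allowlist taken from the review comment. The dotted paths for LinearBase
# and _custom_ops are assumptions; the comment names them only in short form.
ALLOWED = (
    "vllm.model_executor.layers.quantization",
    "vllm.model_executor.layers.linear.LinearBase",  # assumed location
    "vllm._custom_ops",                               # assumed location
)


def _is_allowed(name: str) -> bool:
    """True if the dotted name equals an allowlist entry or lives under one."""
    return any(name == p or name.startswith(p + ".") for p in ALLOWED)


def disallowed_vllm_imports(source: str) -> list[str]:
    """Return the sorted vllm names imported by `source` that are not allowlisted."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" is checked as the dotted name "a.b.c".
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            if (name == "vllm" or name.startswith("vllm.")) and not _is_allowed(name):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = "from vllm.model_executor.parameter import ModelWeightParameter\n"
    print(disallowed_vllm_imports(sample))
```

Applied to the `fp8.py` import flagged later in this thread, a check like this would report the `vllm.model_executor.parameter` names as still pending removal.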

python/sglang/srt/layers/quantization/fp8.py

zhyncs · 2024-12-05T20:09:51Z

+    per_tensor_dequantize,
+    requantize_with_max_scale,
+)
+from vllm.model_executor.parameter import ModelWeightParameter, PerTensorScaleParameter


Remove this.

HaiShaw · 2024-12-06

This is still in use; it will be decoupled and migrated later.

zhyncs · 2024-12-06T07:16:29Z

Moved to #2370.
All credit goes to @HaiShaw. Thanks!

xiaobochen123 and others added 2 commits December 5, 2024 10:44

MoE Expert Parallel Impl (sgl-project#2203)

f9b7c64

Co-authored-by: HAI <hixiao@gmail.com>

Move FP8 to sglang

c229398

HaiShaw requested review from merrymercy, Ying1123, zhyncs and ispobock as code owners December 5, 2024 19:45

zhyncs reviewed Dec 5, 2024

merrymercy and others added 5 commits December 5, 2024 13:42

Fix the cuda graph capture range for small #max-running-requests (sgl-project#2359)

337fe53

remove unnecessary vllm dependencies

47b1e33

Merge branch 'main' into moe_fp8

1049088

[router] use 2-gpu-runner (sgl-project#2368)

fc6387e

Merge branch 'main' into moe_fp8

9677f61

HaiShaw requested a review from zhyncs December 6, 2024 04:04

zhyncs force-pushed the main branch from fc6387e to 64fceab Compare December 6, 2024 06:14

zhyncs requested review from ByronHsu and hnyls2002 as code owners December 6, 2024 06:14

Merge branch 'main' into moe_fp8

1a98996

HaiShaw closed this Dec 6, 2024

zhyncs mentioned this pull request Dec 7, 2024

fix: resolve fp8 moe issue #2387

Merged

3 tasks
