Conversation

@yiakwy-xpu-ml-framework-team commented Feb 28, 2025

Motivation

As suggested in PR#3702, AMD uses two FP8 formats:

  • the OCP FP8 format (fp8_e4m3), and
  • the FP8 FNUZ format, which was originally proposed (as a joint proposal) by Graphcore and is officially supported in Meta's PyTorch, with reference to the Graphcore uint8 representation format:

https://github.com/pytorch/pytorch/blob/main/c10/util/Float8_e4m3fnuz.h

Mismatch between the FP8 behavior of the Meta implementation and the AMD implementation

Traditionally, FP8 MAX in the torch.float8_e4m3fnuz format is represented as

fp8_max = 0b0_1111_111 (0b1000_0000 is used as NaN) = 2^(15 - 8) * (1 + 0.875) = 240
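
Assuming a PyTorch build that exposes the fp8 dtypes, this can be double-checked host-side with torch.finfo:

>>> import torch
>>> torch.finfo(torch.float8_e4m3fnuz).max
240.0
>>> torch.finfo(torch.float8_e4m3fn).max   # OCP e4m3 for comparison
448.0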

Currently in AMD, 224 is chosen as FP8 MAX:

fp8_max = 0b0_1111_110 (0b0_1111_111 is used as NaN in the OCP FP8 format, see the OCP 2023-12-1 specification) = 2^(15 - 8) * (1 + 0.75) = 224
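
For reference, a tiny decoder for the E4M3 FNUZ bit layout (exponent bias 8) reproduces both numbers; this is an illustrative sketch, not code from this PR:

# Illustrative decoder for an FP8 E4M3 FNUZ bit pattern (exponent bias 8); not part of this PR.
def decode_e4m3fnuz(bits: int) -> float:
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 3) & 0xF
    mant = bits & 0x7
    if bits == 0b1000_0000:          # FNUZ reserves the negative-zero pattern for NaN
        return float("nan")
    if exp == 0:                     # subnormal range
        return sign * (mant / 8.0) * 2.0 ** (1 - 8)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 8)

print(decode_e4m3fnuz(0b0_1111_111))  # 240.0 (FNUZ max)
print(decode_e4m3fnuz(0b0_1111_110))  # 224.0 (the value used as AMD FP8 MAX)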

But both values can be represented in AMD device functions. Here is a device function test with the ROCm SDK 6.3:

[Screenshot 2025-03-08: device function test with ROCm SDK 6.3]

From the above test, both 224 and 240 are valid numbers in the AMD FP8 E4M3 FNUZ format. So what's the problem?

The problem is the round trip. Here is the torch test on an AMD device:

[Screenshot 2025-03-08: torch round-trip test on an AMD device]

Any value greater than FP8 MAX is represented as NaN in PyTorch, rather than being clamped to a proper max value.

However, in the AMD device implementation, it is clamped to 240.
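
A minimal sketch of that round trip (assuming a PyTorch build exposing torch.float8_e4m3fnuz; the screenshots above are the authoritative result):

import torch

# Round-trip sketch: cast to FP8 E4M3 FNUZ and back to FP32.
x = torch.tensor([224.0, 240.0, 300.0, 1000.0], dtype=torch.float32)
y = x.to(torch.float8_e4m3fnuz).to(torch.float32)
# Per the test above, 224 and 240 round-trip, while values beyond FP8 MAX
# come back as NaN in PyTorch instead of being clamped to 240.
print(y)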

In the current vLLM application, AMD FP8 MAX is hard-coded as 224 for better precision, though 240 is also valid with the ROCm 6.3 SDK.

Besides, I also found that the minimum subnormal value (~1e-3, i.e. 0.000976562) is not provided by torch by default, so I added it. It usefully extends the representation to magnitudes below the minimum normal, and can hence be used to adjust weights.
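
For E4M3 FNUZ (exponent bias 8, 3 mantissa bits) the minimum subnormal is 2^(1-8) * 2^-3 = 2^-10, which matches that value:

>>> 2 ** (1 - 8) * 2 ** -3   # smallest subnormal magnitude of FP8 E4M3 FNUZ
0.0009765625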

Group Quant Efficiency

MXFP8 was introduced in the OCP project; it is supported on NVIDIA Blackwell and in the AMD quantization toolkit Quark, and is projected to be available in PyTorch in 2025 Q1. It enables native group quantization over 32 consecutive elements instead of per-tensor quantization.

The immediate benefit brought by MXFP8 is that we can use the 8-bit E8M0 format to store the scaling factor instead of FP32.
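
As an illustration (not code from this PR), an E8M0 scale is just a biased power-of-two exponent, so an encode/decode sketch looks like:

import math

# Illustrative E8M0 encode/decode (per the OCP MX spec: value = 2**(bits - 127), 0xFF reserved for NaN).
def e8m0_encode(scale: float) -> int:
    return max(0, min(254, int(round(math.log2(scale))) + 127))

def e8m0_decode(bits: int) -> float:
    return float("nan") if bits == 0xFF else 2.0 ** (bits - 127)

print(e8m0_decode(e8m0_encode(0.25)))  # 0.25 -- power-of-two scales round-trip exactly in a single byte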

[Screenshot 2025-03-08: E8M0 group scaling illustration]

Currently SGLang implements group quantization with group size 128; we can adjust the implementation to support 32-element group quantization on Blackwell and on other chips that support OCP MXFP8.
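
A minimal sketch of per-group quantization with a configurable group size (a hypothetical helper, not the SGLang kernel; fp8_max stands in for FP8_E4M3_MAX on the target platform):

import torch

def per_group_quant(x: torch.Tensor, group_size: int = 128, fp8_max: float = 224.0):
    """Illustrative per-group scaling: one scale per `group_size` consecutive elements."""
    groups = x.view(-1, group_size)
    scales = (groups.abs().amax(dim=-1, keepdim=True) / fp8_max).clamp(min=1e-12)
    q = (groups / scales).clamp(-fp8_max, fp8_max)   # ready to be cast to an fp8 dtype
    return q.view_as(x), scales.squeeze(-1)

x = torch.randn(2, 256)
q128, s128 = per_group_quant(x, group_size=128)  # current SGLang granularity
q32, s32 = per_group_quant(x, group_size=32)     # MXFP8-style 32-element groups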

Modifications

Add global constants. Usage example:

Python 3.12.8 (main, Dec  4 2024, 08:54:12) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sglang.srt.utils import is_hip, FP8_E4M3_MAX, FP8_E4M3_MIN
>>> FP8_E4M3_MAX
224.0
>>> FP8_E4M3_MIN
-224.0

Note: FP8_E4M3_MIN_SUB_NORM and OCP_MXFP8_E8M0_GROUP_QUANT_GRANULARITY will be added in the relevant PRs.

Checklist

@yiakwy-xpu-ml-framework-team (Contributor, Author) commented:

@HaiShaw

@yiakwy-xpu-ml-framework-team force-pushed the add_host_side_ocp_fp8nuz_constant branch from ebd517e to 5976635 on March 6, 2025
@BBuf mentioned this pull request on Mar 7, 2025
@yiakwy-xpu-ml-framework-team force-pushed the add_host_side_ocp_fp8nuz_constant branch 2 times, most recently from d8ea19a to 8e218c8 on March 8, 2025
@yiakwy-xpu-ml-framework-team force-pushed the add_host_side_ocp_fp8nuz_constant branch from 7a46477 to eec5980 on March 9, 2025
@BBuf changed the title from "[ROCm] add amd ocp fp8 E3M4FUZ constant" to "[tools] add fp8 max/min constant in utils" on Mar 10, 2025
@yiakwy-xpu-ml-framework-team force-pushed the add_host_side_ocp_fp8nuz_constant branch from eec5980 to 2915363 on March 12, 2025
@merrymercy merged commit 18c2713 into sgl-project:main on Mar 13, 2025 (20 of 21 checks passed)
hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request on Mar 13, 2025