yiakwy-xpu-ml-framework-team
Contributor

Motivation

Fix the fp8 dtype problem introduced in #4485.

Modifications

Checklist

@yiakwy-xpu-ml-framework-team
Contributor Author

yiakwy-xpu-ml-framework-team commented Mar 17, 2025

cc @HaiShaw

@merrymercy merrymercy merged commit 5f9b2c6 into sgl-project:main Mar 17, 2025
19 of 21 checks passed
@hebiao064
Collaborator

Thanks a lot!

@@ -108,10 +108,15 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
    layer.weight, layer.weight.shape[-1]
)
weight_scale = weight_scale.t().contiguous()
if _is_hip:
Collaborator

Does AMD also support CUTLASS? I thought it didn't...

Contributor Author

CK_tile (which I personally consider AMD's counterpart to CUTLASS) is currently the recommended way to program complex kernel fusion.

It provides tile-window to DDR-window transformations, a pipeliner, and a partitioner. Very useful; you can check it out.

And in the recent ASM kernels, they added many assembly-level optimizations.
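
For readers following the hunk under review, here is a minimal sketch of per-token fp8 scaling and why the target fp8 dtype matters. It assumes, without confirmation from this PR, that the dtype problem relates to the two e4m3 variants: `float8_e4m3fn` (largest finite value 448) and `float8_e4m3fnuz` (largest finite value 240). The helper `per_token_scale` and the `FP8_MAX` table are hypothetical pure-Python stand-ins, not the project's code, and no real fp8 rounding is performed.

```python
# Hypothetical illustration only; this is not the code changed by the PR.
# Largest finite magnitude representable by each e4m3 fp8 variant.
FP8_MAX = {
    "float8_e4m3fn": 448.0,
    "float8_e4m3fnuz": 240.0,
}


def per_token_scale(row, dtype_name):
    """Compute a per-token (per-row) scale and the scaled values.

    The scale maps the row's largest magnitude onto the fp8 maximum,
    so the scaled values fit in [-fp8_max, fp8_max]. Rounding to an
    actual fp8 grid is deliberately left out of this sketch.
    """
    fp8_max = FP8_MAX[dtype_name]
    # Guard against an all-zero row to avoid dividing by zero.
    amax = max(max(abs(v) for v in row), 1e-12)
    scale = amax / fp8_max
    # Clamp to absorb tiny floating-point overshoot after the division.
    scaled = [min(max(v / scale, -fp8_max), fp8_max) for v in row]
    return scale, scaled


if __name__ == "__main__":
    row = [0.5, -2.0, 1.0]
    for name in FP8_MAX:
        scale, scaled = per_token_scale(row, name)
        print(name, scale, scaled)
```

The same row gets a different scale for each variant, so a scale computed for one dtype but applied with the other would either overflow or waste part of the representable range.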

3 participants