Collaborator

@HandH1998 HandH1998 commented Dec 29, 2024

Update Triton configs for block fp8 kernels

@HandH1998 HandH1998 changed the title from "Update Trion configs for block fp8 kernels" to "Update Triton configs for block fp8 kernels" Dec 29, 2024
@zhyncs zhyncs merged commit afa0341 into main Dec 29, 2024
17 checks passed
@zhyncs zhyncs deleted the tune_kernel branch December 29, 2024 14:53
@@ -418,8 +418,7 @@ def _distribute(method: str, inputs: List[Any]) -> List[Any]:
    search_space = [
        config
        for config in search_space
        if block_n % config["BLOCK_SIZE_N"] == 0
Collaborator

@BBuf BBuf Jan 2, 2025

It seems that this change would reduce the search space in normal cases, which might have a slight impact on fused_moe_triton performance.

Collaborator Author

@BBuf Do you mean the removed line `if block_n % config["BLOCK_SIZE_N"] == 0`? I think removing it makes the search space larger, since the configs have one fewer constraint to satisfy.
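A minimal sketch of the effect on the search space, in plain Python. The candidate tile sizes and the `block_n = block_k = 128` quantization block shape are illustrative assumptions, and `build_search_space` and `filter_configs` are hypothetical helpers, not functions from the tuning script.

```python
from itertools import product

# Hypothetical candidate tile sizes; the real tuning script may use other values.
BLOCK_SIZE_N_CANDIDATES = [32, 64, 128, 256]
BLOCK_SIZE_K_CANDIDATES = [64, 128, 256]


def build_search_space():
    """Enumerate every (BLOCK_SIZE_N, BLOCK_SIZE_K) combination."""
    return [
        {"BLOCK_SIZE_N": n, "BLOCK_SIZE_K": k}
        for n, k in product(BLOCK_SIZE_N_CANDIDATES, BLOCK_SIZE_K_CANDIDATES)
    ]


def filter_configs(search_space, block_n, block_k, require_n_divisible):
    """Keep configs that satisfy the divisibility constraints.

    The block_k constraint is always applied; the block_n constraint is
    applied only when require_n_divisible is True (the pre-change behavior).
    """
    kept = []
    for config in search_space:
        if block_k % config["BLOCK_SIZE_K"] != 0:
            continue
        if require_n_divisible and block_n % config["BLOCK_SIZE_N"] != 0:
            continue
        kept.append(config)
    return kept


space = build_search_space()
with_n = filter_configs(space, block_n=128, block_k=128, require_n_divisible=True)
without_n = filter_configs(space, block_n=128, block_k=128, require_n_divisible=False)
print(len(with_n), len(without_n))
```

Under these assumptions, every config that passes both filters also passes the `block_k` filter alone, so dropping the `block_n` condition can only keep the search space the same size or grow it.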

Collaborator Author

`if block_k % config["BLOCK_SIZE_K"] == 0` is required by the block w8a8 fp8 GEMM. In the main loop of the GEMM, this constraint ensures that each iteration needs only one quantization scale.
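The divisibility argument can be checked with a small plain-Python sketch. It is not the Triton kernel: it assumes one scale per `block_k`-wide group along the K dimension, and `scale_blocks_touched` and `max_scales_per_iteration` are hypothetical helpers introduced only for illustration.

```python
def scale_blocks_touched(k_start, tile_k, block_k, K):
    """Return the quantization-block indices covered by one K tile."""
    k_end = min(k_start + tile_k, K)  # clamp the last tile to the matrix edge
    first = k_start // block_k
    last = (k_end - 1) // block_k
    return set(range(first, last + 1))


def max_scales_per_iteration(K, tile_k, block_k):
    """Largest number of distinct scales any main-loop iteration would need."""
    return max(
        len(scale_blocks_touched(k_start, tile_k, block_k, K))
        for k_start in range(0, K, tile_k)
    )


# Assumed sizes: K = 512 with quantization blocks of 128 along K.
for tile_k in (64, 128, 96, 256):
    divisible = 128 % tile_k == 0
    print(tile_k, divisible, max_scales_per_iteration(512, tile_k, 128))
```

When `block_k % BLOCK_SIZE_K == 0`, every tile lies inside a single quantization block, so the accumulation for that iteration can be rescaled with one scale. A tile size that does not divide `block_k` straddles a block boundary in at least one iteration and would need more than one.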

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
3 participants