
Conversation

merrymercy (Contributor)

No description provided.

else:
    self.decode_use_tensor_cores = False
if not _grouped_size_compiled_for_decode_kernels(
Member:

May we remove this `_grouped_size_compiled_for_decode_kernels`? I think it's useless in FlashInfer v0.2. cc @yzh119

Collaborator:

Yes, we can use a heuristic:

  1. For fp16, set use_tensor_cores=True when gqa_group_size > 4.
  2. For fp8, we can always enable use_tensor_cores=True.
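
A minimal sketch of what this heuristic could look like when deciding whether to enable tensor cores for decode. The helper name `should_use_tensor_cores` and its parameters are illustrative assumptions, not code from this PR:

```python
import torch

def should_use_tensor_cores(
    kv_cache_dtype: torch.dtype,
    num_attention_heads: int,
    num_kv_heads: int,
) -> bool:
    # Illustrative helper based on the heuristic above; not the PR's code.
    # fp8 KV cache: the tensor-core decode path is always worthwhile.
    if kv_cache_dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
        return True
    # fp16/bf16: tensor cores pay off once the GQA group size exceeds 4.
    gqa_group_size = num_attention_heads // num_kv_heads
    return gqa_group_size > 4
```

The result could then be passed as `use_tensor_cores=...` when constructing FlashInfer's `BatchDecodeWithPagedKVCacheWrapper`, replacing the `_grouped_size_compiled_for_decode_kernels` check entirely.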

Member:

I agree.

merrymercy merged commit 8e1adb8 into main on Nov 25, 2024 (1 of 13 checks passed).
merrymercy deleted the pr-fix-flashinfer branch on Nov 25, 2024 at 04:58.
zhyncs mentioned this pull request on Nov 25, 2024.
timethink pushed a commit to timethink/sglang referencing this pull request on Mar 9, 2025.