
Conversation

@Ying1123 (Member) commented Mar 13, 2025

  • Use a faster kernel for the temperature = 0 (greedy) case (see the sketch below)
  • Support CUDA graph padding (see the second sketch further below)
  • Simplify redundant Python code
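
For the temperature = 0 item, the usual way this path gets faster is by replacing the softmax + multinomial sampling kernels with a plain argmax over the logits. The snippet below is a minimal sketch of that idea in PyTorch; the function name `sample_next_tokens` is illustrative and is not the actual sampler API in this repository.

```python
import torch

def sample_next_tokens(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Illustrative sampler; logits has shape [batch_size, vocab_size]."""
    if temperature == 0:
        # Greedy decoding: a single argmax kernel, no softmax and no
        # multinomial sampling, which is cheaper per decode step.
        return torch.argmax(logits, dim=-1)
    # Otherwise fall back to temperature-scaled multinomial sampling.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```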

Llama 2 7B: 390 token/s -> 400 token/s with this PR
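
For the CUDA graph padding item: captured graphs replay with fixed tensor shapes, so a batch whose size falls between the captured sizes has to be padded up to the nearest captured size, with the extra rows dropped from the output afterwards. Below is a minimal sketch of that padding step; the captured batch sizes and the function name are assumptions for illustration, not values taken from this PR.

```python
import bisect
import torch

# Batch sizes for which CUDA graphs were captured (illustrative choice).
CAPTURED_BATCH_SIZES = [1, 2, 4, 8, 16, 32]

def pad_for_cuda_graph(input_ids: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Pad a batch up to the nearest captured size so a fixed-shape graph
    can serve it; returns the padded batch and the real batch size so the
    caller can slice the padded rows off the output."""
    real_bs = input_ids.shape[0]
    idx = bisect.bisect_left(CAPTURED_BATCH_SIZES, real_bs)
    if idx == len(CAPTURED_BATCH_SIZES):
        # Batch is larger than any captured graph: run the eager path instead.
        return input_ids, real_bs
    padded_bs = CAPTURED_BATCH_SIZES[idx]
    if padded_bs == real_bs:
        return input_ids, real_bs
    pad = input_ids.new_zeros((padded_bs - real_bs, *input_ids.shape[1:]))
    return torch.cat([input_ids, pad], dim=0), real_bs
```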

Co-authored-by: Sehoon Kim <sehoon@x.ai>

@merrymercy merged commit 1b85929 into main on Mar 16, 2025
4 of 22 checks passed
@merrymercy deleted the ying-eagle branch on March 16, 2025 at 09:48