Speed up when having padding tokens in DeepEP #6175
This PR significantly reduces DeepSeek's inference performance (by 15% or more). We need to look into the specific cause.
@lambert0312 Looks bad. Could you please share your commands? It would also be great to have a profile. My first guess is that we need to fuse it.
@fzyzcjy I tried to modify it. You can see the PR I linked above. Thank you.
Motivation
test
Modifications
Checklist