ROCM: AITER BLOCK GEMM #4075
cc @HaiShaw
Let's just use CK_MOE to trigger it; there is no need for AITER_BLOCK_GEMM. That avoids a deep tree of choices and too many ENVs.
OK, modified.
LG
Motivation
Decode Performance boost (DSR1 +8%):
Before:
After:
Modifications
AITER BLOCK GEMM (tuned for DSR1/V3 at tp 8): set AITER_BLOCK_GEMM=1 on the command line or in the ENV to activate the aiter/ck block GEMM.
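As a rough illustration of how an environment switch like this is typically consumed, here is a minimal Python sketch. It is not the code from this PR: the helper names `env_flag_enabled` and `select_gemm_backend`, the backend labels, and the set of accepted truthy strings are all assumptions made for the example. Only the variable name AITER_BLOCK_GEMM comes from the description above.

```python
import os


def env_flag_enabled(name, environ=None):
    """Return True if the variable is set to a truthy value.

    The accepted spellings ("1", "true", "yes") are an assumption for this
    sketch; the real project may parse the flag differently.
    """
    env = os.environ if environ is None else environ
    return env.get(name, "0").strip().lower() in ("1", "true", "yes")


def select_gemm_backend(environ=None):
    """Pick a backend label based on the AITER_BLOCK_GEMM switch.

    The returned labels are hypothetical placeholders, not real kernel names.
    """
    if env_flag_enabled("AITER_BLOCK_GEMM", environ):
        return "aiter_block_gemm"
    return "default"


if __name__ == "__main__":
    # Reads the real process environment, e.g. after `export AITER_BLOCK_GEMM=1`.
    print(select_gemm_backend())
```

Passing the environment as an optional mapping keeps the check easy to test without mutating `os.environ`. The same pattern would apply if the switch were folded into an existing flag such as CK_MOE, as suggested in the review.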