Initial AMD MI300X Support via AITER #10
Conversation
Thanks a lot for your PR! I have left some comments. LMK if they are unclear or don't make sense.
@sayakpaul thanks a lot for the review, and thanks to you and @jbschlosser for the hard work making this repo ^^
Thanks for your hard work. I will let @jbschlosser have his say here, too.
Meanwhile, feel free to update this part of the README (that we now support AMD integration, thanks to your PR).
Done!
Awesome, thanks for the contribution! I may have missed this, but how do the speedups look on the AMD hardware?
There were definitely some speedups when going from the baseline to torch.compile + fp8: I saw about a 66-70% speedup. torch.export gives an even bigger speedup, but it's not working correctly yet. Will have to debug that.
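For reference, a "66-70% speedup" of the kind quoted above is usually computed as the relative latency improvement. A tiny helper (illustrative only, not part of this PR; the example latencies are hypothetical, not measured numbers) makes the arithmetic explicit:

```python
def percent_speedup(baseline_s: float, optimized_s: float) -> float:
    """Percent speedup going from a baseline latency to an optimized latency.

    Under this definition, cutting per-image time from 10 s to 6 s
    is a ~66.7% speedup.
    """
    return (baseline_s / optimized_s - 1.0) * 100.0


# Hypothetical per-image latencies, purely to show the formula.
print(round(percent_speedup(10.0, 6.0), 1))  # -> 66.7
```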
Adds AITER support when running on AMD's MI300X to enable fp8 inference.
Command:

```shell
python gen_image.py --prompt "A cat playing with a ball of yarn" --output-file output.png --compile_export_mode compile
```
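The flag handling implied by that command might look something like the sketch below. This is a hypothetical reconstruction inferred from the command line, not the actual `gen_image.py`; the flag names and choices are assumptions.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical sketch of gen_image.py's CLI, inferred from the command above.
    p = argparse.ArgumentParser(description="Generate an image (sketch)")
    p.add_argument("--prompt", required=True, help="text prompt for generation")
    p.add_argument("--output-file", default="output.png", help="where to save the image")
    p.add_argument(
        "--compile_export_mode",
        choices=["compile", "export"],
        default="compile",  # assumed default; 'export' is not yet working per the PR note
        help="which optimization path to use",
    )
    return p.parse_args(argv)


args = parse_args(
    ["--prompt", "A cat playing with a ball of yarn", "--compile_export_mode", "compile"]
)
print(args.compile_export_mode)  # -> compile
```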
NOTE: `torch.export` isn't working correctly. It needs to be debugged before it will run properly. For now, please use `--compile_export_mode compile` as shown in the above command.

Baseline (taken from README.md):

NVIDIA fully-optimized w/ quantization (taken from README.md):

AMD MI300X w/ torch.compile w/ quantization:
