jammm
Contributor

@jammm jammm commented Jun 30, 2025

Adds AITER support when running on AMD's MI300X to enable fp8 inference.

Command:
python gen_image.py --prompt "A cat playing with a ball of yarn" --output-file output.png --compile_export_mode compile

NOTE: torch.export isn't working correctly yet and needs to be debugged before it will run properly. For now, please use --compile_export_mode compile, as shown in the command above.
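
For illustration only, here is a minimal sketch of how a script could parse the flags from the command above and fall back to compile mode while the export path is broken. It is not the code from this PR: `build_parser`, `resolve_mode`, the `on_rocm` parameter, and the `"export"` choice name are all hypothetical, and the real `gen_image.py` may define its options differently.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the script's CLI; only the flags shown in
    # the command above are modeled here.
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--prompt", type=str, required=True)
    parser.add_argument("--output-file", type=str, default="output.png")
    # The choice names are assumptions; only "compile" appears in the PR text.
    parser.add_argument(
        "--compile_export_mode",
        choices=["compile", "export"],
        default="compile",
    )
    return parser


def resolve_mode(requested: str, on_rocm: bool) -> str:
    # One possible guard (an assumption, not the PR's behavior): on AMD/ROCm,
    # downgrade an export request to compile until export is debugged.
    if on_rocm and requested == "export":
        print("export mode is not working on this backend yet; using compile")
        return "compile"
    return requested


if __name__ == "__main__":
    command = (
        'python gen_image.py --prompt "A cat playing with a ball of yarn" '
        "--output-file output.png --compile_export_mode compile"
    )
    # Drop the interpreter and script name, keep only the flags.
    args = build_parser().parse_args(shlex.split(command)[2:])
    print(resolve_mode(args.compile_export_mode, on_rocm=True))
```

The guard keeps the user-facing flag unchanged, so the fallback can be removed once the export path works without touching the command line.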

Baseline (taken from README.md):
image

NVIDIA fully-optimized w/ quantization (taken from README.md):
image

AMD MI300X w/ torch.compile w/ quantization:
output_amd

Member

@sayakpaul sayakpaul left a comment

Thanks a lot for your PR! I have left some comments. LMK if they are unclear or don't make sense.

@jammm
Contributor Author

jammm commented Jul 1, 2025

@sayakpaul thanks a lot for the review, and thanks to you and @jbschlosser for the hard work making this repo ^^
I addressed your comments in 0661e85. Please let me know if you have any other questions/comments. Thanks!

Member

@sayakpaul sayakpaul left a comment

Thanks for your hard work. I will let @jbschlosser have his say here, too.

Meanwhile, feel free to update this part of the README (that we now support AMD integration, thanks to your PR).

image

@jammm
Contributor Author

jammm commented Jul 1, 2025

> Meanwhile, feel free to update this part of the README (that we now support AMD integration, thanks to your PR).

Done!

Collaborator

@jbschlosser jbschlosser left a comment

Awesome, thanks for the contribution! I may have missed this, but how do the speedups look on the AMD hardware?

@sayakpaul sayakpaul merged commit 1562815 into huggingface:main Jul 2, 2025
@jammm
Contributor Author

jammm commented Jul 2, 2025

> Awesome, thanks for the contribution! I may have missed this, but how do the speedups look on the AMD hardware?

There were definitely some speedups going from the baseline to torch.compile + fp8: I saw about a 66-70% speedup. torch.export gives an even bigger speedup, but it isn't working correctly yet, so I'll have to debug that.
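
A percentage speedup can be read two ways, and the comment above does not say which one is meant. As a rough sketch, the two common definitions can be computed like this. The function names and the timings are hypothetical and are not measurements from this PR.

```python
def throughput_gain_pct(baseline_s: float, optimized_s: float) -> float:
    # How much more work per unit time: (baseline / optimized - 1) * 100.
    return (baseline_s / optimized_s - 1.0) * 100.0


def latency_reduction_pct(baseline_s: float, optimized_s: float) -> float:
    # How much less wall-clock time per image: (1 - optimized / baseline) * 100.
    return (1.0 - optimized_s / baseline_s) * 100.0


if __name__ == "__main__":
    # Hypothetical per-image latencies in seconds, chosen only for illustration.
    baseline_s = 10.0
    optimized_s = 6.0
    print(f"throughput gain:   {throughput_gain_pct(baseline_s, optimized_s):.1f}%")
    print(f"latency reduction: {latency_reduction_pct(baseline_s, optimized_s):.1f}%")
```

With these made-up timings the same run is roughly a 67% throughput gain but only a 40% latency reduction, so it helps to state which definition a reported number uses.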
