vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
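For orientation, a minimal sketch of vLLM's offline batch-inference API follows; the model name and prompts are placeholders, and this assumes vLLM is installed with a working GPU backend (CUDA or ROCm).

    from vllm import LLM, SamplingParams

    # Placeholder prompts and model; any supported Hugging Face causal LM works.
    prompts = ["What is ROCm?", "Summarize paged attention in one sentence."]
    params = SamplingParams(temperature=0.8, max_tokens=64)

    llm = LLM(model="facebook/opt-125m")  # illustrative model choice
    for out in llm.generate(prompts, params):
        print(out.prompt, "->", out.outputs[0].text)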
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
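As a rough sketch of how such a compiler stack is driven, here is TVM's documented Relay import-and-build flow; the ONNX file name and input shape are assumptions.

    import onnx
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    # Load a model and convert it to Relay IR (file name and shape are placeholders).
    onnx_model = onnx.load("model.onnx")
    mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

    # Compile for a plain CPU target; swap "llvm" for "cuda" or "rocm" as needed.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)

    dev = tvm.cpu()
    module = graph_executor.GraphModule(lib["default"](dev))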
Simple, scalable AI model deployment on GPU clusters
Stable Diffusion web UI
A deep learning package for many-body potential energy representation and molecular dynamics
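This is deepmd-kit; a minimal sketch of its documented Python inference API follows, with the frozen-model file name and the toy two-atom frame as placeholders.

    import numpy as np
    from deepmd.infer import DeepPot

    # Load a frozen model (placeholder file name) and evaluate one frame of two atoms.
    dp = DeepPot("graph.pb")
    coords = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])  # shape (nframes, natoms*3), Angstrom
    cells = (np.eye(3) * 10.0).reshape(1, 9)             # periodic box per frame
    atom_types = [0, 1]                                  # model-specific type indices
    energy, force, virial = dp.eval(coords, cells, atom_types)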
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly web UI, flexible API endpoints (including OpenAI-compatible ones), predefined voices, voice cloning, and audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA) and AMD (ROCm) GPUs, with a CPU fallback.
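Since the server advertises OpenAI-compatible endpoints, the stock OpenAI client's speech call should work against it, roughly as sketched below; the base URL, port, model, and voice names are assumptions, not values from this project.

    from openai import OpenAI

    # Point the standard OpenAI client at the local server's OpenAI-compatible route.
    # Base URL, api_key, model, and voice are placeholders; check the project's docs.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    resp = client.audio.speech.create(
        model="chatterbox",
        voice="default",
        input="Hello from a self-hosted TTS server.",
    )
    resp.write_to_file("hello.wav")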
Set up inference profiling in 60 seconds
Horizon chart for CPU/GPU/Neural Engine utilization monitoring. Supports Apple M1-M4, NVIDIA GPUs, and AMD GPUs.
ROCm Install Utilities: a rocminstall.py script for installing a specific ROCm release version/revision.
AMD RAD's experimental RMA library for Triton.