I'm a Senior Member of Technical Staff at AMD Research, where I work on heterogeneous computing, AI for performance engineering, and libraries for distributed and heterogeneous systems.
Before joining AMD, I completed my Ph.D. in Electrical and Computer Engineering at UC Davis, advised by John Owens. My research there focused on building dynamic concurrent data structures on GPUs, such as B-trees, dynamic graphs, multiversioned trees, and high-performance hash tables.
At AMD, I lead and contribute to projects spanning:
- AI-powered tools for GPU performance and productivity
- Libraries and runtimes for heterogeneous and distributed systems
- Programming models for AMD GPUs and Ryzen™ AI NPUs
Notable projects:
- Iris: Triton-based multi-GPU programming framework with SHMEM-like RMA APIs, combining programmability and performance in just a few hundred lines of Python and Triton.
- IntelliPerf: LLM-powered GPU performance engineering pipeline that profiles, diagnoses, rewrites, and validates GPU kernels automatically.
- IRON: A low-level development stack for AMD Ryzen™ AI NPUs, to which I contribute. IRON exposes Python APIs, MLIR compiler passes, and abstractions that make NPUs more programmable and expressive for developers.
I'm broadly interested in parallel computing, concurrent data structures, performance analysis, and low-level GPU programming. As a Ph.D. student, I designed and built several GPU-native data structures:
- Dynamic GPU B-Tree
- Multiversion GPU B-Tree
- Dynamic graph data structure (integrated into Gunrock)
- Epoch-based memory reclamation for safe memory reuse on GPUs
- Static GPU hash tables, including cuckoo, Po2, and iceberg hashing
- PTX-level bucketed cuckoo hash set, written in PTX and JIT-compiled
Contact and links:
- Email: muhaawad at amd dot com
- Website
- CV
- LinkedIn
- Google Scholar