maawad/README.md

👋 Hi there, I'm Muhammad Awad

I'm a Senior Member of Technical Staff at AMD Research, working on heterogeneous computing, AI for performance engineering, and libraries for distributed and heterogeneous systems.

Before joining AMD, I completed my Ph.D. in Electrical and Computer Engineering at UC Davis, advised by John Owens. My research there focused on building dynamic concurrent data structures on GPUs, such as B-trees, dynamic graphs, multiversioned trees, and high-performance hash tables.


💼 Current Focus @ AMD

At AMD, I lead and contribute to projects spanning:

  • AI-powered tools for GPU performance and productivity
  • Libraries and runtimes for heterogeneous and distributed systems
  • Programming models for AMD GPUs and Ryzen™ AI NPUs

Notable projects:

  • Iris: Triton-based multi-GPU programming framework with SHMEM-like RMA APIs, combining programmability and performance in just a few hundred lines of Python and Triton.
  • 🧠 IntelliPerf: LLM-powered GPU performance engineering pipeline that profiles, diagnoses, rewrites, and validates GPU kernels automatically.
  • ⚙️ IRON: Contributor to IRON, a low-level development stack for AMD Ryzen™ AI NPUs. IRON exposes Python APIs, MLIR compiler passes, and abstractions that enhance the programmability and expressivity of NPUs for developers.

🔬 Academic Research

I'm broadly interested in parallel computing, concurrent data structures, performance analysis, and low-level GPU programming. As a Ph.D. student, I designed and built several GPU-native data structures.


📫 Connect


Pinned

  1. ROCm/iris (Public)

    AMD RAD's experimental RMA library for Triton.

    Python · 17 stars · 7 forks

  2. owensgroup/BGHT (Public)

    BGHT: High-performance static GPU hash tables.

    C++ · 71 stars · 8 forks

  3. owensgroup/MVGpuBTree (Public)

    GPU B-Tree with support for versioning (snapshots).

    C++ · 49 stars · 4 forks

  4. gunrock/gunrock (Public)

    Programmable CUDA/C++ GPU Graph Analytics

    C++ · 1k stars · 212 forks

  5. owensgroup/GpuBTree (Public)

    Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019

    Cuda · 57 stars · 13 forks

  6. PTX_BCHT (Public)

    Bucketed Cuckoo hash set written in PTX and JIT-compiled.

    C++ · 1 star · 1 fork