FlashInfer: Kernel Library for LLM Serving
Updated Aug 29, 2025 · CUDA
Quantized attention achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without degrading end-to-end metrics across language, image, and video models.
SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
🤖FFPA: Extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large head dimensions; 1.8x~3x speedup🎉 vs SDPA EA.
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
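Cosine attention replaces the softmax over query-key dot products with cosine similarity between L2-normalized queries and keys, which allows the classic linear-attention reordering of the matrix products. A minimal NumPy sketch of that idea (not the paper's actual implementation; the function name and shapes are illustrative assumptions):

```python
import numpy as np

def cosine_attention(q, k, v, eps=1e-6):
    # Illustrative sketch, not the Cottention codebase.
    # L2-normalize queries and keys so q @ k.T becomes cosine similarity.
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    # Without softmax, (Q K^T) V can be reassociated as Q (K^T V),
    # reducing cost from O(n^2 d) to O(n d^2) in sequence length n.
    kv = k.T @ v            # (d, d)
    return q @ kv           # (n, d)

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 8, 4))   # n=8 tokens, d=4 dims
out = cosine_attention(q, k, v)
print(out.shape)  # (8, 4)
```

The reordering is exact here because no row-wise softmax sits between the two matrix products; that is what makes the method "linear".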
Patch-Based Stochastic Attention (an efficient attention mechanism)
A simple, unoptimized FlashAttention implementation based on the original paper.
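The core trick from the original FlashAttention paper is the online softmax: process K/V in tiles while carrying a running row-wise max, normalizer, and unnormalized output, so the full attention matrix is never materialized. A minimal NumPy sketch (a teaching aid under assumed 2-D shapes, not the repo's code; real kernels tile in SRAM and scale by 1/sqrt(d)):

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: materialize softmax(Q K^T) V in full (unscaled sketch).
    s = q @ k.T
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention(q, k, v, block=4):
    # Tiled online softmax: per query row, keep running max m,
    # running normalizer l, and unnormalized output o.
    n, d = q.shape
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    o = np.zeros((n, d))
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j+block], v[j:j+block]
        s = q @ kj.T                          # scores for this K/V tile
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        o = o * scale[:, None] + p @ vj
        m = m_new
    return o / l[:, None]
```

Both functions produce the same output; the tiled version just never holds more than one `block`-wide slice of the score matrix at a time.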
A simple implementation of PagedAttention purely written in CUDA and C++.
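PagedAttention stores the KV cache in fixed-size physical blocks and uses a per-sequence block table to map logical token positions to blocks, so cache memory can be allocated on demand instead of reserved contiguously. A hypothetical Python sketch of that bookkeeping (all names, sizes, and helpers here are illustrative assumptions, not the repo's CUDA API):

```python
import numpy as np

# Hypothetical paged KV cache: NUM_BLOCKS physical blocks of BLOCK slots,
# each slot holding a D-dim key/value vector.
BLOCK, D, NUM_BLOCKS = 4, 8, 16
k_cache = np.zeros((NUM_BLOCKS, BLOCK, D))
v_cache = np.zeros((NUM_BLOCKS, BLOCK, D))

def append_kv(block_table, seq_len, k, v, free):
    # Allocate a fresh physical block whenever the last one is full.
    if seq_len % BLOCK == 0:
        block_table.append(free.pop())
    b, off = block_table[-1], seq_len % BLOCK
    k_cache[b, off], v_cache[b, off] = k, v
    return seq_len + 1

def gather_kv(block_table, seq_len):
    # Reassemble the logical K/V tensors from scattered physical blocks
    # (a real kernel would index blocks inside the attention loop instead).
    ks = np.concatenate([k_cache[b] for b in block_table])[:seq_len]
    vs = np.concatenate([v_cache[b] for b in block_table])[:seq_len]
    return ks, vs
```

Because only the small block table is per-sequence, sequences of very different lengths can share one physical pool with at most one partially filled block of waste each.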