Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud (a minimal client sketch follows this list).
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Superduper: End-to-end framework for building custom AI applications and agents.
Standardized Serverless ML Inference Platform on Kubernetes
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
Open-source implementation of AlphaEvolve
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Simple, scalable AI model deployment on GPU clusters
Sparsity-aware deep learning inference runtime for CPUs
Optimizing inference proxy for LLMs
Code examples and resources for DBRX, a large language model developed by Databricks
⚡ Build your chatbot within minutes on your favorite device; leverage SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous coding agents like DeepMind's AlphaEvolve.
Notes on LLMs, covering model inference, transformer model structure, and LLM framework code analysis.
LLMFlows - Simple, Explicit and Transparent LLM Apps
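Several of the projects above serve models behind OpenAI-compatible HTTP APIs. As a rough illustration (not tied to any specific repository listed here), a client can point the standard `openai` Python SDK at such a server; the base URL, model name, and API key below are placeholder assumptions.

```python
# Minimal sketch of querying an OpenAI-compatible inference server.
# The URL, model name, and API key are illustrative placeholders, not
# values from any project above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of a local server
    api_key="not-needed-for-local",       # many local servers ignore the key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what paged attention does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the request/response shape matches the OpenAI Chat Completions API, the same client code can typically be reused across any of the serving stacks listed above by changing only the base URL and model name.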