Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud (a minimal client sketch follows this list).
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Superduper: End-to-end framework for building custom AI applications and agents.
Standardized Serverless ML Inference Platform on Kubernetes
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
Open-source implementation of AlphaEvolve
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Simple, scalable AI model deployment on GPU clusters
Sparsity-aware deep learning inference runtime for CPUs
Optimizing inference proxy for LLMs
Code examples and resources for DBRX, a large language model developed by Databricks
⚡ Build your chatbot within minutes on your favorite device; leverage SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous coding agents like DeepMind's AlphaEvolve.
Notes on LLMs, covering model inference, transformer model structure, and LLM framework code analysis.
LLMFlows - Simple, Explicit and Transparent LLM Apps
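Several of the projects above serve models behind OpenAI-compatible HTTP APIs. As a rough illustration (not tied to any specific repository listed here), a client can point the standard `openai` Python SDK at such a server; the base URL, model name, and API key below are placeholder assumptions.

```python
# Minimal sketch of querying an OpenAI-compatible inference server.
# The URL, model name, and API key are illustrative placeholders, not
# values from any project above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of a local server
    api_key="not-needed-for-local",       # many local servers ignore the key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what paged attention does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the request/response shape matches the OpenAI Chat Completions API, the same client code can typically be reused across any of the serving stacks listed above by changing only the base URL and model name.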