v0.2.0
📦 llm-d v0.2.0 Release Notes
For information on installing and using the new release, refer to our quickstarts.
Release Date: 2025-07-28
Core Objectives
This release had a few key objectives:
- Migrate from monolithic to composable installs based on community feedback
- Support wide expert parallelism cases of "one rank per node"
- Align with upstream gateway-api-inference-extension helm charts
🧩 Component Summary
Component | Version | Previous Version | Type |
---|---|---|---|
llm-d/llm-d-inference-scheduler | v0.2.1 | 0.0.4 | Image |
llm-d/llm-d-model-service | NA (Deprecated) | 0.0.10 | Image |
llm-d-incubation/llm-d-modelservice | v0.2.0 | NA (New) | Helm Chart |
llm-d/llm-d-routing-sidecar | v0.2.0 | 0.0.6 | Image |
llm-d/llm-d-deployer | NA (Deprecated) | 1.0.22 | Helm Chart |
vllm-project/vllm | v0.10.0 | NA (built from fork) | Wheel installed in llm-d |
llm-d/llm-d | v0.2.0 | 0.0.8 | Image |
llm-d/llm-d-inference-sim | v0.3.0 | 0.0.4 | Image |
llm-d-incubation/llm-d-infra | v1.1.1 | NA (New) | Helm Chart |
kubernetes-sig/gateway-api-inference-extension | v0.5.1 | NA (New - external) | Image |
llm-d/llm-d-kv-cache-manager | v0.2.0 | v0.1.0 | Go Package (consumed in inference-scheduler) |
llm-d/llm-d-benchmark | v0.2.0 | v0.0.8 | Tooling and Image |
NOTE: In the future we want to support compatibility matrices. However, as we are still getting off the ground, we cannot ensure that these components work with legacy versions.
🔹 llm-d/llm-d-inference-scheduler
- Description: The inference scheduler makes optimized routing decisions for inference requests to vLLM model servers. This component depends on the upstream gateway-api-inference-extension scheduling framework and includes features specific to vLLM.
- Diff: 0.0.4 → v0.2.1
- Upstream Changelog: since we bumped the upstream version of GIE, many changes do not show up in the diff. The following has been pulled from the release notes:
🔹 llm-d/llm-d-model-service (Deprecated)
- Description: ModelService is a Kubernetes operator (CRD + controller) that enables the creation of vLLM pods and routing resources for a given model.
- Status: This repo is being deprecated as a component of llm-d and has been archived.
- Replacement: llm-d-incubation/llm-d-modelservice
🔹 llm-d-incubation/llm-d-modelservice (New)
- Description: modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, and LeaderWorkerSet).
- History (new): v0.2.0
🔹 llm-d/llm-d-routing-sidecar
- Description: A reverse proxy that routes traffic between the inference-scheduler and the prefill and decode workers based on the x-prefiller-host-port HTTP request header (see the sketch below).
- Diff: 0.0.6 → v0.2.0
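For illustration only, here is a minimal sketch of how a client (in practice, the inference scheduler) might steer a request through the sidecar by setting the x-prefiller-host-port header. The sidecar address, model name, and header value below are assumptions for the example, not values from this release.

```python
# Hypothetical sketch: sending a completion request through the routing sidecar.
# The sidecar URL, model name, and "host:port" header value are assumptions.
import json
import urllib.request

SIDECAR_URL = "http://localhost:8000/v1/completions"  # assumed decode-pod sidecar address

payload = {
    "model": "example-model",  # placeholder model name
    "prompt": "Explain prefill/decode disaggregation in one sentence.",
    "max_tokens": 64,
}

req = urllib.request.Request(
    SIDECAR_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Tells the sidecar which prefill worker should handle the prefill phase.
        "x-prefiller-host-port": "prefill-0.example.svc.cluster.local:8000",
    },
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])
```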
🔹 llm-d/llm-d-deployer (Deprecated)
- Description: A repo containing examples, Helm charts, and release assets for llm-d.
- Status: This repo is being deprecated as a component of llm-d; however, it will not be archived so that it can continue to support people who want to try the legacy install. It will be minimally maintained.
- Replacement: llm-d-incubation/llm-d-infra
🔹 vllm-project/vllm (Upstream)
- Description: vLLM is a fast and easy-to-use library for LLM inference and serving. This project is the inference engine that forms the upstream of our llm-d/llm-d image (a basic usage sketch follows below).
- Release: v0.10.0
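For readers new to the engine, a minimal offline-inference sketch using vLLM's Python API is shown below; the model name is only a placeholder and is unrelated to this release.

```python
# Minimal standalone vLLM usage sketch (offline inference), independent of llm-d.
# The model name is a placeholder; use any model supported by vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["Disaggregated prefill and decode means"], params)
for out in outputs:
    print(out.outputs[0].text)
```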
🔹 llm-d/llm-d
- Description: A midstreamed image of vllm-project/vllm for inferencing, supporting features such as PD disaggregation, KV cache awareness, and more.
- Diff: 0.0.8 → v0.2.0
🔹 llm-d/llm-d-inference-sim
- Description: A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM (see the example below).
- Diff: 0.0.4 → v0.3.0
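To make the simulator's role concrete, here is a hedged sketch of querying a vLLM-style OpenAI-compatible endpoint; the address and model name are assumptions for illustration, and the exact set of emulated endpoints is documented in the simulator's repo.

```python
# Hypothetical sketch: the same client code works against real vLLM or the simulator,
# since llm-d-inference-sim emulates vLLM's HTTP REST endpoints.
# The address and model name below are assumptions for the example.
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "test-model",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```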
🔹 llm-d-incubation/llm-d-infra (New)
- Description: This repository includes examples, Helm charts, and release assets for llm-d-infra.
- History (new): v1.1.1
🔹 kubernetes-sig/gateway-api-inference-extension (New - Upstream)
- Note: The upstream project is a dependency of llm-d and we directly reference the published Helm charts. These release notes will not include all changes in this component from release to release; please consult the upstream release pages.
- Description: A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets.
- History (new - external): v0.5.1
🔹 llm-d/llm-d-kv-cache-manager
- Description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms (a conceptual sketch follows below).
- Diff: v0.1.0 → v0.2.0
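As a purely conceptual illustration of what KV-cache-aware routing means (not this library's API, which is a Go package consumed by the inference scheduler), a toy sketch of preferring workers that already hold a prompt's prefix might look like this:

```python
# Toy illustration of KV-cache-aware routing; this is NOT the API of
# llm-d-kv-cache-manager, just the underlying idea in sketch form.
from dataclasses import dataclass, field

@dataclass
class Worker:
    address: str
    cached_prefixes: set = field(default_factory=set)  # prompt prefixes with resident KV cache
    queue_depth: int = 0

def pick_worker(prompt: str, workers: list) -> Worker:
    """Prefer the worker caching the longest matching prefix; break ties by shorter queue."""
    def key(w: Worker):
        best = max((len(p) for p in w.cached_prefixes if prompt.startswith(p)), default=0)
        return (best, -w.queue_depth)
    return max(workers, key=key)

workers = [
    Worker("10.0.0.1:8000", cached_prefixes={"You are a helpful"}, queue_depth=2),
    Worker("10.0.0.2:8000", queue_depth=1),
]
print(pick_worker("You are a helpful assistant. Summarize...", workers).address)  # -> 10.0.0.1:8000
```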
🔹 llm-d/llm-d-benchmark
- Description: This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
- Diff: v0.0.8 → v0.2.0
For more information on any of the component projects or versions, please check out their repos directly. Thank you to all contributors who helped make this happen.