
📦 llm-d v0.2.0 Release Notes

For information on installing and using the new release, refer to our quickstarts.

Release Date: 2025-07-28


Core Objectives

This release had a few key objectives:

  • Migrate from monolithic to composable installs based on community feedback
  • Support wide expert parallelism use cases of "one rank per node"
  • Align with upstream gateway-api-inference-extension helm charts

🧩 Component Summary

| Component | Version | Previous Version | Type |
| --- | --- | --- | --- |
| llm-d/llm-d-inference-scheduler | v0.2.1 | 0.0.4 | Image |
| llm-d/llm-d-model-service | NA (Deprecated) | 0.0.10 | Image |
| llm-d-incubation/llm-d-modelservice | v0.2.0 | NA (New) | Helm Chart |
| llm-d/llm-d-routing-sidecar | v0.2.0 | 0.0.6 | Image |
| llm-d/llm-d-deployer | NA (Deprecated) | 1.0.22 | Helm Chart |
| vllm-project/vllm | v0.10.0 | NA (built from fork) | Wheel installed in llm-d |
| llm-d/llm-d | v0.2.0 | 0.0.8 | Image |
| llm-d/llm-d-inference-sim | v0.3.0 | 0.0.4 | Image |
| llm-d-incubation/llm-d-infra | v1.1.1 | NA (New) | Helm Chart |
| kubernetes-sigs/gateway-api-inference-extension | v0.5.1 | NA (New - external) | Image |
| llm-d/llm-d-kv-cache-manager | v0.2.0 | v0.1.0 | Go Package (consumed in inference-scheduler) |
| llm-d/llm-d-benchmark | v0.2.0 | v0.0.8 | Tooling and Image |

NOTE: In the future we want to support compatibility matrices. However, as we are still getting off the ground, we cannot ensure that these components work with legacy versions.


🔹 llm-d/llm-d-inference-scheduler

  • Description: The inference scheduler makes optimized routing decisions for inference requests to vLLM model servers. This component depends on the upstream gateway-api-inference-extension scheduling framework and includes features specific to vLLM.
  • Diff: 0.0.4 → v0.2.1
  • Upstream Changelog: since we bumped the upstream version of GIE (gateway-api-inference-extension), many changes do not show up in the diff; please consult the upstream release notes for the full list of changes.

🔹 llm-d/llm-d-model-service (Deprecated)

  • Description: ModelService is a Kubernetes operator (CRD + controller) that enables the creation of vLLM pods and routing resources for a given model.
  • Status: This repo is being deprecated as a component of llm-d and has been archived.
  • Replacement: llm-d-incubation/llm-d-modelservice

🔹 llm-d-incubation/llm-d-modelservice (New)

  • Description: modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, the Gateway API Inference Extension, and LeaderWorkerSet).
  • History (new): v0.2.0

🔹 llm-d/llm-d-routing-sidecar

  • Description: A reverse proxy that routes traffic between the inference scheduler and the prefill and decode workers based on the x-prefiller-host-port HTTP request header (see the sketch below).
  • Diff: 0.0.6 → v0.2.0
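In normal operation the inference scheduler sets this header before the request reaches the sidecar; the minimal Python sketch below only illustrates the shape of such a request. The sidecar address, header value, endpoint path, and model name are illustrative assumptions, not documented defaults.

```python
# Hedged sketch: what a request carrying the x-prefiller-host-port header
# might look like when it arrives at the routing sidecar. Address, port,
# header value, path, and model name are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed sidecar address in front of a decode worker
    headers={
        # Tells the sidecar which prefill worker to contact first (illustrative value).
        "x-prefiller-host-port": "prefill-worker-0:8000",
    },
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        "prompt": "Hello, llm-d!",
        "max_tokens": 16,
    },
    timeout=30,
)
print(resp.status_code, resp.json())
```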

🔹 llm-d/llm-d-deployer (Deprecated)

  • Description: A repo containing examples, Helm charts, and release assets for llm-d.
  • Status: This repo is being deprecated as a component of llm-d; however, it will not be archived so that it can continue to support the legacy install. It will be minimally maintained.
  • Replacement: llm-d-incubation/llm-d-infra

🔹 vllm-project/vllm (Upstream)

  • Description: vLLM is a fast and easy-to-use library for LLM inference and serving. This project is the inferencing engine that forms the upstream of our llm-d/llm-d image.
  • Release: v0.10.0

🔹 llm-d/llm-d

  • Description: A midstreamed image of vllm-project/vllm for inferencing, supporting features such as PD disaggregation, KV cache awareness and more.
  • Diff: 0.0.8 → v0.2.0

🔹 llm-d/llm-d-inference-sim

  • Description: A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM (see the sketch below).
  • Diff: 0.0.4 → v0.3.0
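Because the simulator mirrors vLLM's HTTP surface, it can stand in for a real model server during testing. The sketch below assumes an instance reachable on localhost:8000 that exposes a vLLM-style OpenAI-compatible chat completions endpoint; the port and model name are assumptions.

```python
# Hedged sketch: exercising a locally running llm-d-inference-sim instance,
# assuming it listens on localhost:8000 and mirrors vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. Port and model name are illustrative.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "fake-model",  # placeholder; the simulator does not load real weights
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=10,
)
# The simulator returns an emulated completion rather than real model output.
print(resp.json())
```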

🔹 llm-d-incubation/llm-d-infra (New)

  • Description: This repository includes examples, Helm charts, and release assets for llm-d-infra.
  • History (new): v1.1.1

🔹 kubernetes-sigs/gateway-api-inference-extension (New - Upstream)

  • Note: The upstream project is a dependency of llm-d, and we directly reference its published Helm charts. These release notes will not include all changes in this component from release to release; please consult the upstream release pages.
  • Description: A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets.
  • History (new - external): v0.5.1

🔹 llm-d/llm-d-kv-cache-manager

  • Description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.
  • Diff: v0.1.0 → v0.2.0

🔹 llm-d/llm-d-benchmark

  • Description: This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
  • Diff: v0.0.8 → v0.2.0

For more information on any of the component projects or versions, please check out their repos directly. Thank you to all contributors who helped make this happen.