
📦 llm-d v0.2.0 Release Notes

For information on installing and using the new release, refer to our quickstarts.

Release Date: 2025-07-28


Core Objectives

This release had a few key objectives:

  • Migrate from monolithic to composable installs based on community feedback
  • Support wide expert parallelism use cases of "one rank per node"
  • Align with upstream gateway-api-inference-extension helm charts

🧩 Component Summary

| Component | Version | Previous Version | Type |
| --- | --- | --- | --- |
| llm-d/llm-d-inference-scheduler | v0.2.1 | 0.0.4 | Image |
| llm-d/llm-d-model-service | NA (Deprecated) | 0.0.10 | Image |
| llm-d-incubation/llm-d-modelservice | v0.2.0 | NA (New) | Helm Chart |
| llm-d/llm-d-routing-sidecar | v0.2.0 | 0.0.6 | Image |
| llm-d/llm-d-deployer | NA (Deprecated) | 1.0.22 | Helm Chart |
| vllm-project/vllm | v0.10.0 | NA (built from fork) | Wheel installed in llm-d |
| llm-d/llm-d | v0.2.0 | 0.0.8 | Image |
| llm-d/llm-d-inference-sim | v0.3.0 | 0.0.4 | Image |
| llm-d-incubation/llm-d-infra | v1.1.1 | NA (New) | Helm Chart |
| kubernetes-sigs/gateway-api-inference-extension | v0.5.1 | NA (New - external) | Image |
| llm-d/llm-d-kv-cache-manager | v0.2.0 | v0.1.0 | Go Package (consumed in inference-scheduler) |
| llm-d/llm-d-benchmark | v0.2.0 | v0.0.8 | Tooling and Image |

NOTE: In the future we want to support compatibility matrices. However, as we are still getting off the ground, we cannot ensure that these components work with legacy versions.


🔹 llm-d/llm-d-inference-scheduler

  • Description: The inference scheduler makes optimized routing decisions for inference requests to vLLM model servers. This component depends on the upstream gateway-api-inference-extension scheduling framework and includes features specific to vLLM.
  • Diff: 0.0.4 → v0.2.1
  • Upstream Changelog: since we bumped the upstream version of GIE (gateway-api-inference-extension), many changes do not show up in the diff; please consult the upstream release notes for the full list of changes.

🔹 llm-d/llm-d-model-service (Deprecated)

  • Description: ModelService is a Kubernetes operator (CRD + controller) that enables the creation of vLLM pods and routing resources for a given model.
  • Status: This repo is being deprecated as a component of llm-d and has been archived.
  • Replacement: llm-d-incubation/llm-d-modelservice

🔹 llm-d-incubation/llm-d-modelservice (New)

  • Description: modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, the Gateway API Inference Extension, and LeaderWorkerSet).
  • History (new): v0.2.0

🔹 llm-d/llm-d-routing-sidecar

  • Description: A reverse proxy that routes traffic between the inference scheduler and the prefill and decode workers based on the x-prefiller-host-port HTTP request header (see the sketch below).
  • Diff: 0.0.6 → v0.2.0
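In normal operation the inference scheduler sets this header before the request reaches the sidecar; the minimal Python sketch below only illustrates the shape of such a request. The sidecar address, header value, endpoint path, and model name are illustrative assumptions, not documented defaults.

```python
# Hedged sketch: what a request carrying the x-prefiller-host-port header
# might look like when it arrives at the routing sidecar. Address, port,
# header value, path, and model name are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed sidecar address in front of a decode worker
    headers={
        # Tells the sidecar which prefill worker to contact first (illustrative value).
        "x-prefiller-host-port": "prefill-worker-0:8000",
    },
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        "prompt": "Hello, llm-d!",
        "max_tokens": 16,
    },
    timeout=30,
)
print(resp.status_code, resp.json())
```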

🔹 llm-d/llm-d-deployer (Deprecated)

  • Description: A repo containing examples, Helm charts, and release assets for llm-d.
  • Status: This repo is being deprecated as a component of llm-d; however, it will not be archived so that it can continue to support the legacy install. It will be minimally maintained.
  • Replacement: llm-d-incubation/llm-d-infra

🔹 vllm-project/vllm (Upstream)

  • Description: vLLM is a fast and easy-to-use library for LLM inference and serving. This project is the inferencing engine that forms the upstream of our llm-d/llm-d image.
  • Release: v0.10.0

🔹 llm-d/llm-d

  • Description: A midstreamed image of vllm-project/vllm for inferencing, supporting features such as PD disaggregation, KV cache awareness and more.
  • Diff: 0.0.8 → v0.2.0

🔹 llm-d/llm-d-inference-sim

  • Description: A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM (see the sketch below).
  • Diff: 0.0.4 → v0.3.0
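Because the simulator mirrors vLLM's HTTP surface, it can stand in for a real model server during testing. The sketch below assumes an instance reachable on localhost:8000 that exposes a vLLM-style OpenAI-compatible chat completions endpoint; the port and model name are assumptions.

```python
# Hedged sketch: exercising a locally running llm-d-inference-sim instance,
# assuming it listens on localhost:8000 and mirrors vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. Port and model name are illustrative.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "fake-model",  # placeholder; the simulator does not load real weights
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=10,
)
# The simulator returns an emulated completion rather than real model output.
print(resp.json())
```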

🔹 llm-d-incubation/llm-d-infra (New)

  • Description: This repository includes examples, Helm charts, and release assets for llm-d-infra.
  • History (new): v1.1.1

🔹 kubernetes-sigs/gateway-api-inference-extension (New - Upstream)

  • Note: The upstream project is a dependency of llm-d, and we directly reference its published Helm charts. These release notes will not include all changes in this component from release to release; please consult the upstream release pages.
  • Description: A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets.
  • History (new - external): v0.5.1

🔹 llm-d/llm-d-kv-cache-manager

  • Description: This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms.
  • Diff: v0.1.0 → v0.2.0

🔹 llm-d/llm-d-benchmark

  • Description: This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles.
  • Diff: v0.0.8 → v0.2.0

For more information on any of the component projects or versions, please check out their repos directly. Thank you to all contributors who helped make this happen.