feat: add runtime image for trtllm container build #1796
Conversation
Walkthrough
A new multi-stage runtime image is added to the TensorRT-LLM container build, producing a smaller production-oriented image alongside the existing development image.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Docker Build
    participant Dev Stage
    participant Build Stage
    participant Runtime Stage
    Docker Build->>Dev Stage: Build dev image, install dependencies
    Dev Stage->>Build Stage: Pass build artifacts (wheels, binaries)
    Build Stage->>Runtime Stage: Copy runtime artifacts, wheels, CUDA, PyTorch, etc.
    Runtime Stage->>Runtime Stage: Install system packages and Python dependencies
    Runtime Stage->>Runtime Stage: Set environment variables
    Runtime Stage->>Runtime Stage: Install wheels (TensorRT-LLM, Dynamo, NIXL)
    Runtime Stage->>Runtime Stage: Copy benchmarks, tests, and CI assets
    Runtime Stage->>Runtime Stage: Set entrypoint and CMD
```
Actionable comments posted: 4
🧹 Nitpick comments (3)
container/Dockerfile.tensorrt_llm (3)
221-229: Duplicate wheel-build logic bloats the build stage

`uv build` + `uv pip install` for NIXL is already handled later by the dedicated `wheel_builder` stage. Re-building and installing it here lengthens the build and increases the final image size without adding functionality.

```diff
-# Install NIXL Python module
-RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
-
-# Install the wheel
-# TODO: Move NIXL wheel install to the wheel_builder stage
-RUN uv pip install /workspace/wheels/nixl/*.whl
```

Drop the redundant lines above and rely on the wheel produced in `wheel_builder`.
377-384: APT cache not cleaned – adds ~200 MB to the runtime layer

You delete `/var/lib/apt/lists` but leave the package cache in `/var/cache/apt/archives`.

```diff
-    rm -rf /var/lib/apt/lists/*
+    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
```
461-467: `$NIXL_PLUGIN_DIR` ordering in `LD_LIBRARY_PATH`

`LD_LIBRARY_PATH` places the NIXL plugin directory after the base NIXL library path; put the plugin directory first, ahead of `/usr/local/ucx/lib`, so CUDA symbols are found first and accidental shadowing is avoided.

```diff
-ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/ucx/lib:...
+ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/ucx/lib:...
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/Dockerfile.tensorrt_llm (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
container/Dockerfile.tensorrt_llm (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
```dockerfile
    --extra-index-url https://pypi.org/simple \
    "${TENSORRTLLM_PIP_WHEEL}" && \
    uv pip install ai-dynamo --find-links wheelhouse && \
    uv pip install nixl --find-links wheelhouse
```
In the future I would consider installing `nixl` first. I get that `nixl` is an optional dependency of `ai-dynamo` at the moment, but I would not want to get the wrong version should that change.
Good point, but my concern with installing `nixl` first is that if it brings in any dependency that is common and not version-controlled in `ai-dynamo`, the one from `nixl` will be installed and used. We should do a single install, I think; options there could be:

- a single command, `pip install ai-dynamo nixl ...` (see the sketch below), or
- adding an `ai-dynamo[nixl]` extra to the dynamo install.
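A minimal sketch of the single-command option, assuming the wheels are available from the same `wheelhouse` directory used earlier in this Dockerfile; the flags and index URL are illustrative, not a tested configuration:

```bash
# Resolve ai-dynamo and nixl together in one step so uv picks mutually
# consistent dependency versions (paths and index URL are assumptions)
uv pip install \
    --find-links wheelhouse \
    --extra-index-url https://pypi.org/simple \
    ai-dynamo nixl
```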
I see a couple of refining steps to make this better. On the whole, it works.
```dockerfile
ENV VIRTUAL_ENV=/opt/dynamo/venv

# Install NIXL Python module
RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```
Can this build be cached and optionally downloaded? If we already have a build for the commit this is going to use, then we can skip the build step and just download and install.
The build is cached, since we pin a commit SHA. Downloading is a bigger change; ideally I would want to use the wheel built by the NIXL repo's CI and not build anything in dynamo.
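A rough sketch of the commit-pinning that makes this layer cacheable; the repository URL, variable name, and placeholder SHA are assumptions for illustration only:

```bash
# Pin NIXL to a fixed commit so the resulting Docker layer is reproducible
# and can be reused from the build cache (URL and SHA are illustrative)
NIXL_COMMIT_SHA="<pinned-sha>"   # hypothetical pinned commit
git clone https://github.com/ai-dynamo/nixl.git /opt/nixl
git -C /opt/nixl checkout "${NIXL_COMMIT_SHA}"
cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```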
```dockerfile
COPY --from=build /usr/local/cuda/lib64/libcudart.so* /usr/local/cuda/lib64/
COPY --from=build /usr/local/cuda/nvvm /usr/local/cuda/nvvm

# Copy pytorch installation from NGC PyTorch
```
Would this all be better defined as a `pip install ...` rather than copying the installed packages themselves? We can still pin the dependencies, and it would be easier to read (and maintain) in the future. It also makes explicit what is unique to the trtllm container and what came from upstream. I'm open to hearing arguments on this.

I get why you did it this way to begin with, to prove we have all the packages needed for trtllm. Now we have something to compare the next iteration against to make it better.
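A sketch of what the pip-based alternative could look like, assuming the packages in question are publicly installable; the package names, version placeholders, and index URL are illustrative and not taken from this PR:

```bash
# Install pinned upstream packages directly rather than copying site-packages
# from the NGC PyTorch image (versions and index URL are assumptions)
uv pip install \
    --extra-index-url https://pypi.org/simple \
    "torch==<pinned-version>" \
    "tensorrt_llm==<pinned-version>"
```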
Some of these are internal versions, so I don't think there is an easy way to pip install them? I could be wrong here.
```dockerfile
RUN --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.txt \
    uv pip install --requirement /tmp/requirements.txt

# Copy CUDA toolkit components needed for nvcc, cudafe, cicc etc.
```
I wonder if we will run into an issue where the BASE container is not on the same CUDA version as the RUNTIME container, causing issues down the line with straight copying of the libs. I would highlight this as a risk rather than a hard stop.
Yes, but it's the same case with other packages. I believe torch and other packages also expect a certain CUDA version, so they would also need to match between the BASE and RUNTIME containers. I can try using the cuda-devel image, which comes with all these CUDA binaries, but I would need to check the size impact of that.
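One hedged way to surface the version-mismatch risk at build time rather than at run time; the expected-version variable and the check itself are illustrative assumptions, not part of this PR:

```bash
# Fail the build early if the copied CUDA toolkit does not match the version
# the runtime stage is expected to use (EXPECTED_CUDA_VERSION is a hypothetical
# build argument; "12.4" below is only a placeholder)
EXPECTED_CUDA_VERSION="12.4"
/usr/local/cuda/bin/nvcc --version | grep -q "release ${EXPECTED_CUDA_VERSION}" || {
    echo "CUDA version mismatch between build and runtime stages" >&2
    exit 1
}
```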
Overview:
Add a new runtime image (~21 GB) to the TensorRT-LLM Dockerfile, enabling optimized production containers that are significantly smaller than the development image (~31 GB) while retaining all necessary runtime dependencies.
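A hedged example of how the new stage might be built locally; the `--target` name and the image tag are assumptions, not taken from this PR:

```bash
# Build only the runtime stage of the TensorRT-LLM Dockerfile
# (the target name and tag are illustrative)
docker build \
    -f container/Dockerfile.tensorrt_llm \
    --target runtime \
    -t dynamo-trtllm:runtime \
    .
```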
Key Details:
- Tested only for the amd64 architecture.

Closes: OPS-167