
Conversation

Contributor

@nv-anants nv-anants commented Jul 7, 2025

Overview:

Add a new runtime image (~21 GB) to the TensorRT-LLM Dockerfile, enabling optimized production containers that are significantly smaller than the development image (~31 GB) while retaining all necessary runtime dependencies.

Key details:

  • Runtime base image: nvcr.io/nvidia/cuda:12.9.0-runtime-ubuntu24.04
  • Installs essential deb packages (build-essential, python3-dev, openssh-client, openssh-server)
  • Copies only the necessary components from build stages:
    • Dynamo bindings (wheels, libraries, headers)
    • NATS and etcd
    • UCX and NIXL libraries for NIXL plugins
    • OpenMPI components
    • CUDA toolkit components (nvcc, cudafe++, ptxas, fatbinary, headers, libcudart)
    • PyTorch installation from NGC PyTorch base image
  • Installs TensorRT-LLM and Dynamo wheels using the uv package manager
  • Sets up required environment variables
  • Includes benchmarks, examples, tests, and test dependencies for CI pipeline compatibility. TODO: remove these later and build the CI image on top of the runtime image. (A condensed sketch of the overall pattern follows below.)
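For orientation, here is a condensed, hypothetical sketch of the multi-stage pattern described above (stage names, paths, and the venv layout are illustrative, not the actual Dockerfile contents):

```dockerfile
# Hypothetical sketch of the runtime stage; names and paths are illustrative.
FROM nvcr.io/nvidia/cuda:12.9.0-runtime-ubuntu24.04 AS runtime

# Essential system packages only, keeping the layer small.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential python3-dev openssh-client openssh-server && \
    rm -rf /var/lib/apt/lists/*

# Copy only the runtime artifacts produced by earlier build stages.
COPY --from=build /workspace/wheels /workspace/wheels
COPY --from=build /usr/local/ucx /usr/local/ucx
COPY --from=build /usr/local/nixl /usr/local/nixl

# Install the pre-built wheels into a virtual environment with uv
# (assumes uv is already available in this stage).
ENV VIRTUAL_ENV=/opt/dynamo/venv
RUN uv venv ${VIRTUAL_ENV} --python 3.12 && \
    uv pip install /workspace/wheels/*.whl
```

The size win comes from never pulling build-time toolchains into the final image; only artifacts produced in earlier stages are copied in.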

Tested only on the amd64 architecture.

closes: OPS-167

Summary by CodeRabbit

  • New Features

    • Introduced a new runtime container image for TensorRT-LLM, optimized for deployment with essential system packages and pre-installed dependencies.
    • A Python 3.12 virtual environment is now set up automatically with all required libraries and tools.
    • Benchmarks, examples, and test directories are included in the runtime image for easier validation and CI workflows.
  • Chores

    • Updated environment variable configurations and improved dependency installation in the development image.

Contributor

coderabbitai bot commented Jul 8, 2025

Walkthrough

A new multi-stage runtime build target was added to container/Dockerfile.tensorrt_llm, based on a CUDA runtime image. This stage installs essential system and Python packages, copies runtime artifacts and dependencies from previous stages, sets up environment variables, installs required Python wheels, and prepares the image for CI usage. Minor updates were also made to the dev stage.

Changes

| File(s) | Change Summary |
| --- | --- |
| container/Dockerfile.tensorrt_llm | Added a new runtime build stage with a CUDA runtime base; installed system and Python dependencies; copied runtime artifacts and wheels; configured environment variables; included CI assets. Updated the dev stage to set VIRTUAL_ENV and install the NIXL wheel. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Docker Build
    participant Dev Stage
    participant Build Stage
    participant Runtime Stage

    Docker Build->>Dev Stage: Build dev image, install dependencies
    Dev Stage->>Build Stage: Pass build artifacts (wheels, binaries)
    Build Stage->>Runtime Stage: Copy runtime artifacts, wheels, CUDA, PyTorch, etc.
    Runtime Stage->>Runtime Stage: Install system packages and Python dependencies
    Runtime Stage->>Runtime Stage: Set environment variables
    Runtime Stage->>Runtime Stage: Install wheels (TensorRT-LLM, Dynamo, NIXL)
    Runtime Stage->>Runtime Stage: Copy benchmarks, tests, and CI assets
    Runtime Stage->>Runtime Stage: Set entrypoint and CMD
```


Poem

🐇
In Docker’s world, a new stage appears,
With CUDA and Python, it now perseveres.
Wheels and benchmarks, all snug in their place,
Environment variables set with grace.
CI is ready, the runtime is bright—
Hop along, containers, everything’s right!


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
container/Dockerfile.tensorrt_llm (3)

221-229: Duplicate wheel-build logic bloats the build stage

uv build + uv pip install for NIXL is already handled later by the dedicated wheel_builder stage. Re-building and installing it here lengthens the build and increases the final image size without adding functionality.

```diff
-# Install NIXL Python module
-RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
-
-# Install the wheel
-# TODO: Move NIXL wheel install to the wheel_builder stage
-RUN uv pip install /workspace/wheels/nixl/*.whl
```

Drop the redundant lines above and rely on the wheel produced in wheel_builder.


377-384: APT cache not cleaned – adds ~200 MB to the runtime layer

You delete /var/lib/apt/lists but leave the package cache in /var/cache/apt/archives.

```diff
-    rm -rf /var/lib/apt/lists/*
+    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
```

461-467: Missing $NIXL_PLUGIN_DIR in LD_LIBRARY_PATH order

LD_LIBRARY_PATH places the NIXL plugin directory after the NIXL lib directory; put the plugin dir first so CUDA symbols are found first and accidental shadowing is avoided.

```diff
-ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/ucx/lib:...
+ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/ucx/lib:...
```

@nv-anants nv-anants merged commit 909a9a9 into main Jul 8, 2025
9 of 10 checks passed
@nv-anants nv-anants deleted the anants/trtllm-runtime branch July 8, 2025 13:37
```dockerfile
    --extra-index-url https://pypi.org/simple \
    "${TENSORRTLLM_PIP_WHEEL}" && \
    uv pip install ai-dynamo --find-links wheelhouse && \
    uv pip install nixl --find-links wheelhouse
```
Member


In the future I would consider installing nixl first. I get that nixl is an optional dependency of ai-dynamo at the moment, but I would not want to get the wrong version should that change.

Contributor Author


Good point, but my concern with installing nixl first is that if it brings in any dependency that is common and not version-controlled in ai-dynamo, the one from nixl will be installed and used. I think we should do a single install; options there could be:

  1. A single command: pip install ai-dynamo nixl ..., or
  2. Adding an ai-dynamo[nixl] extra to the dynamo install.

A sketch of option 1 follows below.
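For illustration, a minimal sketch of option 1, assuming the wheels are staged in a local wheelhouse directory as earlier in this Dockerfile:

```dockerfile
# Option 1 (sketch): resolve ai-dynamo and nixl in a single uv invocation so
# shared dependencies are solved together rather than by install order.
RUN uv pip install ai-dynamo nixl --find-links wheelhouse
```

A single resolution pass sidesteps the ordering question: neither package can silently override a dependency version pinned by the other.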

Contributor

@nv-kmcgill53 nv-kmcgill53 left a comment


I see a couple of refining steps to make this better. On the whole, it works.

```dockerfile
ENV VIRTUAL_ENV=/opt/dynamo/venv

# Install NIXL Python module
RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```
Contributor


Is this build able to be cached and optionally downloaded? If we already have a build for the commit this is going to use, we can skip the build step and just download and install.

Contributor Author


The build is cached, since we pin a commit SHA. Downloading is a bigger change; ideally I would want to use the wheel built by the nixl repo's CI and not build anything in dynamo. (A sketch of the commit pinning is below.)
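A rough sketch of the commit pinning that keeps the layer cacheable; the repository URL and SHA below are placeholders, not the actual values used:

```dockerfile
# Sketch: pin NIXL to a fixed commit so the clone-and-build layer is
# cache-stable across rebuilds. URL and SHA are hypothetical placeholders.
ARG NIXL_COMMIT_SHA=deadbeef
RUN git clone https://github.com/ai-dynamo/nixl.git /opt/nixl && \
    cd /opt/nixl && \
    git checkout ${NIXL_COMMIT_SHA} && \
    uv build . --out-dir /workspace/wheels/nixl
```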

```dockerfile
COPY --from=build /usr/local/cuda/lib64/libcudart.so* /usr/local/cuda/lib64/
COPY --from=build /usr/local/cuda/nvvm /usr/local/cuda/nvvm

# Copy pytorch installation from NGC PyTorch
```
Contributor


Would this all be better defined as a pip install ... rather than copying the installed packages themselves? We can still pin the dependencies, and it would be easier to read (and maintain) in the future. It also makes explicit what is unique to the trtllm container and what came from upstream. I'm open to hearing arguments on this.

I get why you did it this way to begin with, to prove we have all the packages needed for trtllm. Now we have something to compare the next iteration against to make it better.

Contributor Author


Some of these are internal versions, so I don't think there's an easy way to pip install them? I could be wrong here. (A rough sketch of the pip-based alternative is below, for comparison.)
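For comparison, a hedged sketch of the pip-based alternative the reviewer describes; the pins below are purely illustrative, and as noted above, the NGC PyTorch image ships internal builds that may have no drop-in public equivalents:

```dockerfile
# Hypothetical sketch of the pip-based alternative. These pins are
# illustrative only; matching public wheels would need to be identified
# before this could replace the COPY-based approach.
RUN uv pip install \
        "torch==2.7.*" \
        "torchvision==0.22.*"
```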

```dockerfile
RUN --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.txt \
    uv pip install --requirement /tmp/requirements.txt

# Copy CUDA toolkit components needed for nvcc, cudafe, cicc etc.
```
Contributor


I wonder if we will run into an issue where the BASE container is not on the same CUDA version as the RUNTIME container, causing issues down the line with straight-copying the libs. I would flag this as a risk rather than a hard stop.

Contributor Author


Yes, but it's the same case with other packages. I believe torch and other packages also expect a certain CUDA version, so they would also need to match between the BASE and RUNTIME containers. I can try using the cuda-devel image, which comes with all these CUDA binaries, but I would need to check the size impact of that. (A sketch of a build-time version check is below.)
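One way to surface that risk at build time, sketched here as an assumption (the nvidia/cuda base images export CUDA_VERSION, and nvcc reports its release in --version output):

```dockerfile
# Sketch: fail the build if the nvcc copied from the build stage does not
# match the CUDA major.minor of the runtime base image. CUDA_VERSION is set
# by the nvidia/cuda base images (e.g. 12.9.0), so ${CUDA_VERSION%.*} = 12.9.
RUN nvcc --version | grep -q "release ${CUDA_VERSION%.*}," || \
    { echo "CUDA mismatch between build and runtime stages" >&2; exit 1; }
```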
