feat: add runtime image for trtllm container build #1796
Conversation
Walkthrough
A new multi-stage runtime image is added to the TensorRT-LLM container build, producing a smaller production-oriented image alongside the existing development image.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Docker Build
    participant Dev Stage
    participant Build Stage
    participant Runtime Stage
    Docker Build->>Dev Stage: Build dev image, install dependencies
    Dev Stage->>Build Stage: Pass build artifacts (wheels, binaries)
    Build Stage->>Runtime Stage: Copy runtime artifacts, wheels, CUDA, PyTorch, etc.
    Runtime Stage->>Runtime Stage: Install system packages and Python dependencies
    Runtime Stage->>Runtime Stage: Set environment variables
    Runtime Stage->>Runtime Stage: Install wheels (TensorRT-LLM, Dynamo, NIXL)
    Runtime Stage->>Runtime Stage: Copy benchmarks, tests, and CI assets
    Runtime Stage->>Runtime Stage: Set entrypoint and CMD
```
Actionable comments posted: 4
🧹 Nitpick comments (3)
container/Dockerfile.tensorrt_llm (3)
221-229: Duplicate wheel-build logic bloats the build stage

`uv build` + `uv pip install` for NIXL is already handled later by the dedicated `wheel_builder` stage. Re-building and installing it here lengthens the build and increases the final image size without adding functionality.

```diff
-# Install NIXL Python module
-RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
-
-# Install the wheel
-# TODO: Move NIXL wheel install to the wheel_builder stage
-RUN uv pip install /workspace/wheels/nixl/*.whl
```

Drop the redundant lines above and rely on the wheel produced in `wheel_builder`.
377-384: APT cache not cleaned – adds ~200 MB to the runtime layer

You delete `/var/lib/apt/lists` but leave the package cache in `/var/cache/apt/archives`.

```diff
-    rm -rf /var/lib/apt/lists/*
+    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
```
461-467: `$NIXL_PLUGIN_DIR` ordering in `LD_LIBRARY_PATH`

`LD_LIBRARY_PATH` places the NIXL plugin directory after the base NIXL library path; put the plugin directory first, ahead of `/usr/local/ucx/lib`, so CUDA symbols are found first and accidental shadowing is avoided.

```diff
-ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/ucx/lib:...
+ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/ucx/lib:...
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/Dockerfile.tensorrt_llm (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
container/Dockerfile.tensorrt_llm (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
```dockerfile
    --extra-index-url https://pypi.org/simple \
    "${TENSORRTLLM_PIP_WHEEL}" && \
    uv pip install ai-dynamo --find-links wheelhouse && \
    uv pip install nixl --find-links wheelhouse
```
In the future I would consider installing `nixl` first. I get that `nixl` is an optional dependency of `ai-dynamo` at the moment, but I would not want to get the wrong version should that change.
Good point, but my concern with installing `nixl` first is that if it brings in any dependency that is common and not version-controlled in `ai-dynamo`, the one from `nixl` will be installed and used. We should do a single install, I think; options there could be:

- a single command, `pip install ai-dynamo nixl ...` (see the sketch below), or
- adding an `ai-dynamo[nixl]` extra to the dynamo install.
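A minimal sketch of the single-command option, assuming the wheels are available from the same `wheelhouse` directory used earlier in this Dockerfile; the flags and index URL are illustrative, not a tested configuration:

```bash
# Resolve ai-dynamo and nixl together in one step so uv picks mutually
# consistent dependency versions (paths and index URL are assumptions)
uv pip install \
    --find-links wheelhouse \
    --extra-index-url https://pypi.org/simple \
    ai-dynamo nixl
```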
I see a couple of refining steps to make this better. On the whole, it works.
```dockerfile
ENV VIRTUAL_ENV=/opt/dynamo/venv

# Install NIXL Python module
RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```
Can this build be cached and optionally downloaded? If we already have a build for the commit this is going to use, then we can skip the build step and just download and install.
The build is cached, since we pin a commit SHA. Downloading is a bigger change; ideally I would want to use the wheel built by the NIXL repo's CI and not build anything in dynamo.
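A rough sketch of the commit-pinning that makes this layer cacheable; the repository URL, variable name, and placeholder SHA are assumptions for illustration only:

```bash
# Pin NIXL to a fixed commit so the resulting Docker layer is reproducible
# and can be reused from the build cache (URL and SHA are illustrative)
NIXL_COMMIT_SHA="<pinned-sha>"   # hypothetical pinned commit
git clone https://github.com/ai-dynamo/nixl.git /opt/nixl
git -C /opt/nixl checkout "${NIXL_COMMIT_SHA}"
cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```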
```dockerfile
COPY --from=build /usr/local/cuda/lib64/libcudart.so* /usr/local/cuda/lib64/
COPY --from=build /usr/local/cuda/nvvm /usr/local/cuda/nvvm

# Copy pytorch installation from NGC PyTorch
```
Would this all be better defined as a `pip install ...` rather than copying the installed packages themselves? We can still pin the dependencies, and it would be easier to read (and maintain) in the future. It also makes explicit what is unique to the trtllm container and what came from upstream. I'm open to hearing arguments on this.

I get why you did it this way to begin with, to prove we have all the packages needed for trtllm. Now we have something to compare the next iteration against to make it better.
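A sketch of what the pip-based alternative could look like, assuming the packages in question are publicly installable; the package names, version placeholders, and index URL are illustrative and not taken from this PR:

```bash
# Install pinned upstream packages directly rather than copying site-packages
# from the NGC PyTorch image (versions and index URL are assumptions)
uv pip install \
    --extra-index-url https://pypi.org/simple \
    "torch==<pinned-version>" \
    "tensorrt_llm==<pinned-version>"
```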
Some of these are internal versions, so I don't think there is an easy way to pip install them? I could be wrong here.
```dockerfile
RUN --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.txt \
    uv pip install --requirement /tmp/requirements.txt

# Copy CUDA toolkit components needed for nvcc, cudafe, cicc etc.
```
I wonder if we will run into an issue where the BASE container is not on the same CUDA version as the RUNTIME container, causing issues down the line with straight copying of the libs. I would highlight this as a risk rather than a hard stop.
Yes, but it's the same case with other packages. I believe torch and other packages also expect a certain CUDA version, so they would also need to match between the BASE and RUNTIME containers. I can try using the cuda-devel image, which comes with all these CUDA binaries, but I would need to check the size impact of that.
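One hedged way to surface the version-mismatch risk at build time rather than at run time; the expected-version variable and the check itself are illustrative assumptions, not part of this PR:

```bash
# Fail the build early if the copied CUDA toolkit does not match the version
# the runtime stage is expected to use (EXPECTED_CUDA_VERSION is a hypothetical
# build argument; "12.4" below is only a placeholder)
EXPECTED_CUDA_VERSION="12.4"
/usr/local/cuda/bin/nvcc --version | grep -q "release ${EXPECTED_CUDA_VERSION}" || {
    echo "CUDA version mismatch between build and runtime stages" >&2
    exit 1
}
```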
Overview:
Add a new runtime image (~21 GB) to the TensorRT-LLM Dockerfile, enabling optimized production containers that are significantly smaller than the development image (~31 GB) while retaining all necessary runtime dependencies.
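A hedged example of how the new stage might be built locally; the `--target` name and the image tag are assumptions, not taken from this PR:

```bash
# Build only the runtime stage of the TensorRT-LLM Dockerfile
# (the target name and tag are illustrative)
docker build \
    -f container/Dockerfile.tensorrt_llm \
    --target runtime \
    -t dynamo-trtllm:runtime \
    .
```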
Key Details:
- Tested only for the amd64 architecture.

Closes: OPS-167