Conversation

@grahamking (Contributor) commented May 22, 2025

It does not need any external libraries.

Summary by CodeRabbit

  • New Features
    • The "llamacpp" feature is now enabled by default when building the package, in addition to "mistralrs".
  • Documentation
    • Updated build instructions to clarify that "llama.cpp" is built for CPU by default and to recommend GPU feature flags for optimized builds.
  • Chores
    • Streamlined and simplified the Rust build process in Dockerfiles, consolidating build steps and artifact copying.
    • Updated Rust installation to use the default profile instead of the minimal profile.
    • Removed license header comments from configuration files.
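In Cargo terms, the default-feature change summarized above would look roughly like this (a sketch of the manifest, not the actual Cargo.toml; the dependency wiring behind each feature is omitted):

```toml
[features]
# "llamacpp" now joins "mistralrs" in the default feature set,
# so a plain `cargo build` compiles both engines.
default = ["mistralrs", "llamacpp"]
mistralrs = []
llamacpp = []
```

Builders who want a leaner binary can still opt out with `cargo build --no-default-features --features mistralrs`.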

Co-authored-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Graham King <graham@gkgk.org>
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
container/Dockerfile.tensorrt_llm (1)

303-304: Implement C bindings build as previously suggested
This matches the earlier recommendation to run `cd lib/bindings/c && cargo build --release --locked` in the dev image to generate the C header files and the shared library.
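The recommended step amounts to a single stanza in the dev image, along these lines (a sketch; the working directory and stage layout are assumptions):

```dockerfile
# Build the C bindings so the generated headers and the shared
# library are available to later stages of the image build.
RUN cd lib/bindings/c && cargo build --release --locked
```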

🧹 Nitpick comments (2)
container/Dockerfile.tensorrt_llm (2)

207-207: Consider image size implications of switching Rust profile
You’ve changed the rustup-init profile from minimal to default, which pulls in extra components (docs, source, rustfmt, etc.). While this unifies the Dockerfiles, it may significantly increase image size. Please verify that this change doesn’t bloat the layer beyond acceptable limits or evaluate if you can selectively install only the necessary components.
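If image size does turn out to matter, one alternative is to keep the minimal profile and add components individually (a sketch; which components the build actually needs is an assumption):

```dockerfile
# Minimal toolchain plus only the components the build uses,
# instead of pulling in the full default profile (docs, source, etc.).
RUN rustup-init -y --profile minimal --default-toolchain stable && \
    rustup component add rustfmt clippy
```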


299-299: Refine workspace copy to limit to necessary artifacts
Copying the entire /workspace directory from wheel_builder may include source files and intermediate artifacts you don’t need at runtime. Consider restricting this to only the built artifacts (e.g., /workspace/dist, /workspace/target) to reduce image size and surface area.
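A narrower copy might look like this (the directory names under `/workspace` are assumptions about this repo's layout):

```dockerfile
# Copy only the built artifacts, not sources or intermediate files.
COPY --from=wheel_builder /workspace/dist /workspace/dist
COPY --from=wheel_builder /workspace/target/release /workspace/target/release
```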

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95bc0a8 and 039d1fe.

📒 Files selected for processing (3)
  • container/Dockerfile.sglang (2 hunks)
  • container/Dockerfile.tensorrt_llm (3 hunks)
  • container/Dockerfile.vllm (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • container/Dockerfile.sglang
  • container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build Documentation
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (1)
container/Dockerfile.tensorrt_llm (1)

276-280: Consolidated workspace build improves maintainability
Switching to `cargo build --workspace` with the `dynamo-llm/block-manager` feature simplifies the build steps and ensures all crates are built in one go. This aligns well with your goal of unifying Rust build logic across images.
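The consolidated step the comment praises amounts to one cargo invocation (the feature name is taken from the review above; the other flags are assumptions):

```dockerfile
# Build all workspace crates in a single step, enabling the
# block-manager feature of the dynamo-llm crate.
RUN cargo build --release --locked --workspace \
    --features dynamo-llm/block-manager
```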

@grahamking merged commit b889948 into main May 29, 2025
12 of 13 checks passed
@grahamking deleted the gk-llamacpp-default branch May 29, 2025 16:05
grahamking added a commit that referenced this pull request May 29, 2025
Previously `mistral.rs` was the default engine for both safetensors and
GGUF models. Now it is only the default for safetensors, `llama.cpp`
becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very good at running GGUF (but can't run other types
  of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can
use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
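The default-engine rule described above can be sketched as a tiny selection function (a hypothetical helper, not code from this repo): GGUF files go to `llama.cpp`, everything else stays on `mistral.rs`, and an explicit `out=` override always wins.

```shell
#!/bin/sh
# Hypothetical sketch of the engine-selection rule described above:
# llama.cpp for GGUF files, mistral.rs for everything else.
pick_engine() {
  model="$1"
  override="$2"          # e.g. "mistralrs", mirroring `out=mistralrs`
  if [ -n "$override" ]; then
    echo "$override"     # an explicit out= always wins
    return
  fi
  case "$model" in
    *.gguf) echo "llamacpp" ;;  # llama.cpp excels at GGUF
    *)      echo "mistralrs" ;; # mistral.rs stays the safetensors default
  esac
}

pick_engine "Qwen3-0.6B.gguf"            # -> llamacpp
pick_engine "model.safetensors"          # -> mistralrs
pick_engine "Qwen3-0.6B.gguf" mistralrs  # -> mistralrs
```

This is the "best engine per format" idea in miniature: the dispatch is on file format, with a user override as the escape hatch.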