chore: Make llama.cpp a default engine, speedup CI #1177
Conversation
Co-authored-by: Anant Sharma <anants@nvidia.com> Signed-off-by: Graham King <graham@gkgk.org>
Actionable comments posted: 0
♻️ Duplicate comments (1)
container/Dockerfile.tensorrt_llm (1)
303-304: Implement C bindings build as previously suggested
This exactly matches the earlier recommendation to run `cd lib/bindings/c && cargo build --release --locked` in the dev image to generate the C include files and the shared library.
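As a sketch, the suggested step could be added to the dev image like this (the `dev` stage name and the artifact location are assumptions; only the cargo command comes from the recommendation above):

```dockerfile
# Hypothetical fragment for the dev image: build the C bindings so later
# stages can copy the generated header files and the shared library.
# `dev` is an assumed stage name; adjust to the Dockerfile's real stages.
FROM dev AS c_bindings
WORKDIR /workspace
RUN cd lib/bindings/c && cargo build --release --locked
# The shared library and generated C headers land under the cargo target directory.
```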
🧹 Nitpick comments (2)
container/Dockerfile.tensorrt_llm (2)
207-207: Consider image size implications of switching Rust profile
You’ve changed the rustup-init profile from `minimal` to `default`, which pulls in extra components (docs, source, rustfmt, etc.). While this unifies the Dockerfiles, it may significantly increase image size. Please verify that this change doesn’t bloat the layer beyond acceptable limits, or evaluate whether you can selectively install only the necessary components.
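If image size does become a problem, one hedged alternative is to keep the `minimal` profile and add only the specific components the build needs. The toolchain choice and component list below are assumptions, not taken from this PR:

```dockerfile
# Sketch: stay on the minimal rustup profile and install components selectively.
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
    sh -s -- -y --profile minimal --default-toolchain stable && \
    . "$HOME/.cargo/env" && \
    rustup component add rustfmt clippy
```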
299-299: Refine workspace copy to limit to necessary artifacts
Copying the entire `/workspace` directory from `wheel_builder` may include source files and intermediate artifacts you don’t need at runtime. Consider restricting this to only the built artifacts (e.g., `/workspace/dist`, `/workspace/target`) to reduce image size and surface area.
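A minimal sketch of the narrower copy (the `wheel_builder` stage name and the two paths come from the comment; the destination layout is an assumption):

```dockerfile
# Copy only the built artifacts instead of the whole /workspace tree.
COPY --from=wheel_builder /workspace/dist /workspace/dist
COPY --from=wheel_builder /workspace/target /workspace/target
```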
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- container/Dockerfile.sglang (2 hunks)
- container/Dockerfile.tensorrt_llm (3 hunks)
- container/Dockerfile.vllm (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- container/Dockerfile.sglang
- container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Build Documentation
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (2)
container/Dockerfile.tensorrt_llm (2)
276-280: Consolidated workspace build improves maintainability
Switching to `cargo build --workspace` with the `dynamo-llm/block-manager` feature simplifies the build steps and ensures all crates are built in one go. This aligns well with your goal of unifying Rust build logic across images.
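For illustration, the consolidated step might look like the following (the `--release --locked` flags are assumptions; the `--workspace` flag and the feature name come from the comment above):

```dockerfile
# Build every crate in the workspace in one invocation, with the
# block-manager feature of the dynamo-llm crate enabled.
RUN cargo build --workspace --release --locked \
    --features dynamo-llm/block-manager
```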
298-298: No substantive code on this line.
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors; `llama.cpp` becomes the default for GGUF. Why?
- Since #1177, `llama.cpp` is built in by default, so we can switch.
- `llama.cpp` is very good at running GGUF (but it can't run other model formats), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for each specific format or model. We can still run GGUF with mistral.rs by passing `out=mistralrs`.
`llama.cpp` does not need any external libraries.
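To illustrate the engine-selection behaviour described above (the CLI invocation and model path here are hypothetical; only the `out=mistralrs` form comes from the description):

```shell
# GGUF models now default to llama.cpp:
dynamo run /path/to/model.gguf

# Override the default and run the same GGUF with mistral.rs:
dynamo run out=mistralrs /path/to/model.gguf
```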