-
Notifications
You must be signed in to change notification settings - Fork 579
feat(dynamo-run): Use llama.cpp as the default engine for GGUF #1276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors, `llama.cpp` becomes the default for GGUF. Why? - Since #1177 `llama.cpp` is built-in by default, so we can switch. - `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch. Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model. We can still run GGUF with mistralrs by doing `out=mistralrs`.
WalkthroughThe changes refactor the handling of output engine selection in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI (main.rs)
participant Lib (lib.rs)
participant Model
User->>CLI (main.rs): Provide input and optional output engine
CLI (main.rs)->>Lib (lib.rs): Call run(runtime, input, out_opt, flags)
Lib (lib.rs)->>Lib (lib.rs): Check input/output compatibility
Lib (lib.rs)->>Model: Build or prepare local model
Model->>Lib (lib.rs): Return LocalModel
Lib (lib.rs)->>Lib (lib.rs): Determine output engine (default if needed)
Lib (lib.rs)->>Lib (lib.rs): Log CUDA/Metal/Vulkan info if applicable
Lib (lib.rs)->>User: Proceed with execution
Possibly related PRs
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
launch/dynamo-run/src/lib.rs
(4 hunks)launch/dynamo-run/src/main.rs
(1 hunks)launch/dynamo-run/src/opt.rs
(0 hunks)lib/llm/src/local_model.rs
(1 hunks)lib/llm/src/model_card/model.rs
(2 hunks)
💤 Files with no reviewable changes (1)
- launch/dynamo-run/src/opt.rs
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/llm/src/local_model.rs (1)
lib/llm/src/model_card/model.rs (2)
is_gguf
(229-234)is_gguf
(390-392)
lib/llm/src/model_card/model.rs (1)
lib/llm/src/local_model.rs (1)
is_gguf
(65-69)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (8)
lib/llm/src/model_card/model.rs (2)
229-234
: Well-structured delegation pattern.The
is_gguf()
method onModelDeploymentCard
correctly delegates to the underlyingModelInfoType
through themodel_info
field. The handling of theNone
case is appropriate, returningfalse
when no model info is available.
390-392
: Clean and explicit pattern matching.The
is_gguf()
method onModelInfoType
uses proper pattern matching withmatches!
macro, making the intent clear and the implementation safe. This provides a consistent API across different model info representations.launch/dynamo-run/src/main.rs (1)
106-108
: Good refactoring to simplify CLI parsing logic.The simplified handling of
out_opt
correctly moves the responsibility for default engine selection from the CLI parsing layer to the business logic layer inlib.rs
. This separation of concerns improves maintainability.launch/dynamo-run/src/lib.rs (5)
44-50
: Clean helper functions improve readability.The new helper functions
is_in_dynamic
andis_out_dynamic
make the conditional logic more readable and maintainable. The pattern matching is explicit and correct.
55-55
: Function signature change aligns with PR objectives.The change to accept
Option<Output>
enables the dynamic default selection based on model type, which is the core objective of this PR. The helper function usage makes the incompatibility check cleaner.Also applies to: 58-58
68-91
: Proper conditional model creation logic.The refactored model creation logic correctly handles the case where output is dynamic (creating a default empty model) versus other cases where a real model needs to be prepared or constructed. The logic flow is clear and handles all the necessary cases.
117-129
: Excellent implementation of the core PR feature.This code segment perfectly implements the PR objective: using
llama.cpp
as the default engine for GGUF models andmistral.rs
for others. The logic:
- Correctly uses the new
is_gguf()
method on the model card- Provides appropriate logging for transparency
- Falls back gracefully when no output is specified
The user experience is improved with informative logging about the chosen default and available alternatives.
368-402
: Well-implemented conditional compilation for hardware acceleration info.The
print_cuda
function properly handles different compilation scenarios:
- Only logs for engines that actually need compile-time acceleration features (
mistralrs
,llamacpp
)- Uses appropriate conditional compilation directives for different acceleration backends
- Provides helpful guidance when no acceleration features are enabled
- Has a no-op version when no relevant engines are compiled
The implementation is both informative and non-intrusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level - no particular issue - another avenue to consider is to remove any default and force the user to provide an engine - generally speaking for a multi-engine tool I think that kind of makes sense - choosing a default for the user is convenient but at the same time a little confusing - as in why this one vs that one ....
an aside: 'out' is maybe confusing and what it we overlap options with --engine ?
so user could do either out=vllm or --engine vllm , --engine remote (for dyn:// ?)
Previously
mistral.rs
was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors,llama.cpp
becomes the default for GGUF.Why?
llama.cpp
is built-in by default, so we can switch.llama.cpp
is very very good at running GGUF (but can't run other types of model), so we should switch.Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.
We can still run GGUF with mistralrs by doing
out=mistralrs
.Summary by CodeRabbit
New Features
Refactor
Chores