feat: Add migration to LLM requests #1930
Walkthrough

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ModelWatcher
    participant Migration
    participant Backend
    participant Downstream
    Client->>ModelWatcher: Send generation request
    ModelWatcher->>Migration: Forward request
    Migration->>Backend: Forward request (with retry/migration logic)
    Backend->>Downstream: Process request
    Downstream-->>Backend: Stream tokens/responses
    Backend-->>Migration: Stream tokens/responses
    alt Error occurs (e.g., disconnection)
        Migration->>Backend: Retry with accumulated tokens
    end
    Migration-->>ModelWatcher: Stream tokens/responses
    ModelWatcher-->>Client: Stream tokens/responses
```
Actionable comments posted: 3
🧹 Nitpick comments (6)
lib/llm/src/migration.rs (6)
78-96: Consider making MAX_RETRIES configurable.

The retry limit is hardcoded to 3, which might not be suitable for all deployment scenarios. Consider making this configurable through the `Migration` struct or environment variables.

```diff
 pub struct Migration {
     pub tokenizer: Option<Tokenizer>,
+    pub max_retries: u16,
 }

 impl Migration {
     pub async fn from_tokenizer(tokenizer: HfTokenizer) -> Result<Arc<Self>> {
         // ...
         Ok(Arc::new(Self {
             tokenizer: Some(tokenizer),
+            max_retries: 3, // default value
         }))
     }

     // In generate method:
-    const MAX_RETRIES: u16 = 3;
     let retry_manager =
-        RetryManager::build(preprocessed_request, engine_ctx.clone(), next, MAX_RETRIES)
+        RetryManager::build(preprocessed_request, engine_ctx.clone(), next, self.max_retries)
             .await?;
```
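A minimal std-only sketch of the environment-variable option. The variable name `DYN_MIGRATION_MAX_RETRIES` is illustrative only, not an existing Dynamo setting:

```rust
use std::env;

/// Hypothetical helper: read the retry limit from an environment variable,
/// falling back to the current hardcoded default of 3.
/// The variable name `DYN_MIGRATION_MAX_RETRIES` is an assumption.
fn max_retries_from_env() -> u16 {
    env::var("DYN_MIGRATION_MAX_RETRIES")
        .ok()
        .and_then(|v| v.parse::<u16>().ok())
        .unwrap_or(3)
}

fn main() {
    // With the variable unset, this falls back to the default of 3.
    println!("max retries: {}", max_retries_from_env());
}
```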
109-124: Clarify the retry count semantics.

The `retries_left + 1` logic in line 120 is confusing. Consider either renaming the parameter to `additional_retries` or documenting that it represents retries after the initial attempt.

```diff
 pub async fn build(
     preprocessed_request: PreprocessedRequest,
     engine_ctx: Arc<dyn AsyncEngineContext>,
     next: ServerStreamingEngine<PreprocessedRequest, Annotated<LLMEngineOutput>>,
-    retries_left: u16,
+    additional_retries: u16,
 ) -> Result<Self> {
     let mut slf = Self {
         request: preprocessed_request,
         engine_ctx,
         next_generate: next,
         next_stream: None,
-        retries_left: retries_left + 1, // +1 to account for the initial attempt
+        retries_left: additional_retries + 1, // +1 for the initial attempt
     };
```
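The proposed semantics can be shown with a std-only sketch (illustrative names, not the actual `RetryManager` API): `additional_retries` counts retries after the first attempt, so the total attempt budget is `additional_retries + 1`.

```rust
/// Run `attempt` up to `additional_retries + 1` times, stopping at the
/// first success. Names here are illustrative, not the crate's API.
fn run_with_retries(additional_retries: u16, mut attempt: impl FnMut(u16) -> bool) -> bool {
    let total_attempts = additional_retries + 1; // +1 for the initial attempt
    for n in 0..total_attempts {
        if attempt(n) {
            return true;
        }
    }
    false
}

fn main() {
    // Fails twice, succeeds on the third attempt; a budget of 2 retries suffices.
    let mut calls = 0;
    let ok = run_with_retries(2, |_| {
        calls += 1;
        calls == 3
    });
    println!("succeeded: {ok} after {calls} attempts");
}
```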
142-143: Address the Sync constraint TODO.

The TODO comment indicates that `generate()` doesn't implement `Sync`. This architectural limitation should be addressed or documented properly.

Would you like me to investigate why `generate()` doesn't implement `Sync` and suggest a solution?
161-162: Clarify or implement context passing between retries.

The TODO comments indicate uncertainty about passing context between retry attempts. This could be important for maintaining request state or metrics.

Do you need to pass any context (e.g., retry count, timing information, or request metadata) between retry attempts? I can help implement this if needed.

Also applies to: 193-194
226-234: Document the request mutation behavior.

The `track_response` method mutates the request by appending token IDs. While this is correct for retry continuation, it should be clearly documented that the request is modified during streaming.

```diff
+/// Tracks responses by appending received token IDs to the request.
+/// This enables retry continuation from the last received token.
+/// Note: This method mutates the internal request state.
 fn track_response(&mut self, response: &Annotated<LLMEngineOutput>) {
```
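A minimal sketch (simplified types, not the actual `Migration`/`PreprocessedRequest` structs) of why the mutation matters: on a retry, the mutated request already carries every token the client has received, so the new worker resumes from there instead of regenerating.

```rust
/// Simplified stand-in for the request state tracked across retries.
struct RetryState {
    prompt_token_ids: Vec<u32>, // request tokens; grows as responses stream in
}

impl RetryState {
    /// Mutates the request by appending the tokens from one response chunk,
    /// enabling continuation from the last received token on retry.
    fn track_response(&mut self, response_tokens: &[u32]) {
        self.prompt_token_ids.extend_from_slice(response_tokens);
    }
}

fn main() {
    let mut state = RetryState { prompt_token_ids: vec![1, 2, 3] };
    state.track_response(&[10, 11]); // a chunk arrives, then the stream drops
    // The retried request continues after token 11 rather than regenerating it.
    assert_eq!(state.prompt_token_ids, vec![1, 2, 3, 10, 11]);
}
```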
273-512: Consider refactoring MockEngine to reduce code duplication.

The `MockEngine` implementation has significant code duplication across different behavior patterns, especially in the match arms. Consider extracting common patterns: the repeated steps of creating channels, spawning tasks, and sending responses could be moved into helper methods to make the code more maintainable.
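A std-only sketch of the deduplication idea. The real `MockEngine` is async and streams `Annotated<LLMEngineOutput>`; this illustrative version uses plain threads and strings, so every behavior pattern reduces to a one-line list of responses.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper extracting the repeated "create channel, spawn task,
/// send responses" pattern shared by the mock behavior variants.
fn spawn_mock_stream(
    responses: Vec<Result<String, String>>,
) -> mpsc::Receiver<Result<String, String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for r in responses {
            if tx.send(r).is_err() {
                break; // receiver dropped; stop streaming
            }
        }
    });
    rx
}

fn main() {
    // A "fail then succeed" mock becomes just its response list.
    let fail_then_succeed = spawn_mock_stream(vec![
        Err("disconnected".into()),
        Ok("token".into()),
    ]);
    for msg in fail_then_succeed {
        println!("{msg:?}");
    }
}
```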
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- lib/llm/src/discovery/watcher.rs (3 hunks)
- lib/llm/src/lib.rs (1 hunks)
- lib/llm/src/migration.rs (1 hunks)
- lib/llm/src/protocols/common/llm_backend.rs (1 hunks)
- lib/runtime/src/pipeline/network/egress/push_router.rs (1 hunks)
- lib/runtime/src/protocols/annotated.rs (1 hunks)
- lib/runtime/src/protocols/maybe_error.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
lib/runtime/src/pipeline/network/egress/push_router.rs (4)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::client::RequestErrorKind::NoResponders, not async_nats::Error::NoResponders. Use err.downcast_ref::<async_nats::client::RequestError>() and then check request_err.kind() against RequestErrorKind::NoResponders.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/discovery/watcher.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/runtime/src/protocols/maybe_error.rs (3)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.898Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::client::RequestErrorKind::NoResponders, not async_nats::Error::NoResponders. Use err.downcast_ref::<async_nats::client::RequestError>() and then check request_err.kind() against RequestErrorKind::NoResponders.
lib/llm/src/protocols/common/llm_backend.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.898Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/runtime/src/protocols/annotated.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.898Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
lib/llm/src/migration.rs (2)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::error::RequestErrorKind::NoResponders. Use err.downcast_ref::<async_nats::error::RequestError>() and then check req_err.kind() against RequestErrorKind::NoResponders to handle this error properly.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::client::RequestErrorKind::NoResponders, not async_nats::Error::NoResponders. Use err.downcast_ref::<async_nats::client::RequestError>() and then check request_err.kind() against RequestErrorKind::NoResponders.
🧬 Code Graph Analysis (3)

lib/llm/src/discovery/watcher.rs (1)
- lib/llm/src/migration.rs (2): `new` (298-305), `from_mdc` (55-66)

lib/runtime/src/protocols/maybe_error.rs (2)
- lib/llm/src/protocols/common/llm_backend.rs (2): `from_err` (139-141), `err` (143-149)
- lib/runtime/src/protocols/annotated.rs (2): `from_err` (154-156), `err` (158-169)

lib/runtime/src/protocols/annotated.rs (1)
- lib/runtime/src/protocols/maybe_error.rs (4): `from_err` (20-20), `from_err` (44-48), `err` (23-23), `err` (49-51)
🔇 Additional comments (12)

lib/llm/src/lib.rs (1)

25-25: Module declaration correctly exposes migration functionality.

The addition of the migration module to the public API is properly positioned and follows the existing naming conventions.

lib/runtime/src/pipeline/network/egress/push_router.rs (1)

199-205: Good refactoring that simplifies error-handling control flow.

The inline error checking eliminates the need for a deferred variable while maintaining the same functionality. The direct approach is more readable and aligns with the migration retry logic.

lib/runtime/src/protocols/maybe_error.rs (3)

20-20: Good addition of Send + Sync bounds for thread safety.

The `Send + Sync` bounds on the error trait object ensure compatibility with async and concurrent contexts, which is essential for the migration functionality.

23-23: Consistent error trait bounds across method signatures.

The `Send + Sync` bounds on the return type maintain consistency with the `from_err` method and ensure proper error handling in async contexts.

44-51: Test implementation correctly updated to match new trait signatures.

The test implementation properly reflects the new `Send + Sync` bounds, ensuring the changes are properly validated.

lib/runtime/src/protocols/annotated.rs (2)

154-154: Trait implementation correctly updated with Send + Sync bounds.

The method signature matches the updated trait definition in `maybe_error.rs`, ensuring consistency across the codebase.

158-158: Return type properly aligned with trait definition.

The `Send + Sync` bounds on the return type maintain consistency with the trait definition and enable safe error handling in async contexts.

lib/llm/src/discovery/watcher.rs (4)

22-22: Import addition correctly exposes Migration functionality.

The import is properly placed and follows the existing import structure.

208-208: Migration operator creation follows the established pattern.

The migration operator is correctly created from the model deployment card using the same pattern as the backend and preprocessor operators, with proper error handling.

237-241: Pipeline linking correctly positions the migration operator.

The migration operator is properly sandwiched between the backend and service_backend with both forward and backward edges linked. This positioning allows the migration operator to intercept and retry failed requests transparently.

246-286: Remove migration suggestion: no migration operator present in these pipelines.

I didn't find any use of a migration operator in either the chat or completions pipelines in `lib/llm/src/discovery/watcher.rs`. The chat path simply instantiates a `PushRouter` and doesn't include migration logic, so the original assumption is incorrect; please ignore this suggestion. Likely an incorrect or invalid review comment.

lib/llm/src/protocols/common/llm_backend.rs (1)

138-150: LGTM! Appropriate trait bounds for async error handling.

The addition of `Send + Sync` bounds to the error types is correct and necessary for thread-safe error handling in async contexts, especially when errors need to be passed between different worker threads during migration.
Could you remove the dead code or add comments explaining why it's there?
- test 4 and 5
- shorten test 4 and 5 with retry = 0
- fix test 2 MockFailThenSuccessEngine
- unify all mocks
- fix test 4 and 5 error type
- add test 6
…on limit left is 0
Confirmed migration works with completions endpoint:
commit e330d96 Author: Yan Ru Pei <yanrpei@gmail.com> Date: Fri Jul 18 13:40:54 2025 -0700 feat: enable / disable chunked prefill for mockers (#2015) Signed-off-by: Yan Ru Pei <yanrpei@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 353146e Author: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Date: Fri Jul 18 13:33:36 2025 -0700 feat: add vLLM v1 multi-modal example. Add llama4 Maverick example (#1990) Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: krishung5 <krish@nvidia.com> commit 1f07dab Author: Jacky <18255193+kthui@users.noreply.github.com> Date: Fri Jul 18 13:04:20 2025 -0700 feat: Add migration to LLM requests (#1930) commit 5f17918 Author: Tanmay Verma <tanmayv@nvidia.com> Date: Fri Jul 18 12:59:34 2025 -0700 refactor: Migrate to new UX2 for python launch (#2003) commit fc12436 Author: Graham King <grahamk@nvidia.com> Date: Fri Jul 18 14:52:57 2025 -0400 feat(frontend): router-mode settings (#2001) commit dc75cf1 Author: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com> Date: Fri Jul 18 18:47:28 2025 +0200 chore: Move NIXL repo clone to Dockerfiles (#2009) commit f6f392c Author: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Date: Thu Jul 17 18:44:17 2025 -0700 Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> commit cc90ca6 Author: atchernych <atchernych@nvidia.com> Date: Thu Jul 17 18:34:40 2025 -0700 feat: Create a convenience script to uninstall Dynamo Deploy CRDs (#1933) commit 267b422 Author: Greg Clark <grclark@nvidia.com> Date: Thu Jul 17 20:44:21 2025 -0400 chore: loosed python requirement versions (#1998) Signed-off-by: Greg Clark <grclark@nvidia.com> commit b8474e5 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Thu Jul 17 16:35:05 2025 -0700 chore: update cmake and gap installation and sgl 
in wideep container (#1991) commit 157a3b0 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 15:38:12 2025 -0700 fix: incorrect helm upgrade command (#2000) commit 0dfca2c Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 15:33:33 2025 -0700 ci: Update trtllm gitlab triggers for new components directory and test script (#1992) commit f3fb09e Author: Kris Hung <krish@nvidia.com> Date: Thu Jul 17 14:59:59 2025 -0700 fix: Fix syntax for tokio-console (#1997) commit dacffb8 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 14:57:10 2025 -0700 fix: use non-dev golang image for operator (#1993) commit 2b29a0a Author: zaristei <zaristei@berkeley.edu> Date: Thu Jul 17 13:10:42 2025 -0700 fix: Working Arm Build Dockerfile for Vllm_v1 (#1844) commit 2430d89 Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 12:57:46 2025 -0700 test: Add trtllm kv router tests (#1988) commit 1eadc01 Author: Graham King <grahamk@nvidia.com> Date: Thu Jul 17 15:07:41 2025 -0400 feat(runtime): Support tokio-console (#1986) commit b62e633 Author: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Date: Thu Jul 17 11:16:28 2025 -0700 feat: support separate chat_template.jinja file (#1853) commit 8ae3719 Author: Hongkuan Zhou <tedzhouhk@gmail.com> Date: Thu Jul 17 11:12:35 2025 -0700 chore: add some details to dynamo deploy quickstart and fix deploy.sh (#1978) Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com> commit 08891ff Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 10:57:42 2025 -0700 fix: Update trtllm tests to use new scripts instead of dynamo serve (#1979) commit 49b7a0d Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Thu Jul 17 08:35:04 2025 -0600 feat: record + analyze logprobs (#1957) commit 6d2be14 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 00:17:58 2025 -0700 refactor: replace vllm with vllm_v1 container 
(#1953) Co-authored-by: alec-flowers <aflowers@nvidia.com> commit 4d2a31a Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Wed Jul 16 18:04:09 2025 -0700 chore: add port reservation to utils (#1980) commit 1e3e4a0 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Wed Jul 16 15:54:04 2025 -0700 fix: port race condition through deterministic ports (#1937) commit 4ad281f Author: Tanmay Verma <tanmayv@nvidia.com> Date: Wed Jul 16 14:33:51 2025 -0700 refactor: Move TRTLLM example to the component/backends (#1976) commit 57d24a1 Author: Misha Chornyi <99709299+mc-nv@users.noreply.github.com> Date: Wed Jul 16 14:10:24 2025 -0700 build: Removing shell configuration violations. It's bad practice to hardcod… (#1973) commit 182d3b5 Author: Graham King <grahamk@nvidia.com> Date: Wed Jul 16 16:12:40 2025 -0400 chore(bindings): Remove mistralrs / llama.cpp (#1970) commit def6eaa Author: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Date: Wed Jul 16 15:50:23 2025 -0400 feat: attributions for debian deps of sglang, trtllm, vllm runtime containers (#1971) commit f31732a Author: Yan Ru Pei <yanrpei@gmail.com> Date: Wed Jul 16 11:22:15 2025 -0700 feat: integrate mocker with dynamo-run and python cli (#1927) commit aba6099 Author: Graham King <grahamk@nvidia.com> Date: Wed Jul 16 12:26:32 2025 -0400 perf(router): Remove lock from router hot path (#1963) commit b212103 Author: Hongkuan Zhou <tedzhouhk@gmail.com> Date: Wed Jul 16 08:55:33 2025 -0700 docs: add notes in docs to deprecate local connector (#1959) commit 7b325ee Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 18:52:00 2025 -0700 fix: vllm router examples (#1942) commit a50be1a Author: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Date: Tue Jul 15 17:58:01 2025 -0700 feat: update CODEOWNERS (#1926) commit e260fdf Author: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Date: Tue Jul 15 18:49:21 2025 -0400 feat: 
add bitnami helm chart attribution (#1943) Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 1c03404 Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 14:26:24 2025 -0700 fix: update inference gateway deployment instructions (#1940) commit 5ca570f Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 16:54:03 2025 -0400 chore: Rename dynamo.ingress to dynamo.frontend (#1944) commit 7b9182f Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 16:33:07 2025 -0400 chore: Move examples/cli to lib/bindings/examples/cli (#1952) commit 40d40dd Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 16:02:19 2025 -0400 chore(multi-modal): Rename frontend.py to web.py (#1951) commit a9e0891 Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Tue Jul 15 12:30:30 2025 -0600 feat: adding http clients and recorded response stream (#1919) commit 4128d58 Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 10:30:47 2025 -0700 feat: allow helm upgrade using deploy script (#1936) commit 4da078b Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 12:57:38 2025 -0400 fix: Remove OpenSSL dependency, use Rust TLS (#1945) commit fc004d4 Author: jthomson04 <jwillthomson19@gmail.com> Date: Tue Jul 15 08:45:42 2025 -0700 fix: Fix TRT-LLM container build when using a custom pip wheel (#1825) commit 3c6fc6f Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 22:35:20 2025 -0700 chore: fix typo (#1938) commit de7fe38 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Mon Jul 14 21:47:12 2025 -0700 feat: add vllm e2e integration tests (#1935) commit 860f3f7 Author: Keiven C <213854356+keivenchang@users.noreply.github.com> Date: Mon Jul 14 21:44:19 2025 -0700 chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) Co-authored-by: Keiven Chang 
<keivenchang@users.noreply.github.com> commit fc402a3 Author: Biswa Panda <biswa.panda@gmail.com> Date: Mon Jul 14 21:21:20 2025 -0700 feat: configurable namespace for vllm v1 example (#1909) commit df40d2c Author: ZichengMa <zichengma1225@gmail.com> Date: Mon Jul 14 21:11:29 2025 -0700 docs: fix typo and add mount-workspace to vllm doc (#1931) Signed-off-by: ZichengMa <zichengma1225@gmail.com> Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com> commit 901715b Author: Tanmay Verma <tanmayv@nvidia.com> Date: Mon Jul 14 20:14:51 2025 -0700 refactor: Refactor the TRTLLM examples remove dynamo SDK (#1884) commit 5bf23d5 Author: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Date: Mon Jul 14 18:29:19 2025 -0700 feat: update DynamoGraphDeployments for vllm_v1 (#1890) Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu> commit 9e76590 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 17:29:56 2025 -0700 docs: organize sglang readme (#1910) commit ef59ac8 Author: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com> Date: Mon Jul 14 16:16:44 2025 -0700 docs: TRTLLM Example of Llama4+Eagle3 (Speculative Decoding) (#1828) Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> commit 053041e Author: Jorge António <matroid@outlook.com> Date: Tue Jul 15 00:06:38 2025 +0100 fix: resolve incorrect finish reason propagation (#1857) commit 3733f58 Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 19:04:22 2025 -0400 feat(backends): Python llama.cpp engine (#1925) commit 6a1350c Author: Tushar Sharma <tusharma@nvidia.com> Date: Mon Jul 14 14:56:36 2025 -0700 build: minor improvements to sglang dockerfile (#1917) commit e2a619b Author: Neelay Shah <neelays@nvidia.com> Date: Mon Jul 14 14:52:53 2025 -0700 fix: remove environment variable passing (#1911) Signed-off-by: Neelay Shah 
<neelays@nvidia.com> Co-authored-by: Neelay Shah <neelays@a4u8g-0057.ipp2u2.colossus.nvidia.com> commit 3d17a49 Author: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> Date: Mon Jul 14 14:41:56 2025 -0700 refactor: remove dynamo build (#1778) Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> commit 3e0cb07 Author: Anant Sharma <anants@nvidia.com> Date: Mon Jul 14 15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com> commit df91fce Author: Yan Ru Pei <yanrpei@gmail.com> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
commit cb6de94 Author: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com> Date: Sun Jul 20 22:34:50 2025 +0200 chore: Install vLLM and WideEP kernels in vLLM runtime container (#2010) Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: alec-flowers <aflowers@nvidia.com> commit fe63c17 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Fri Jul 18 17:45:08 2025 -0700 fix: Revert "feat: add vLLM v1 multi-modal example. Add llama4 Maverick ex… (#2017) commit bf1998f Author: jthomson04 <jwillthomson19@gmail.com> Date: Fri Jul 18 17:23:50 2025 -0700 fix: Don't detokenize twice in TRT-LLM examples (#1955) commit 343a481 Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Fri Jul 18 16:22:43 2025 -0600 feat: http disconnects (#2014)
15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com> commit df91fce Author: Yan Ru Pei <yanrpei@gmail.com> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
commit d4b5414 Author: atchernych <atchernych@nvidia.com> Date: Mon Jul 21 13:10:24 2025 -0700 fix: mypy error (#2029) commit 79337c7 Author: Ryan McCormick <rmccormick@nvidia.com> Date: Mon Jul 21 12:12:16 2025 -0700 build: support custom TRTLLM build for commits not on main branch (#2021) commit 95dd942 Author: atchernych <atchernych@nvidia.com> Date: Mon Jul 21 12:09:33 2025 -0700 docs: Post-Merge cleanup of the deploy documentation (#1922) commit cb6de94 Author: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com> Date: Sun Jul 20 22:34:50 2025 +0200 chore: Install vLLM and WideEP kernels in vLLM runtime container (#2010) Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: alec-flowers <aflowers@nvidia.com> commit fe63c17 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Fri Jul 18 17:45:08 2025 -0700 fix: Revert "feat: add vLLM v1 multi-modal example. Add llama4 Maverick ex… (#2017) commit bf1998f Author: jthomson04 <jwillthomson19@gmail.com> Date: Fri Jul 18 17:23:50 2025 -0700 fix: Don't detokenize twice in TRT-LLM examples (#1955) commit 343a481 Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Fri Jul 18 16:22:43 2025 -0600 feat: http disconnects (#2014) commit e330d96 Author: Yan Ru Pei <yanrpei@gmail.com> Date: Fri Jul 18 13:40:54 2025 -0700 feat: enable / disable chunked prefill for mockers (#2015) Signed-off-by: Yan Ru Pei <yanrpei@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 353146e Author: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Date: Fri Jul 18 13:33:36 2025 -0700 feat: add vLLM v1 multi-modal example. 
Add llama4 Maverick example (#1990) Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: krishung5 <krish@nvidia.com> commit 1f07dab Author: Jacky <18255193+kthui@users.noreply.github.com> Date: Fri Jul 18 13:04:20 2025 -0700 feat: Add migration to LLM requests (#1930) commit 5f17918 Author: Tanmay Verma <tanmayv@nvidia.com> Date: Fri Jul 18 12:59:34 2025 -0700 refactor: Migrate to new UX2 for python launch (#2003) commit fc12436 Author: Graham King <grahamk@nvidia.com> Date: Fri Jul 18 14:52:57 2025 -0400 feat(frontend): router-mode settings (#2001) commit dc75cf1 Author: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com> Date: Fri Jul 18 18:47:28 2025 +0200 chore: Move NIXL repo clone to Dockerfiles (#2009) commit f6f392c Author: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Date: Thu Jul 17 18:44:17 2025 -0700 Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> commit cc90ca6 Author: atchernych <atchernych@nvidia.com> Date: Thu Jul 17 18:34:40 2025 -0700 feat: Create a convenience script to uninstall Dynamo Deploy CRDs (#1933) commit 267b422 Author: Greg Clark <grclark@nvidia.com> Date: Thu Jul 17 20:44:21 2025 -0400 chore: loosed python requirement versions (#1998) Signed-off-by: Greg Clark <grclark@nvidia.com> commit b8474e5 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Thu Jul 17 16:35:05 2025 -0700 chore: update cmake and gap installation and sgl in wideep container (#1991) commit 157a3b0 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 15:38:12 2025 -0700 fix: incorrect helm upgrade command (#2000) commit 0dfca2c Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 15:33:33 2025 -0700 ci: Update trtllm gitlab triggers for new components directory and test script (#1992) commit f3fb09e Author: Kris Hung <krish@nvidia.com> Date: Thu Jul 17 14:59:59 2025 
-0700 fix: Fix syntax for tokio-console (#1997) commit dacffb8 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 14:57:10 2025 -0700 fix: use non-dev golang image for operator (#1993) commit 2b29a0a Author: zaristei <zaristei@berkeley.edu> Date: Thu Jul 17 13:10:42 2025 -0700 fix: Working Arm Build Dockerfile for Vllm_v1 (#1844) commit 2430d89 Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 12:57:46 2025 -0700 test: Add trtllm kv router tests (#1988) commit 1eadc01 Author: Graham King <grahamk@nvidia.com> Date: Thu Jul 17 15:07:41 2025 -0400 feat(runtime): Support tokio-console (#1986) commit b62e633 Author: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Date: Thu Jul 17 11:16:28 2025 -0700 feat: support separate chat_template.jinja file (#1853) commit 8ae3719 Author: Hongkuan Zhou <tedzhouhk@gmail.com> Date: Thu Jul 17 11:12:35 2025 -0700 chore: add some details to dynamo deploy quickstart and fix deploy.sh (#1978) Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com> commit 08891ff Author: Ryan McCormick <rmccormick@nvidia.com> Date: Thu Jul 17 10:57:42 2025 -0700 fix: Update trtllm tests to use new scripts instead of dynamo serve (#1979) commit 49b7a0d Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Thu Jul 17 08:35:04 2025 -0600 feat: record + analyze logprobs (#1957) commit 6d2be14 Author: Biswa Panda <biswa.panda@gmail.com> Date: Thu Jul 17 00:17:58 2025 -0700 refactor: replace vllm with vllm_v1 container (#1953) Co-authored-by: alec-flowers <aflowers@nvidia.com> commit 4d2a31a Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Wed Jul 16 18:04:09 2025 -0700 chore: add port reservation to utils (#1980) commit 1e3e4a0 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Wed Jul 16 15:54:04 2025 -0700 fix: port race condition through deterministic ports (#1937) commit 4ad281f Author: Tanmay Verma 
<tanmayv@nvidia.com> Date: Wed Jul 16 14:33:51 2025 -0700 refactor: Move TRTLLM example to the component/backends (#1976) commit 57d24a1 Author: Misha Chornyi <99709299+mc-nv@users.noreply.github.com> Date: Wed Jul 16 14:10:24 2025 -0700 build: Removing shell configuration violations. It's bad practice to hardcod… (#1973) commit 182d3b5 Author: Graham King <grahamk@nvidia.com> Date: Wed Jul 16 16:12:40 2025 -0400 chore(bindings): Remove mistralrs / llama.cpp (#1970) commit def6eaa Author: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Date: Wed Jul 16 15:50:23 2025 -0400 feat: attributions for debian deps of sglang, trtllm, vllm runtime containers (#1971) commit f31732a Author: Yan Ru Pei <yanrpei@gmail.com> Date: Wed Jul 16 11:22:15 2025 -0700 feat: integrate mocker with dynamo-run and python cli (#1927) commit aba6099 Author: Graham King <grahamk@nvidia.com> Date: Wed Jul 16 12:26:32 2025 -0400 perf(router): Remove lock from router hot path (#1963) commit b212103 Author: Hongkuan Zhou <tedzhouhk@gmail.com> Date: Wed Jul 16 08:55:33 2025 -0700 docs: add notes in docs to deprecate local connector (#1959) commit 7b325ee Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 18:52:00 2025 -0700 fix: vllm router examples (#1942) commit a50be1a Author: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Date: Tue Jul 15 17:58:01 2025 -0700 feat: update CODEOWNERS (#1926) commit e260fdf Author: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Date: Tue Jul 15 18:49:21 2025 -0400 feat: add bitnami helm chart attribution (#1943) Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> commit 1c03404 Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 14:26:24 2025 -0700 fix: update inference gateway deployment instructions (#1940) commit 5ca570f Author: Graham King <grahamk@nvidia.com> Date: 
Tue Jul 15 16:54:03 2025 -0400 chore: Rename dynamo.ingress to dynamo.frontend (#1944) commit 7b9182f Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 16:33:07 2025 -0400 chore: Move examples/cli to lib/bindings/examples/cli (#1952) commit 40d40dd Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 16:02:19 2025 -0400 chore(multi-modal): Rename frontend.py to web.py (#1951) commit a9e0891 Author: Ryan Olson <ryanolson@users.noreply.github.com> Date: Tue Jul 15 12:30:30 2025 -0600 feat: adding http clients and recorded response stream (#1919) commit 4128d58 Author: Biswa Panda <biswa.panda@gmail.com> Date: Tue Jul 15 10:30:47 2025 -0700 feat: allow helm upgrade using deploy script (#1936) commit 4da078b Author: Graham King <grahamk@nvidia.com> Date: Tue Jul 15 12:57:38 2025 -0400 fix: Remove OpenSSL dependency, use Rust TLS (#1945) commit fc004d4 Author: jthomson04 <jwillthomson19@gmail.com> Date: Tue Jul 15 08:45:42 2025 -0700 fix: Fix TRT-LLM container build when using a custom pip wheel (#1825) commit 3c6fc6f Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 22:35:20 2025 -0700 chore: fix typo (#1938) commit de7fe38 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Mon Jul 14 21:47:12 2025 -0700 feat: add vllm e2e integration tests (#1935) commit 860f3f7 Author: Keiven C <213854356+keivenchang@users.noreply.github.com> Date: Mon Jul 14 21:44:19 2025 -0700 chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com> commit fc402a3 Author: Biswa Panda <biswa.panda@gmail.com> Date: Mon Jul 14 21:21:20 2025 -0700 feat: configurable namespace for vllm v1 example (#1909) commit df40d2c Author: ZichengMa <zichengma1225@gmail.com> Date: Mon Jul 14 21:11:29 2025 -0700 docs: fix typo and add mount-workspace to vllm doc (#1931) Signed-off-by: ZichengMa <zichengma1225@gmail.com> Co-authored-by: Alec 
<35311602+alec-flowers@users.noreply.github.com> commit 901715b Author: Tanmay Verma <tanmayv@nvidia.com> Date: Mon Jul 14 20:14:51 2025 -0700 refactor: Refactor the TRTLLM examples remove dynamo SDK (#1884) commit 5bf23d5 Author: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Date: Mon Jul 14 18:29:19 2025 -0700 feat: update DynamoGraphDeployments for vllm_v1 (#1890) Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu> commit 9e76590 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 17:29:56 2025 -0700 docs: organize sglang readme (#1910) commit ef59ac8 Author: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com> Date: Mon Jul 14 16:16:44 2025 -0700 docs: TRTLLM Example of Llama4+Eagle3 (Speculative Decoding) (#1828) Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> commit 053041e Author: Jorge António <matroid@outlook.com> Date: Tue Jul 15 00:06:38 2025 +0100 fix: resolve incorrect finish reason propagation (#1857) commit 3733f58 Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 19:04:22 2025 -0400 feat(backends): Python llama.cpp engine (#1925) commit 6a1350c Author: Tushar Sharma <tusharma@nvidia.com> Date: Mon Jul 14 14:56:36 2025 -0700 build: minor improvements to sglang dockerfile (#1917) commit e2a619b Author: Neelay Shah <neelays@nvidia.com> Date: Mon Jul 14 14:52:53 2025 -0700 fix: remove environment variable passing (#1911) Signed-off-by: Neelay Shah <neelays@nvidia.com> Co-authored-by: Neelay Shah <neelays@a4u8g-0057.ipp2u2.colossus.nvidia.com> commit 3d17a49 Author: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> Date: Mon Jul 14 14:41:56 2025 -0700 refactor: remove dynamo build (#1778) Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> commit 3e0cb07 Author: Anant Sharma <anants@nvidia.com> Date: Mon Jul 14 
15:43:48 2025 -0400 fix: copy attributions and license to trtllm runtime container (#1916) commit fc36bf5 Author: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Date: Mon Jul 14 12:31:49 2025 -0700 feat: receive kvmetrics from sglang scheduler (#1789) Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com> commit df91fce Author: Yan Ru Pei <yanrpei@gmail.com> Date: Mon Jul 14 12:24:04 2025 -0700 feat: prefill aware routing (#1895) commit ad8ad66 Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:20:35 2025 -0400 feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) Remove http and llmctl binaries. They have been unused for a while. commit 480b41d Author: Graham King <grahamk@nvidia.com> Date: Mon Jul 14 15:06:45 2025 -0400 feat: Python frontend / ingress node (#1912)
Overview:
If a request fails due to a connection issue with the worker, i.e. a network failure or a worker crash, this feature retries the request on another worker until the retry limit is reached.
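The retry behavior described above can be sketched as a loop that re-issues the request on another worker when the stream disconnects, carrying already-received tokens forward. All names below (`Worker`, `StreamEvent`, `run_with_migration`, `MAX_RETRIES`) are illustrative, not the actual dynamo API:

```rust
// Illustrative sketch only: names and types are assumptions, not the real API.
const MAX_RETRIES: usize = 3;

#[derive(Debug, PartialEq)]
enum StreamEvent {
    Token(String),
    Disconnected, // connection to the worker was lost mid-stream
    Done,
}

// Simulated worker pool: fails `fail_first` times before streaming tokens.
struct Worker {
    fail_first: usize,
    attempts: usize,
}

impl Worker {
    fn stream(&mut self, accumulated: usize) -> Vec<StreamEvent> {
        self.attempts += 1;
        if self.attempts <= self.fail_first {
            vec![StreamEvent::Disconnected]
        } else {
            // Resume from the tokens already emitted on a previous worker.
            (accumulated..3)
                .map(|i| StreamEvent::Token(format!("tok{}", i)))
                .chain(std::iter::once(StreamEvent::Done))
                .collect()
        }
    }
}

// Retry on disconnection, passing accumulated tokens into the new attempt,
// until the stream completes or the retry limit is reached.
fn run_with_migration(worker: &mut Worker) -> Result<Vec<String>, &'static str> {
    let mut tokens = Vec::new();
    for _ in 0..=MAX_RETRIES {
        let mut disconnected = false;
        for ev in worker.stream(tokens.len()) {
            match ev {
                StreamEvent::Token(t) => tokens.push(t),
                StreamEvent::Done => return Ok(tokens),
                StreamEvent::Disconnected => {
                    disconnected = true;
                    break;
                }
            }
        }
        if !disconnected {
            return Ok(tokens);
        }
    }
    Err("retry limit reached")
}

fn main() {
    // One disconnection, then the request completes on another worker.
    let mut worker = Worker { fail_first: 1, attempts: 0 };
    println!("{:?}", run_with_migration(&mut worker));
}
```

From the client's point of view the retried stream is indistinguishable from a normal one, which matches the behavior claimed in the outputs below.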
Details:
A `MaybeError` trait for processing errors in a future.

Outputs:
An HTTP endpoint and two vLLM engines are started with the following commands:
Case 1: A vLLM engine is stopped before a request is routed to it.
Confirmed from the HTTP endpoint that the request was rerouted to the other worker before it started:
Case 2: The vLLM engine is stopped while streaming responses.
Confirmed from the HTTP endpoint that the stream ended prematurely and the request continued on the other worker:
In both cases, there is no observable difference in the OpenAI response compared to a request that completed normally, i.e.
Where should the reviewer start?
Start with the 6 unit tests, each of which covers one failure scenario. Then look into the migration implementation.
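The `MaybeError` trait mentioned in the details can be pictured as follows. This is a minimal sketch under the assumption that the trait lets the migration layer construct an error item and inspect streamed items for errors; the real trait's method names and signatures may differ:

```rust
// Sketch of a `MaybeError`-style trait; `from_err`/`err` are assumed names.
trait MaybeError: Sized {
    fn from_err(msg: String) -> Self;
    fn err(&self) -> Option<&str>;
}

// A response item in a token stream that may carry an error.
#[derive(Debug)]
enum Response {
    Token(String),
    Error(String),
}

impl MaybeError for Response {
    fn from_err(msg: String) -> Self {
        Response::Error(msg)
    }
    fn err(&self) -> Option<&str> {
        match self {
            Response::Error(e) => Some(e.as_str()),
            Response::Token(_) => None,
        }
    }
}

fn main() {
    // The migration layer can check each streamed item for an error and
    // decide whether to restart the request on another worker.
    let item = Response::from_err("disconnected".to_string());
    println!("should migrate: {}", item.err().is_some());
}
```

Modeling errors as ordinary stream items, rather than terminating the stream, is what allows the migration layer to intercept a disconnection and resume on another worker.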
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores