feat: set model specific prompt templates in the multimodal config files, add documentation for multimodal example deployment #1366
Conversation
…-set-model-specific-prompt-templates-in-the-config-file
## Walkthrough
This update introduces prompt template configurability to multimodal and LLM examples, allowing prompt formatting via YAML configuration or CLI. Documentation is expanded for Kubernetes deployment with Dynamo Operator, including ingress testing. Minor code changes ensure prompt templates are applied during request processing, and GPU resource specifications are standardized in configuration files.
## Changes
| File(s) | Change Summary |
|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| examples/llm/README.md | Clarified deployment testing steps, adding ingress-based direct API call instructions. |
| examples/llm/configs/disagg.yaml | Added `prompt-template` entry to `Processor` config. |
| examples/multimodal/README.md | Renamed sections, added detailed Kubernetes deployment instructions, and expanded testing guidance. |
| examples/multimodal/components/decode_worker.py | Set model loading to use `device_map="auto"`, `torch_dtype=bfloat16`, and `.eval()` for remote prefill. |
| examples/multimodal/components/prefill_worker.py | Simplified and clarified comments regarding dummy token insertion; added a TODO for flexibility. |
| examples/multimodal/components/processor.py | Modified prompt construction to use a configurable template from `engine_args.prompt_template`. |
| examples/multimodal/configs/agg.yaml | Added `prompt-template` to `Processor`; changed GPU resource values from int to string. |
| examples/multimodal/configs/disagg.yaml | Added `prompt-template` to `Processor`; changed GPU resource values from int to string. |
| examples/multimodal/utils/vllm.py | Added `--prompt-template` CLI argument and corresponding attribute in engine args. |
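Based on the change summaries above, the new `Processor` entry might look roughly like the following sketch. This is illustrative only: the template string and surrounding keys are assumptions, not copied from the PR.

```yaml
# Hypothetical sketch of a Processor section after this PR;
# the template string is an assumed example, not the actual default.
Processor:
  prompt-template: "USER: <image>\n<prompt> ASSISTANT:"
  ServiceArgs:
    resources:
      gpu: '1'   # quoted string, per the standardization noted above
```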
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant Processor
participant Model
User->>Processor: Send request with messages and image
Processor->>Processor: Apply prompt template (replace <prompt>)
Processor->>Model: Forward formatted prompt and image
Model-->>Processor: Generate response
    Processor-->>User: Return response
```
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/multimodal/components/prefill_worker.py (1)
Lines 249-250: Address the line length and consider implementing the TODO.

The comment simplification is good, but there are a couple of issues to address:
- Line 249 exceeds the 100-character limit (102 characters)
- The TODO suggests making dummy token handling more flexible and model-dependent

Apply this diff to fix the line length:

```diff
-    # some placeholder dummy tokens are inserted based on the embedding size in the worker.py.
+    # some placeholder dummy tokens are inserted based on the embedding size in worker.py.
```

Would you like me to help implement a more flexible, model-dependent approach for the dummy token handling?
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 249-249: Line too long (102/100)
(C0301)
[warning] 250-250: TODO: make this more flexible/model-dependent
(W0511)
examples/multimodal/README.md (2)
Lines 100-102: Inconsistent section heading terminology

The disaggregated example now uses "### Local Serving" while the aggregated example retains "### Deployment". This mismatch could confuse readers. Consider renaming for parity, e.g.:

```diff
-### Local Serving
+### Local Deployment
```

or

```diff
-### Local Serving
+### Local Serving (Disaggregated)
```
Lines 162-239: New Operator deployment section looks solid; minor path correction

The "Deployment with Dynamo Operator" segment is comprehensive and aligns well with the PR objectives. One small nitpick: the commented disaggregated deploy command omits `./` before `configs/disagg.yaml`, unlike the aggregated example. For consistency, update as follows:

```diff
-# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f configs/disagg.yaml
+# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/disagg.yaml
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `examples/llm/README.md` (1 hunks)
- `examples/llm/configs/disagg.yaml` (1 hunks)
- `examples/multimodal/README.md` (2 hunks)
- `examples/multimodal/components/decode_worker.py` (1 hunks)
- `examples/multimodal/components/prefill_worker.py` (1 hunks)
- `examples/multimodal/components/processor.py` (1 hunks)
- `examples/multimodal/configs/agg.yaml` (3 hunks)
- `examples/multimodal/configs/disagg.yaml` (3 hunks)
- `examples/multimodal/utils/vllm.py` (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
examples/multimodal/components/prefill_worker.py
[convention] 249-249: Line too long (102/100)
(C0301)
[warning] 250-250: TODO: make this more flexible/model-dependent
(W0511)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (8)
examples/multimodal/utils/vllm.py (2)
Lines 54-59: LGTM! Clean implementation of prompt template CLI argument.

The argument follows the established pattern and provides a sensible default value.

Line 68: LGTM! Proper assignment of parsed argument.

The prompt template is correctly assigned to the engine args object, maintaining consistency with other custom arguments.
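As a rough illustration of the pattern being reviewed, a `--prompt-template` flag can be added with `argparse` as sketched below. The default value and variable names here are assumptions for the example, not the PR's actual code.

```python
import argparse

# Hypothetical sketch of a --prompt-template CLI flag; the default
# template string is an assumed example, not the PR's actual default.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--prompt-template",
    type=str,
    default="USER: <image>\n<prompt> ASSISTANT:",
    help="Prompt template; <prompt> is replaced with the user's text at request time.",
)

args = parser.parse_args([])
# The parsed value would then be assigned onto the engine args object.
engine_prompt_template = args.prompt_template
```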
examples/multimodal/configs/disagg.yaml (2)
Line 23: LGTM! Appropriate prompt template for multimodal applications.

The template format correctly includes both image and text placeholders, suitable for multimodal processing.
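To make the placeholder mechanism concrete, applying such a template can be sketched as a simple substitution. The function name and template string below are hypothetical, not taken from `processor.py`.

```python
def apply_prompt_template(template: str, user_text: str) -> str:
    # Substitute the user's text into the configured template; the <image>
    # placeholder is left in place for the engine to fill with image tokens.
    return template.replace("<prompt>", user_text)

template = "USER: <image>\n<prompt> ASSISTANT:"
formatted = apply_prompt_template(template, "Describe this image.")
# formatted == "USER: <image>\nDescribe this image. ASSISTANT:"
```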
Line 34: ❓ Verification inconclusive

Verify GPU resource specification type change.

The GPU resource specifications were changed from integer `1` to string `'1'`. Please ensure this change is intentional and that the configuration parser correctly handles string values for GPU resources.

Also applies to: 42-42, 51-51
🏁 Script executed:

```shell
#!/bin/bash
# Check if other config files use string or integer values for GPU resources
echo "Checking GPU resource specifications across config files:"
rg -A 2 -B 2 "gpu:" examples/
```

Length of output: 13853
Verify parser handles mixed GPU spec types and standardize configs

A search across the `examples/` directory shows that GPU counts are inconsistently quoted: many newer configs use string values (`gpu: '1'`, `'8'`, `'16'`), while some still use unquoted integers (`gpu: 1`, `gpu: 8`, etc.). Please ensure your YAML-loading logic correctly casts string values to integers for GPU resources, and consider normalizing all example configs to a single format.

Files still using integer GPU specs:
- examples/sglang/configs/disagg.yaml (`gpu: 1`)
- examples/tensorrt_llm/configs/agg.yaml (`gpu: 1`)
- examples/vllm_v0/configs/disagg.yaml (`gpu: 1`)

Either update these to quoted strings or confirm the parser accepts both types without issue.
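One defensive option is to normalize the GPU count at config-load time so both quoted and unquoted YAML forms are accepted. This is a sketch, not code from this repo; the function name is invented for illustration.

```python
def normalize_gpu_count(value) -> int:
    # Accept both YAML forms: `gpu: 1` (loaded as int) and `gpu: '1'` (loaded as str).
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.strip().isdigit():
        return int(value.strip())
    raise ValueError(f"Unsupported GPU count: {value!r}")

normalize_gpu_count(1)    # -> 1
normalize_gpu_count("8")  # -> 8
```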
examples/llm/README.md (1)
Lines 228-229: LGTM! Helpful documentation improvement.

The clarification about using ingress URLs directly improves the user experience by providing a more streamlined testing approach when ingress is available.
examples/multimodal/configs/agg.yaml (2)
Line 22: LGTM! Configurable prompt template implementation.

The addition of the `prompt-template` configuration enables flexible prompt formatting for multimodal interactions. The template format properly includes placeholders for image and prompt content.

Line 34: GPU resource format standardization.

The change from integer `1` to string `'1'` for GPU resources is likely required for proper YAML parsing or Kubernetes resource specifications. This standardization improves consistency across configuration files.

Also applies to: 43-43
examples/multimodal/components/decode_worker.py (1)
Lines 137-141: LGTM! Improved model loading configuration.

The updates enhance model loading with:
- `device_map="auto"` for optimal device allocation
- `torch_dtype=torch.bfloat16` for efficient precision
- `.eval()` for proper inference mode

These changes improve performance and resource utilization in remote prefill mode.
LGTM
LGTM. Thanks for the changes!
Overview:
Details:
Curl request output for agg:
Curl request output for disagg:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Style
Refactor
Chores