
Conversation


@hhzhang16 hhzhang16 commented Jun 3, 2025

Overview:

  • Prompt templates are now set in the config YAML files.
  • Added instructions for deploying the multimodal example to Kubernetes.
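
As a rough sketch of what the first bullet means in practice: the change summary later in this thread lists a `prompt-template` entry under the `Processor` section of `agg.yaml`/`disagg.yaml`, with `<prompt>` substituted at request time. The key name comes from that summary; the surrounding structure and the exact template string here are assumptions for illustration:

```yaml
Processor:
  # Hypothetical entry; <image> marks the image position and <prompt>
  # is replaced with the user's text during request processing.
  prompt-template: "USER: <image>\n<prompt> ASSISTANT:"
```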

Details:

  • Tested deployment of both the agg and disagg configurations in Kubernetes and verified that they work.

Curl request output for agg:

There is a white, black, and yellow city bus in the image, parked on the side of a wet rural road. It is located near a street light, and it is not currently in service. The city bus appears to be on the outskirts of the city and is likely waiting to return to service.

Curl request output for disagg:

In the image, there is a bus driving down the street, where a house can be seen nearby, and there is an outdoor stop sign by the street. Additionally, there appears to be a building with a traffic light, an electric pole, and a street sign.
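
The curl outputs above can be reproduced programmatically. This is a hypothetical sketch only: the endpoint path (`/v1/chat/completions`), model name, and payload shape are assumptions for illustration, not confirmed details of this deployment; match them to your actual service or ingress.

```python
# Hypothetical sketch of the request behind the curl outputs above.
import json
import urllib.request


def build_payload(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal chat payload (assumed shape)."""
    return {
        "model": "llava",  # assumed model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(base_url: str, image_url: str, prompt: str) -> str:
    """POST the payload and return the model's text reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, image_url)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```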

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added support for customizable prompt templates in multimodal and LLM examples via configuration and command-line options.
  • Documentation

    • Updated deployment and testing instructions for LLM and multimodal examples, including guidance for Kubernetes deployments and ingress usage.
    • Clarified and expanded sections in example READMEs for improved usability.
  • Style

    • Improved comments in multimodal components for clarity and future flexibility.
  • Refactor

    • Enhanced model loading in multimodal decode worker for better device and precision handling.
  • Chores

    • Updated configuration files to support new prompt template options and standardized GPU resource specifications.


coderabbitai bot commented Jun 4, 2025

## Walkthrough

This update introduces prompt template configurability to multimodal and LLM examples, allowing prompt formatting via YAML configuration or CLI. Documentation is expanded for Kubernetes deployment with Dynamo Operator, including ingress testing. Minor code changes ensure prompt templates are applied during request processing, and GPU resource specifications are standardized in configuration files.

## Changes

| File(s)                                                                                  | Change Summary                                                                                           |
|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| examples/llm/README.md                                                                   | Clarified deployment testing steps, adding ingress-based direct API call instructions.                   |
| examples/llm/configs/disagg.yaml                                                         | Added `prompt-template` entry to `Processor` config.                                                     |
| examples/multimodal/README.md                                                            | Renamed sections, added detailed Kubernetes deployment instructions, and expanded testing guidance.       |
| examples/multimodal/components/decode_worker.py                                          | Set model loading to use `device_map="auto"`, `torch_dtype=bfloat16`, and `.eval()` for remote prefill.  |
| examples/multimodal/components/prefill_worker.py                                         | Simplified and clarified comments regarding dummy token insertion; added a TODO for flexibility.          |
| examples/multimodal/components/processor.py                                              | Modified prompt construction to use a configurable template from `engine_args.prompt_template`.           |
| examples/multimodal/configs/agg.yaml                                                     | Added `prompt-template` to `Processor`; changed GPU resource values from int to string.                   |
| examples/multimodal/configs/disagg.yaml                                                  | Added `prompt-template` to `Processor`; changed GPU resource values from int to string.                   |
| examples/multimodal/utils/vllm.py                                                        | Added `--prompt-template` CLI argument and corresponding attribute in engine args.                        |
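
The `processor.py` change in the table above (prompt construction via a configurable template, substituting `<prompt>`) can be sketched roughly as follows. This is a simplified stand-in, not the repository's actual code; only the `<prompt>` placeholder and the `engine_args.prompt_template` source are taken from the change summary:

```python
def format_prompt(prompt_template: str, user_text: str) -> str:
    """Apply a configurable template by substituting the <prompt> placeholder.

    Simplified stand-in for the processor change; the real component reads
    the template from engine_args.prompt_template.
    """
    if prompt_template and "<prompt>" in prompt_template:
        return prompt_template.replace("<prompt>", user_text)
    # Fall back to the raw text when no template is configured.
    return user_text
```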

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Processor
    participant Model

    User->>Processor: Send request with messages and image
    Processor->>Processor: Apply prompt template (replace <prompt>)
    Processor->>Model: Forward formatted prompt and image
    Model-->>Processor: Generate response
    Processor-->>User: Return response
```

## Possibly related PRs

## Poem

A prompt template, shiny and new,
Guides the models in what to do.
Docs now show the Kubernetes way,
With ingress paths to save your day.
GPUs as strings, not just a number—
This rabbit codes with cheerful thunder!
🐇✨




---

<details>
<summary>📜 Recent review details</summary>

**Configuration used: .coderabbit.yaml**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 3f82cd2cb6d38f6b629e53c536d04747db6e9704 and 555d447ca4c41b1c5c81ec5d00c0dfce500284db.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `examples/multimodal/README.md` (3 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>examples/multimodal/README.md</summary>

[grammar] ~37-~37: The verb ‘encode’ does not usually follow articles like ‘the’. Check that ‘encode’ is spelled correctly; using ‘encode’ as a noun may be non-standard.
Context: ..../llm/README.md) example. By separating the encode from the prefill and decode stages, we ...

(A_INFINITIVE)

---

[grammar] ~38-~38: The usual collocation for “independently” is “of”, not “from”. Did you mean “independently of”?
Context: ... deployment and scale the encode worker independently from the prefill and decode workers if neede...

(INDEPENDENTLY_FROM_OF)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (2)</summary>

* GitHub Check: Mirror Repository to GitLab
* GitHub Check: Build and Test - vllm

</details>

<details>
<summary>🔇 Additional comments (5)</summary><blockquote>

<details>
<summary>examples/multimodal/README.md (5)</summary>

`31-40`: **Rename "Deployment" sections to "Graph" in aggregated example**

The heading and explanatory text now consistently refer to "Graph" instead of "Deployment," which better reflects the conceptual flow of worker interactions.


---

`52-52`: **Specify code block language for Bash**

Adding the `bash` annotation to the code fence ensures shell syntax highlighting and improves readability.

---

`92-100`: **Rename "Deployment" sections to "Graph" in disaggregated example**

The disaggregated serving section now uses "Graph" consistently, matching the terminology in the aggregated example and unifying the documentation.

---

`162-165`: **Add "Deployment with Dynamo Operator" section**

The new section clearly outlines how to deploy the multimodal examples on Kubernetes using Dynamo Cloud and the CLI, with correct links to prerequisites and operator docs.

---

`174-202`: **Provide detailed deployment steps**

The bash commands for setting environment variables, building images, and deploying to Kubernetes are comprehensive. Paths to configs and project root are accurate, and the placeholders for image tags and namespaces are clearly documented.

</details>

</blockquote></details>

</details>


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
examples/multimodal/components/prefill_worker.py (1)

`249-250`: **Address the line length and consider implementing the TODO.**

The comment simplification is good, but there are a couple of issues to address:

  1. Line 249 exceeds the 100-character limit (102 characters)
  2. The TODO suggests making dummy token handling more flexible and model-dependent

Apply this diff to fix the line length:

```diff
-            # some placeholder dummy tokens are inserted based on the embedding size in the worker.py.
+            # some placeholder dummy tokens are inserted based on the embedding size in worker.py.
```

Would you like me to help implement a more flexible, model-dependent approach for the dummy token handling?

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 249-249: Line too long (102/100)

(C0301)


[warning] 250-250: TODO: make this more flexible/model-dependent

(W0511)

examples/multimodal/README.md (2)

`100-102`: **Inconsistent section heading terminology**

The disaggregated example now uses “### Local Serving” while the aggregated example retains “### Deployment”. This mismatch could confuse readers. Consider renaming for parity, e.g.:

```diff
-### Local Serving
+### Local Deployment
```

or

```diff
-### Local Serving
+### Local Serving (Disaggregated)
```

`162-239`: **New Operator deployment section looks solid; minor path correction**

The “Deployment with Dynamo Operator” segment is comprehensive and aligns well with the PR objectives. One small nitpick: the commented disaggregated deploy command omits `./` before `configs/disagg.yaml`, unlike the aggregated example. For consistency, update as follows:

```diff
-# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f configs/disagg.yaml
+# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/disagg.yaml
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25c711f and 3f82cd2.

📒 Files selected for processing (9)
  • examples/llm/README.md (1 hunks)
  • examples/llm/configs/disagg.yaml (1 hunks)
  • examples/multimodal/README.md (2 hunks)
  • examples/multimodal/components/decode_worker.py (1 hunks)
  • examples/multimodal/components/prefill_worker.py (1 hunks)
  • examples/multimodal/components/processor.py (1 hunks)
  • examples/multimodal/configs/agg.yaml (3 hunks)
  • examples/multimodal/configs/disagg.yaml (3 hunks)
  • examples/multimodal/utils/vllm.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
examples/multimodal/components/prefill_worker.py

[convention] 249-249: Line too long (102/100)

(C0301)


[warning] 250-250: TODO: make this more flexible/model-dependent

(W0511)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (8)
examples/multimodal/utils/vllm.py (2)

54-59: LGTM! Clean implementation of prompt template CLI argument.

The argument follows the established pattern and provides a sensible default value.


68-68: LGTM! Proper assignment of parsed argument.

The prompt template is correctly assigned to the engine args object, maintaining consistency with other custom arguments.
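
The `--prompt-template` CLI wiring reviewed above could look roughly like the following. This is a hedged sketch, not the repository's actual code; in particular the default value shown is an assumption:

```python
import argparse

# Hedged sketch of the --prompt-template argument; the default value is an
# assumption, not the repo's actual default.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--prompt-template",
    type=str,
    default="<prompt>",
    help="Template applied to incoming prompts; <prompt> is replaced "
    "with the user's text (and <image> marks the image position).",
)

# argparse converts dashes to underscores: the value lands on
# args.prompt_template, ready to be copied onto the engine args object.
args = parser.parse_args(["--prompt-template", "USER: <image>\n<prompt> ASSISTANT:"])
```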

examples/multimodal/configs/disagg.yaml (2)

23-23: LGTM! Appropriate prompt template for multimodal applications.

The template format correctly includes both image and text placeholders, suitable for multimodal processing.


`34-34`: ❓ Verification inconclusive

**Verify GPU resource specification type change.**

The GPU resource specifications were changed from integer 1 to string '1'. Please ensure this change is intentional and that the configuration parser correctly handles string values for GPU resources.

Also applies to: 42-42, 51-51


🏁 Script executed:

```bash
#!/bin/bash
# Check if other config files use string or integer values for GPU resources
echo "Checking GPU resource specifications across config files:"
rg -A 2 -B 2 "gpu:" examples/
```

Length of output: 13853


Verify parser handles mixed GPU spec types and standardize configs
A search across the examples/ directory shows that GPU counts are inconsistently quoted: many newer configs use string values (gpu: '1', '8', '16'), while some still use unquoted integers (gpu: 1, gpu: 8, etc.). Please ensure your YAML‐loading logic correctly casts string values to integers for GPU resources, and consider normalizing all example configs to a single format.

Files still using integer GPU specs:

  • examples/sglang/configs/disagg.yaml (gpu: 1)
  • examples/tensorrt_llm/configs/agg.yaml (gpu: 1)
  • examples/vllm_v0/configs/disagg.yaml (gpu: 1)

Either update these to quoted strings or confirm the parser accepts both types without issue.
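
One way to make a loader tolerant of the mixed specs flagged above is to normalize at parse time. The helper below is illustrative only; the config schema and function name are assumptions, not the repository's actual YAML-loading logic:

```python
# Sketch of defensively normalizing GPU counts that may arrive as either
# an int (gpu: 1) or a string (gpu: '1') from YAML configs.
def normalize_gpu(value) -> int:
    """Accept int or numeric-string GPU specs and return an int."""
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.strip().isdigit():
        return int(value)
    raise ValueError(f"Unsupported GPU spec: {value!r}")
```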

examples/llm/README.md (1)

228-229: LGTM! Helpful documentation improvement.

The clarification about using ingress URLs directly improves the user experience by providing a more streamlined testing approach when ingress is available.

examples/multimodal/configs/agg.yaml (2)

22-22: LGTM! Configurable prompt template implementation.

The addition of the prompt-template configuration enables flexible prompt formatting for multimodal interactions. The template format properly includes placeholders for image and prompt content.


34-34: GPU resource format standardization.

The change from integer 1 to string '1' for GPU resources is likely required for proper YAML parsing or Kubernetes resource specifications. This standardization improves consistency across configuration files.

Also applies to: 43-43

examples/multimodal/components/decode_worker.py (1)

137-141: LGTM! Improved model loading configuration.

The updates enhance model loading with:

  • device_map="auto" for optimal device allocation
  • torch_dtype=torch.bfloat16 for efficient precision
  • .eval() for proper inference mode

These changes improve performance and resource utilization in remote prefill mode.


@whoisj whoisj left a comment


LGTM :shipit:


@krishung5 krishung5 left a comment


LGTM. Thanks for the changes!

@hhzhang16 hhzhang16 merged commit b80482a into main Jun 4, 2025
11 checks passed
@hhzhang16 hhzhang16 deleted the hannahz/dep-115-set-model-specific-prompt-templates-in-the-config-file branch June 4, 2025 18:47