Conversation

ishandhanani
Contributor

@ishandhanani ishandhanani commented Jul 1, 2025

Description

To improve UX and benchmarking efforts, this PR introduces a set of clean components that leverage the dynamo Rust bindings. There is no longer any reliance on circusd, the SDK, or any local orchestration.

TODO

  • Why won't export PYTHONPATH=$PYTHONPATH:/workspace/examples/sglang/utils run in the container even though I added it?

Changes

  1. dynamo-serve used to simply spin up each worker in its own process. This is now done in a dead-simple bash script for all examples. There is no UX loss (in fact, it's much simpler).
  2. Components no longer require depends, @service, or @endpoint.
  3. Our shell scripts (which can and should be used in CI) now include a cleanup trap borrowed from our CI. This should solve any phantom-process problems.
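A minimal sketch of what such a cleanup trap might look like; the worker commands below are stand-ins, not the actual repo scripts:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the CI-style cleanup trap described above.
set -euo pipefail

PIDS=()

cleanup() {
    echo "cleanup: killing ${#PIDS[@]} worker(s)"
    for pid in "${PIDS[@]}"; do
        kill "$pid" 2>/dev/null || true
    done
}
# Run cleanup on normal exit, Ctrl-C, or termination so no phantom
# processes survive the script.
trap cleanup EXIT INT TERM

# Stand-ins for the real workers (e.g. dynamo-run and the python components).
sleep 300 & PIDS+=($!)
sleep 300 & PIDS+=($!)

echo "launched workers: ${PIDS[*]}"
# Normally the script would `wait` here; the EXIT trap still reaps the
# workers if the user interrupts it.
```

The EXIT trap is what prevents phantom processes: even if the script dies mid-run, every tracked PID gets killed.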

Examples

Single node examples can be run using either

dynamo-run in=http out=dyn 
python3 components/worker_inc.py <flags>

or

./launch_agg.sh

NOTE: ./launch_agg.sh just runs the two lines above 😁

Multinode is where this PR really shines. You can now just export the NATS/etcd endpoints and then run python3 on each node.
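As a rough sketch, the multinode flow might look like the following. The environment variable names (NATS_SERVER, ETCD_ENDPOINTS) and addresses are assumptions based on typical NATS/etcd setups, not verified values from this PR:

```shell
# Hypothetical multinode sketch; variable names and hostnames are
# illustrative assumptions, not values confirmed by this PR.

# On every node, point at the shared NATS and etcd instances:
export NATS_SERVER="nats://head-node:4222"
export ETCD_ENDPOINTS="http://head-node:2379"

# Node 1: HTTP ingress
# dynamo-run in=http out=dyn

# Nodes 2..N: one worker process per node
# python3 components/worker_inc.py <flags>
```

Since discovery happens through NATS/etcd, no other coordination between nodes is needed.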

Tests

  • agg
  • disagg
  • disagg + moe
  • slurm cluster dsr1 wide ep


copy-pr-bot bot commented Jul 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ishandhanani ishandhanani marked this pull request as ready for review July 1, 2025 21:34
@ai-dynamo ai-dynamo deleted a comment from coderabbitai bot Jul 1, 2025
Contributor

coderabbitai bot commented Jul 1, 2025

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
examples/sglang/utils/sgl_utils.py (1)

44-54: LGTM! Clean utility function implementation.

The function follows the same pattern as the existing parse_sglang_args function and correctly handles argument parsing with bootstrap port injection. Consider adding a docstring to document the purpose and parameters for consistency with coding standards.

examples/sglang/components/worker_inc.py (3)

131-132: Use cryptographically secure random for bootstrap room generation.

The current implementation uses random.randint() which is not cryptographically secure. For distributed systems communication, consider using a more secure random generator.

Apply this diff to use secure random:

+import secrets
+
 def _generate_bootstrap_room(self):
-    return random.randint(0, 2**63 - 1)
+    return secrets.randbits(63)
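For context on the suggested change: secrets.randbits(63) draws from the same [0, 2**63 - 1] range as the original random.randint(0, 2**63 - 1) call, so the swap is drop-in:

```python
import secrets

# secrets.randbits(63) returns an int with 63 random bits, i.e. a value
# uniformly distributed over [0, 2**63 - 1], sourced from the OS CSPRNG
# rather than the Mersenne Twister behind the `random` module.
room = secrets.randbits(63)
assert 0 <= room <= 2**63 - 1
```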

77-92: Document the temporary nature of placeholder metrics more clearly.

The placeholder metrics implementation should be more clearly documented as temporary to prevent confusion.

Consider updating the warning message to be more explicit:

 logging.warning(
-    "Publishing placeholder metrics in SGLangWorker; these are NOT real engine metrics yet and will be replaced once upstream support lands."
+    "TEMPORARY: Publishing random placeholder metrics. These will be replaced with real engine metrics once sgl-project/sglang#6721 is merged. Do not rely on these values for monitoring or scaling decisions."
 )

226-226: Fix formatting: remove whitespace before colon.

Apply this diff to fix the formatting issue:

-                new_tokens = data["output_ids"][num_output_tokens_so_far[index] :]
+                new_tokens = data["output_ids"][num_output_tokens_so_far[index]:]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4676f8 and b93767c.

📒 Files selected for processing (3)
  • examples/sglang/components/decode_worker_inc.py (1 hunks)
  • examples/sglang/components/worker_inc.py (1 hunks)
  • examples/sglang/utils/sgl_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: t-ob
PR: ai-dynamo/dynamo#1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang `async_encode` method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.
examples/sglang/components/decode_worker_inc.py (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.916Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
examples/sglang/components/worker_inc.py (2)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.916Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.
🪛 Pylint (3.3.7)
examples/sglang/components/decode_worker_inc.py

[refactor] 34-34: Too few public methods (1/2)

(R0903)


[error] 79-79: No value for argument 'runtime' in function call

(E1120)

examples/sglang/components/worker_inc.py

[error] 298-298: No value for argument 'runtime' in function call

(E1120)

🪛 Flake8 (7.2.0)
examples/sglang/components/worker_inc.py

[error] 226-226: whitespace before ':'

(E203)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (2)
examples/sglang/components/decode_worker_inc.py (1)

57-79: Well-structured worker implementation.

The worker function and initialization logic are correctly implemented. The static analysis hint about missing runtime argument is a false positive - the @dynamo_worker decorator automatically injects the runtime parameter as confirmed by the retrieved learnings.

examples/sglang/components/worker_inc.py (1)

249-298: Well-architected worker implementation with proper component separation.

The worker initialization and main execution logic are well-structured. The static analysis hint about missing runtime argument is a false positive - the @dynamo_worker decorator automatically injects the runtime parameter based on retrieved learnings. The separation between prefill and decode workers provides good modularity for disaggregated inference.

@pull-request-size pull-request-size bot added size/XL and removed size/L labels Jul 1, 2025
@alec-flowers
Contributor

alec-flowers commented Jul 2, 2025

It's SO MUCH simpler though. Now people reading the example can actually understand what's going on.

Yeah, I agree it's simpler. But would it be better to let all components read config from a centralized file and integrate automatic GPU selection? Right now, if users want to change the number of P/D workers or some settings, quite a few changes need to be made in the shell script.

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.

At the dynamo serve level, the contract is that the user must manage the processes. At this level we can't have resource management and the like. What if the user wants to put dynamo serve into a slurm script? Now they have to think about srun resource management and the implicit built-in resource management. That is too confusing.

It's a purposeful regression. We are taking away features and letting users have more control in the way they are accustomed to. Any user who has used SSH and GPUs can manage these examples. If they want a full-service deployment where they fill out a config and one-click deploy, they can move to dynamo deploy, where under the hood it will basically just do this work for them.

The regression you are seeing is because dynamo deploy isn't built up to have the features yet.

@tedzhouhk
Contributor

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.

At the dynamo serve level, the contract is the user must manage the processes. At this level we can't have resource management and stuff. What if the user wants to put dynamo serve into a slurm script and now they have to think about srun resource management and the implicit built in resource management. That is too confusing.

It's a purposeful regression. We are taking away features and letting users have more control in the way they are accustomed to. Any user who has used SSH and GPUs can manage these examples. If they want a full-service deployment where they fill out a config and one-click deploy, they can move to dynamo deploy, where under the hood it will basically just do this work for them.

I see. Does dynamo deploy only work for k8s, or does it also work for bare metal?

@alec-flowers
Contributor

alec-flowers commented Jul 2, 2025

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.
At the dynamo serve level, the contract is the user must manage the processes. At this level we can't have resource management and stuff. What if the user wants to put dynamo serve into a slurm script and now they have to think about srun resource management and the implicit built in resource management. That is too confusing.
It's a purposeful regression. We are taking away features and letting users have more control in the way they are accustomed to. Any user who has used SSH and GPUs can manage these examples. If they want a full-service deployment where they fill out a config and one-click deploy, they can move to dynamo deploy, where under the hood it will basically just do this work for them.

I see. Does dynamo deploy only work for k8s, or does it also work for bare metal?

At first I was against it working for bare metal. My opinion is we should be focused on K8s and that is where our effort should go. For bare metal we can let people manage it with scripts.

However, I was thinking about this last night and I think I would be ok with doing it for bare metal under these conditions:

  1. It truly is a single run command, even for multi-node -> This means setting up something like Ray, where you start the server on the head node and then run processes on the other nodes. So you would basically have to "set up" your cluster, and then you could run your one command on any node and it would just deploy.
  2. The config file to deploy on K8s / bare metal / slurm should be as close to the same as possible. Preferably with no changes.

At that point, why not just set up a minikube cluster on your bare metal though... In the end, that prioritization of work will be set by Maksim and team.

@ishandhanani
Contributor Author

@tedzhouhk - this now allows us to get CI properly wired up without having to work across the dynamo serve/local orchestration barrier. A python script is much easier to add in (and tear down).

@ishandhanani ishandhanani enabled auto-merge (squash) July 2, 2025 20:26