feat: add dynamo components for sglang #1721

ishandhanani · 2025-07-01T19:06:51Z

Description

In order to improve UX and benchmarking efforts, these introduce a set of clean components that leverage the dynamo rust bindings. There is no more reliance on circusd, sdkor any local orchestration.

TODO

Why won't export PYTHONPATH=$PYTHONPATH:/workspace/examples/sglang/utils run in container even though I added?

Changes

dynamo-serve used to simply spin each worker in a process. This is now done in a dead simple bash script for all examples. No UX loss (in fact its much much much simpler).
Components do not require any depends or @service or @endpoint
Our shell scripts (which can and should be used in CI) - now include a cleanup trap that I grabbed from our CI. This should solve any phantom process problem

Examples

Single node examples can be run using either

dynamo-run in=http out=dyn 
python3 components/worker_inc.py <flags>

or

./launch_agg.sh

NOTE: the ./launch_agg.sh just runs the lines above 😁

Multinode is where this PR really shines. You can now just export NATS/ETCD and then run python3 on each node

Tests

agg
disagg
disagg + moe
slurm cluster dsr1 wide ep

copy-pr-bot · 2025-07-01T19:06:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-07-01T21:40:27Z

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details

{"name":"HttpError","status":404,"request":{"method":"PATCH","url":"https://api.github.com/repos/ai-dynamo/dynamo/issues/comments/3025614230","headers":{"accept":"application/vnd.github.v3+json","user-agent":"octokit.js/0.0.0-development octokit-core.js/7.0.2 Node.js/24","authorization":"token [REDACTED]","content-type":"application/json; charset=utf-8"},"body":{"body":"<!-- This is an auto-generated comment: summarize by coderabbit.ai -->\n<!-- walkthrough_start -->\n\n## Walkthrough\n\nNew worker components for the sglang framework have been introduced, providing distributed language model serving capabilities. The changes add asynchronous request handlers, worker initialization routines, and argument parsing utilities. The new modules support both standard and decode-specific workers, metrics publishing, and integration with distributed runtime and event loop optimizations.\n\n## Changes\n\n| File(s)                                                     | Change Summary                                                                                  |\n|-------------------------------------------------------------|------------------------------------------------------------------------------------------------|\n| examples/sglang/components/decode_worker_inc.py             | Added async decode worker component with request handler, worker/init functions, and service endpoint setup. |\n| examples/sglang/components/worker_inc.py                    | Added distributed worker component with request handler, metrics, disaggregation support, and async worker/init routines. |\n| examples/sglang/utils/sgl_utils.py                          | Added function to parse server arguments from a provided list, ensuring unique bootstrap port assignment. |\n\n## Poem\n\n> In burrows deep, new workers wake,  \n> To parse and serve, for language’s sake.  \n> Async they hop, with metrics in tow,  \n> Through loops of uv, their tokens flow.  \n> Arguments parsed, requests they greet—  \n> Distributed dreams, in code complete!  \n> 🐇✨\n\n<!-- walkthrough_end -->\n<!-- internal state start -->\n\n\n<!-- DwQgtGAEAqAWCWBnSTIEMB26CuAXA9mAOYCmGJATmriQCaQDG+Ats2bgFyQAOFk+AIwBWJBrngA3EsgEBPRvlqU0AgfFwA6NPEgQAfACgjoCEYDEZyAAUASpETZWaCrKNwSPbABsvkCiQBHbGlcSHFcLzpIACIAMxJqLjRaelpZDDRmfAVmbnxyDFxkWPw+RCIvTCJoyAB3NGQHAWZ1Gno5MNgPbERKFERYTFpBjIwdBus7DEcBPoBGAHYAJjmNGC7IJUQGCnhucXyUDAYvbC30SAFdkliYAHkAETvILyRQ2vVYC8R4DAqPJi5SJtFA0ZhcaLMcqIbiiDQ1dyTdDwKFhbLJegPdKZbKAvIFIp+EiVEEEGLlSp/eGQABy6JS6ng+TQviUuG0XmQpUgJAAHtxKdQmRhkM4PLx8BJ4Ep6L9Oh4tjs9gcMGtEdxvL5/EEQoxMJcPFKSLUotRILBcLhuIgOAB6W1ET7YAQaQG27RgNIZLK2r0420any2xYrDRGfTGcBQMj0fC3NB4QikchUEGAtiFLi8fjCUTiKQyeRMJRUVTqLQ6CMmKBwVCofUJgjEMjKNMsDOcPxoWr2RzMZzyDrF5RlzTaXRgQyR0wGPmZAXSW0Uqq2vH5diIX2iRQkAD6tVKAGtKLvfgwNNxZBwDNFbwYLJAAIIASWbKeoUQcThc/FuDBGpCIEYj6QOQPZKMOdRHn0a4Ev0Ry4BQijYAwUQlHwuAbMufyQLEVBsAeFCHgANCgQIkB2pqilgDTpP+SEYPgPT2JQUqoZ0ZojLQkTIBBO5EjqiBFGqGxMP4jCVIgyAAAYPNuSg2IEwRCQAEkMkQUNJ8G/IyLLwAAXlEHyYegWDSRSGgAKJ/L8JBab8QmYOxQw8vy+C9NR6CIHRkDJq23JsJhiiQNJfmpnZdQIP+mzydIkAAFIAMp3DSAnKUUpF8ohaBiMgBDHlgz4PNR9CIPOrw4dwziZCQNAUIgpEub8Ej4MeeUbGQjrkAA5KK3nHLADFMcgYVCocgWwMFxlfEJ/iZL8RA8hkAiRLQIkeKNKpEg4XiEmKkCyPAxIyl5dGDfkw1ePIZKYQCLIaWGBiIoRx58LE2DHCqpF8eF9DTSFAACfpZPu0EUAAFI54gMAAvAAYiyvQAJTSaROniHphntQqby7AIeBRBQH3iGwS1SgxHaXPIVX1QtLEUFIfDOEQjgbqZ9AMPddMNv1DAhejWnvZ9wrrfzYy4ILH1iMKjBzTQnnmRUVk2eQ9kiuyxwkGj6uYBj8vymlupcRpjUYKVtXINg3DfKx8DsTGeS/KE3pRNEm0kDU01yrdMR8UoNSwewv4G9EwP4DULswjlJAiagDv4E79MFgbxuUL1IXu1pE3BbRA1DT0V2PYi2y7PsCgYLE8As/4yBePgRBdUQ2uOT42MhdgEh1/g3BaSQUiFC8+Dd7h3KwhQ6H9prpv0ETIoGy9MGlEx4jkJFZCuaIBP0BMPv9nKWS0N4MfrAC/E9HF0lQkQMKiIL3JJSlMXFtzZsISQRDhcg/0+1i3rZLQuN4D4xBLPEmHg8I1RemiHgSEpRKG+JzSoK0FSxSTnbGORhzCWEfLtVswo8rZB9hBSoqZ8HBz5HkCgIJR7OleHzdgjJpDhifCkKIJwGgyTksORSglcBqTNhpLS/0mDqyJmIbk0ldynnFlIyGxJYiZRViQLgFlrJdRICjdmp1ji+RbKmAKtVJr0FCnoj8civAKIEioxCmi5TSTnORTc2EiCrhYPiDcW5hygyIieM8F5ZDSQMFAR8rDt681wlLLa0kF4Q1ASiZRkAHiAOAXQGwxN4maJ+h+P6nxAZh28a9SG7Jobw0RhotWIUHELicRUFcgdCibj9nuGJ0jzyXkCcE0J2i+ZC2locaS6NwZxLYFwJJs0gFbzSYUeJpFegMxPMzG0kBEqsUoI+Cg19bFmSqTxJctS/iuNyOuBpnidwFN8ccfxgTbzRHDDOHZi5nGHPcSclpfjLzXhufebBr53alT7AOYO/4qhMIMCBMC1hZBBSwAfI+IU3mXPadpQoSFD6oVoGjciHZuabGSVvF4VRsBoFIJAA+xIoI+L4PU0IPQ6Y+0VpSIggt8LGiPFo6SYctLDOPoiMSAJJIyR4elfh3FKBZ0wMSuK2p0owPwKhKSC1SKBV2AwZAGoVpIAQH8ae/B9jCnuvIBw3BKF7VxWVBu/giBjRhfxaV8Bdg4TQE/fiJwjqFDWM+UI6N4CYziv9fUiUADiAAZJa6jZmrKZhs1mDSdVOupUSeI/gp4sUtHTK2FxlV2zVbQzVfQXLdxVAaokiEjpShwgIIeRRsrW0mkJAMpQvUV2yOhM1xKP7v2tUXDYqc+D43gF4WgjRyp0xpjVOqxQkLMCOICOm0qQgNTNSvFmeaZDUGinhFg9gFqRENkJRdLl3ZcjwBqPavNzqMQLrID1WAAHmo7VaraZK0ahF4CQGmcVK34GrVQa2Y7AqUBKixM2ooYHvqQvK3oM8lK6jJE6ppEk3XvAQLukRDBsAUGTREeQm06WtTIJOrdPtOq2VFsvU99hEIJGnagCUkGohkkOsdNEBUZ3+A7FyN6OkBhEgaPkRdxs6aVpMgIddXwXI/D+Lu+dQlSU7kQGsAAsrVFVopxLqteAMIyuSBTR0moOvoEgWTKR4DGdN1oqOZF7Maxt3aPCwt3UoSu5BPK5z5tEsGkthaHCySCCY8HsRZApa9UitQopfBph5JO+bo0ce1rpV4WMDYkdXi5alpFLVvEAwbMlvh45OwjYSDNPsZOhF7ZAcG/0EOuqDvAW45A6B0CRqRERldq5xSzaqrRTqABaimACKkBjyyDAEZ04Hg+5Bw03migOq5nJx9u7JatBHbupPvYJUZdz4yQ7l3HuI8+CTYHnt3VJMDLWq0bPNuMTukXsuvIML68+Sbz88gPeqosFPlwaQ/j0CiGiBIdarktwKGNqiDQjV9DpniFBVAAAwgK9AoSuDSSFSEEVgi6i5OzkOkKUj0ZSNRiFXouAra7k64gIn0kdgJBoOTlT2bdzTa0xQXc+XChU93FbWgH56eltVZz0guBdxfp/Wgbg0iSic/7YO3cZUgQLSZ9VKEgvaq7lKyLsTcuDJ2VIpIpAmvcD/nVzBoSqv3x09F7NcX6uh7MCp5nPXTOIPSEQHLyz9undvsrj4Xcm1SgVPsbycqjz9kuOpZuBFbSAlBKfOE3pW0MR0BRzEoZ6SRmJLxW0KZYDMnbl+ljky0kgaBfwOciGUM7alM5OUo4lTg+OL2Yy55xyii2ij1c2Pj54+RJlkn2gKPBnctGVn1J6etbRdZ4srgKz5kUHWZswPDyanN4j+3sGrTO9fKCfchv1Sm8rjwAOlfXPxCcn8Z8u8D4XxvlbJ+AFP44yMAAqC8FxoInebMpFvczjdyLNaUIv5l0phGaOyG1BcJpqEE/szDGqEOMn8LxPanmFdKRN/tjDRoUOiCFLPozAvpTkcI5Cmi5ARhhnFE6h9PADqCFHeu2pataiLlWtbhLiavZH1D8EQOQLKPVt+q5AKHbOoFdLKnAnQB6qEDTh+ArAvrAVYM4HMkTjXKsqBlbjWjwI2o1CkMgHDsGs+OgLFmzJutOtJDgWshspTqgbIXFD7L8BRjAZTJATqiIgAkWj4PIL8CILlAbAofMvQCaghIQhsN/tvHoQ0gaOhKfBgIzMutAkYZGngdJLHMUL3ocKgD8C0CQnXnXI6HzDdB1LyG8HTNJN/nLmHn/qYVpMArqnflyDREIdKKaMEaEJAV2LdBhCMLhLVP+NzA0YRtOk6vNuguXG1hhl2pgt8l9nVEDn9hsMQtVCqMDrwWDrGHwNNlDuEEdEBLHnDEkTRMjtQTcDwBYcUYyqUdfK0uDNPi8G8AANqzQAC6SMugegyysRZRdeQeIeK+h+Z+J+R+5+7SN4d4O+U41Yy2wcjYSYpibYrA7AXAVAPYX4/YP4Q4O4pYagY4lYk4BgIJ6Y6gp4Q6JuRoJotA7uzgoQEYWJUYlwAAnAAMwLAABsCwDAsQSwAAHFSQAKxLBUn0lzApAMAMCskJBMloBLBLACDLA0lKAkACAckcm0AAAsE4wJlJOJwu0obu/ghJdAbOr8+gQAA=== -->\n\n<!-- internal state end -->\n<!-- tips_start -->\n\n---\n\nThanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.\n\n<details>\n<summary>❤️ Share</summary>\n\n- [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai)\n- [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai)\n- [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai)\n- [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)\n\n</details>\n\n<details>\n<summary>🪧 Tips</summary>\n\n### Chat\n\nThere are 3 ways to chat with [CodeRabbit](https://coderabbit.ai?utm_source=oss&utm_medium=github&utm_campaign=ai-dynamo/dynamo&utm_content=1721):\n\n- Review comments: Directly reply to a review comment made by CodeRabbit. Example:\n  - `I pushed a fix in commit <commit_id>, please review it.`\n  - `Explain this complex logic.`\n  - `Open a follow-up GitHub issue for this discussion.`\n- Files and specific lines of code (under the \"Files changed\" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples:\n  - `@coderabbitai explain this code block.`\n  -\t`@coderabbitai modularize this function.`\n- PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:\n  - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.`\n  - `@coderabbitai read src/utils.ts and explain its main purpose.`\n  - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.`\n  - `@coderabbitai help me debug CodeRabbit configuration file.`\n\n### Support\n\nNeed help? Create a ticket on our [support page](https://www.coderabbit.ai/contact-us/support) for assistance with any issues or questions.\n\nNote: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.\n\n### CodeRabbit Commands (Invoked using PR comments)\n\n- `@coderabbitai pause` to pause the reviews on a PR.\n- `@coderabbitai resume` to resume the paused reviews.\n- `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.\n- `@coderabbitai full review` to do a full review from scratch and review all the files again.\n- `@coderabbitai summary` to regenerate the summary of the PR.\n- `@coderabbitai generate sequence diagram` to generate a sequence diagram of the changes in this PR.\n- `@coderabbitai resolve` resolve all the CodeRabbit review comments.\n- `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository.\n- `@coderabbitai help` to get help.\n\n### Other keywords and placeholders\n\n- Add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed.\n- Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description.\n- Add `@coderabbitai` anywhere in the PR title to generate the title automatically.\n\n### Documentation and Community\n\n- Visit our [Documentation](https://docs.coderabbit.ai) for detailed information on how to use CodeRabbit.\n- Join our [Discord Community](http://discord.gg/coderabbit) to get help, request features, and share feedback.\n- Follow us on [X/Twitter](https://twitter.com/coderabbitai) for updates and announcements.\n\n</details>\n\n<!-- tips_end -->"},"request":{}},"response":{"url":"https://api.github.com/repos/ai-dynamo/dynamo/issues/comments/3025614230","status":404,"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset","content-encoding":"gzip","content-security-policy":"default-src 'none'","content-type":"application/json; charset=utf-8","date":"Tue, 01 Jul 2025 21:40:26 GMT","referrer-policy":"origin-when-cross-origin, strict-origin-when-cross-origin","server":"github.com","strict-transport-security":"max-age=31536000; includeSubdomains; preload","transfer-encoding":"chunked","vary":"Accept-Encoding, Accept, X-Requested-With","x-accepted-github-permissions":"issues=write; pull_requests=write","x-content-type-options":"nosniff","x-frame-options":"deny","x-github-api-version-selected":"2022-11-28","x-github-media-type":"github.v3; format=json","x-github-request-id":"700C:2984D6:A8114:151C4A:686455CA","x-ratelimit-limit":"15000","x-ratelimit-remaining":"14908","x-ratelimit-reset":"1751409250","x-ratelimit-resource":"core","x-ratelimit-used":"92","x-xss-protection":"0"},"data":{"message":"Not Found","documentation_url":"https://docs.github.com/rest/issues/comments#update-an-issue-comment","status":"404"}}}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

examples/sglang/utils/sgl_utils.py (1)

44-54: LGTM! Clean utility function implementation.

The function follows the same pattern as the existing parse_sglang_args function and correctly handles argument parsing with bootstrap port injection. Consider adding a docstring to document the purpose and parameters for consistency with coding standards.
examples/sglang/components/worker_inc.py (3)
131-132: Use cryptographically secure random for bootstrap room generation.

The current implementation uses random.randint() which is not cryptographically secure. For distributed systems communication, consider using a more secure random generator.

Apply this diff to use secure random:
+import secrets
+
 def _generate_bootstrap_room(self):
-    return random.randint(0, 2**63 - 1)
+    return secrets.randbits(63)
77-92: Document the temporary nature of placeholder metrics more clearly.

The placeholder metrics implementation should be more clearly documented as temporary to prevent confusion.

Consider updating the warning message to be more explicit:
 logging.warning(
-    "Publishing placeholder metrics in SGLangWorker; these are NOT real engine metrics yet and will be replaced once upstream support lands."
+    "TEMPORARY: Publishing random placeholder metrics. These will be replaced with real engine metrics once sgl-project/sglang#6721 is merged. Do not rely on these values for monitoring or scaling decisions."
 )
226-226: Fix formatting: remove whitespace before colon.

Apply this diff to fix the formatting issue:
-                new_tokens = data["output_ids"][num_output_tokens_so_far[index] :]
+                new_tokens = data["output_ids"][num_output_tokens_so_far[index]:]

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4676f8 and b93767c.

📒 Files selected for processing (3)

examples/sglang/components/decode_worker_inc.py (1 hunks)
examples/sglang/components/worker_inc.py (1 hunks)
examples/sglang/utils/sgl_utils.py (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: t-ob
PR: ai-dynamo/dynamo#1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang `async_encode` method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.

examples/sglang/components/decode_worker_inc.py (1)

Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.916Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.

examples/sglang/components/worker_inc.py (2)

Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.916Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.

🪛 Pylint (3.3.7)

examples/sglang/components/decode_worker_inc.py

[refactor] 34-34: Too few public methods (1/2)

(R0903)

[error] 79-79: No value for argument 'runtime' in function call

(E1120)

examples/sglang/components/worker_inc.py

[error] 298-298: No value for argument 'runtime' in function call

(E1120)

🪛 Flake8 (7.2.0)

examples/sglang/components/worker_inc.py

[error] 226-226: whitespace before ':'

(E203)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Test - vllm

🔇 Additional comments (2)

examples/sglang/components/decode_worker_inc.py (1)

57-79: Well-structured worker implementation.

The worker function and initialization logic are correctly implemented. The static analysis hint about missing runtime argument is a false positive - the @dynamo_worker decorator automatically injects the runtime parameter as confirmed by the retrieved learnings.

examples/sglang/components/worker_inc.py (1)

249-298: Well-architected worker implementation with proper component separation.

The worker initialization and main execution logic are well-structured. The static analysis hint about missing runtime argument is a false positive - the @dynamo_worker decorator automatically injects the runtime parameter based on retrieved learnings. The separation between prefill and decode workers provides good modularity for disaggregated inference.

examples/sglang/components/decode_worker_inc.py

examples/sglang/components/worker_inc.py

alec-flowers · 2025-07-02T15:49:32Z

It's SO MUCH simpler though. Now people reading the example can actually understand what's going on.

Yeah, I agree it's simpler. But would it be better to let all components to read config from a centralized file and integrate auto GPU selection? Now if users want to change the number of P/D workers or some settings, there are quite some changes need to made in the shell script.

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.

At the dynamo serve level, the contract is the user must manage the processes. At this level we can't have resource management and stuff. What if the user wants to put dynamo serve into a slurm script and now they have to think about srun resource management and the implicit built in resource management. That is too confusing.

Its a purposeful regression. We are taking away features and letting users have more control in the way that they are accustomed too. Any user who has used SSH and GPUs can manage these examples. If they want full service deployment where they fill out a config and one click deploy then they can move to dynamo deploy where under the hood it will basically just do this work for them.

The regression you are seeing is because dynamo deploy isn't built up to have the features yet.

tedzhouhk · 2025-07-02T15:56:20Z

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.

At the dynamo serve level, the contract is the user must manage the processes. At this level we can't have resource management and stuff. What if the user wants to put dynamo serve into a slurm script and now they have to think about srun resource management and the implicit built in resource management. That is too confusing.

Its a purposeful regression. We are taking away features and letting users have more control in the way that they are accustomed too. Any user who has used SSH and GPUs can manage these examples. If they want full service deployment where they fill out a config and one click deploy then they can move to dynamo deploy where under the hood it will basically just do this work for them.

I see. Does dynamo deploy only works for k8s or it also works for bare metal?

alec-flowers · 2025-07-02T16:01:15Z

Yea but if you want this then use dynamo deploy. We are separating concerns here. If you want full deployment help, then use the right tool. Here we are showing examples of how to run using dynamo serve, where yes, the user has to necessarily be more involved.
At the dynamo serve level, the contract is the user must manage the processes. At this level we can't have resource management and stuff. What if the user wants to put dynamo serve into a slurm script and now they have to think about srun resource management and the implicit built in resource management. That is too confusing.
Its a purposeful regression. We are taking away features and letting users have more control in the way that they are accustomed too. Any user who has used SSH and GPUs can manage these examples. If they want full service deployment where they fill out a config and one click deploy then they can move to dynamo deploy where under the hood it will basically just do this work for them.

I see. Does dynamo deploy only works for k8s or it also works for bare metal?

At first I was against it working for bare metal. My opinion is we should be focused on K8s and that is where our effort should go. For bare metal we can let people manage it with scripts.

However, I was thinking about this last night and I think I would be ok with doing it for bare metal under these conditions:

It truly is a one run command even for multi-node -> This means setting up something like Ray where you start the server on the head node and then run processes on the other nodes. So you would basically have to "setup" your cluster and then you could run your one command on any node and it would just deploy.
The config file to deploy on K8s / bare metal / slurm should be as close to the same as possible. Preferably with no changes.

At that point why not just set up a minkube cluster on your bare metal though... In the end that prioritization of work will be set by Maksim and team.

ishandhanani · 2025-07-02T17:59:29Z

@tedzhouhk - this now allows us to get CI properly wired up without having to work across dynamo serve/local orchestration barriers. A python script is much easier to add in (and tear down)

…/sgl-v2

container/Dockerfile.sglang

container/Dockerfile.sglang-deepep

init

a0181b8

pull-request-size bot added the size/L label Jul 1, 2025

github-actions bot added the feat label Jul 1, 2025

ishandhanani added 5 commits July 1, 2025 12:11

pc

10c604e

fixes

de7e18c

aggregated works

5eb5905

working disagg

fb36464

zmq

346aab2

ishandhanani marked this pull request as ready for review July 1, 2025 21:34

ishandhanani requested review from biswapanda, tedzhouhk, rmccorm4, alec-flowers and grahamking as code owners July 1, 2025 21:34

ishandhanani added 2 commits July 1, 2025 14:35

go

9731a29

go

b93767c

ai-dynamo deleted a comment from coderabbitai bot Jul 1, 2025

coderabbitai bot reviewed Jul 1, 2025

View reviewed changes

examples/sglang/components/decode_worker_inc.py Outdated Show resolved Hide resolved

examples/sglang/components/worker_inc.py Outdated Show resolved Hide resolved

examples/sglang/components/worker_inc.py Outdated Show resolved Hide resolved

ishandhanani added 2 commits July 1, 2025 21:41

prints

43b148f

launchers

f98af51

pull-request-size bot added size/XL and removed size/L labels Jul 1, 2025

go

e4125b9

ishandhanani requested review from tanmayv25, richardhuo-nv, ptarasiewiczNV, nnshah1 and a team as code owners July 1, 2025 23:27

ishandhanani added 5 commits July 2, 2025 19:22

ease

7e4f020

go

03fd0a2

rename

b6d6b69

help

63856eb

go

e6f660b

tedzhouhk approved these changes Jul 2, 2025

View reviewed changes

ishandhanani added 3 commits July 2, 2025 20:21

go

c11b9a4

Merge branch 'ishan/sgl-v2' of github.com:ai-dynamo/dynamo into ishan…

2667cb2

…/sgl-v2

go

b8a1ba1

ishandhanani enabled auto-merge (squash) July 2, 2025 20:26

whoisj requested changes Jul 2, 2025

View reviewed changes

container/Dockerfile.sglang Outdated Show resolved Hide resolved

container/Dockerfile.sglang-deepep Outdated Show resolved Hide resolved

nv-tusharma approved these changes Jul 2, 2025

View reviewed changes

ishandhanani added 2 commits July 2, 2025 20:47

go

01321e6

bump

f3c817c

whoisj approved these changes Jul 2, 2025

View reviewed changes

ci

711e61c

ishandhanani merged commit 9cbf803 into main Jul 2, 2025
6 checks passed

ishandhanani deleted the ishan/sgl-v2 branch July 2, 2025 22:13

fsaady mentioned this pull request Jul 3, 2025

feat: automate slurm handling in sglang example. #1730

Merged

ishandhanani mentioned this pull request Jul 7, 2025

feat: receive kvmetrics from sglang scheduler #1789

Merged

coderabbitai bot mentioned this pull request Jul 7, 2025

feat: add flush_cache endpoint to sglang #1769

Merged

2 tasks

atchernych pushed a commit that referenced this pull request Jul 9, 2025

feat: add dynamo components for sglang (#1721)

4dda98a

coderabbitai bot mentioned this pull request Jul 22, 2025

chore(sglang): Move examples/sglang to components/backends/sglang #2046

Merged

coderabbitai bot mentioned this pull request Jul 30, 2025

feat: reorganize sglang and add expert distribution endpoints #2181

Merged

This was referenced Aug 11, 2025

docs: add sglang installation variants #2400

Merged

feat(sglang): unify entry point for SGLang backend architecture #2493

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add dynamo components for sglang #1721

feat: add dynamo components for sglang #1721

Uh oh!

ishandhanani commented Jul 1, 2025 •

edited by tedzhouhk

Loading

Uh oh!

copy-pr-bot bot commented Jul 1, 2025

Uh oh!

coderabbitai bot commented Jul 1, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alec-flowers commented Jul 2, 2025 •

edited

Loading

Uh oh!

tedzhouhk commented Jul 2, 2025

Uh oh!

alec-flowers commented Jul 2, 2025 •

edited

Loading

Uh oh!

ishandhanani commented Jul 2, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

feat: add dynamo components for sglang #1721

feat: add dynamo components for sglang #1721

Uh oh!

Conversation

ishandhanani commented Jul 1, 2025 • edited by tedzhouhk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

TODO

Changes

Examples

Tests

Uh oh!

copy-pr-bot bot commented Jul 1, 2025

Uh oh!

coderabbitai bot commented Jul 1, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alec-flowers commented Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tedzhouhk commented Jul 2, 2025

Uh oh!

alec-flowers commented Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ishandhanani commented Jul 2, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ishandhanani commented Jul 1, 2025 •

edited by tedzhouhk

Loading

alec-flowers commented Jul 2, 2025 •

edited

Loading

alec-flowers commented Jul 2, 2025 •

edited

Loading