[FEATURE]: Support SGLANG in Dynamo Planner #1196

@yicwang

Feature request

From reading the code, Dynamo Planner has well-abstracted interfaces for collecting metrics from different inference framework backends. The server side is implemented in metrics_aggregator.rs, and the client side uses its Python bindings to publish the metrics. The key part of the current VLLM implementation is:

self.metrics_publisher.publish(
    metrics.request_active_slots,
    metrics.request_total_slots,
    metrics.kv_active_blocks,
    metrics.kv_total_blocks,
    metrics.num_requests_waiting,
    metrics.gpu_cache_usage_perc,
    metrics.gpu_prefix_cache_hit_rate,
)
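
To make the shape of that call explicit, here is a minimal, backend-agnostic sketch. The seven field names and their order are copied from the call above; everything else (the `PlannerMetrics` dataclass, the `MetricsPublisher` protocol, the int/float types, and the `RecordingPublisher` stand-in) is hypothetical and is not part of the Dynamo bindings.

```python
from dataclasses import astuple, dataclass
from typing import Protocol


class MetricsPublisher(Protocol):
    """Hypothetical stand-in for the publisher object; only publish() is assumed."""

    def publish(
        self,
        request_active_slots: int,
        request_total_slots: int,
        kv_active_blocks: int,
        kv_total_blocks: int,
        num_requests_waiting: int,
        gpu_cache_usage_perc: float,
        gpu_prefix_cache_hit_rate: float,
    ) -> None: ...


@dataclass(frozen=True)
class PlannerMetrics:
    """The seven values from the VLLM call, in the same order (types are assumed)."""

    request_active_slots: int
    request_total_slots: int
    kv_active_blocks: int
    kv_total_blocks: int
    num_requests_waiting: int
    gpu_cache_usage_perc: float
    gpu_prefix_cache_hit_rate: float


def publish_metrics(publisher: MetricsPublisher, metrics: PlannerMetrics) -> None:
    # astuple() preserves field declaration order, which mirrors the
    # positional order of the publish() call.
    publisher.publish(*astuple(metrics))


class RecordingPublisher:
    """Test double that records the arguments instead of sending them anywhere."""

    def __init__(self) -> None:
        self.calls = []

    def publish(self, *args) -> None:
        self.calls.append(args)
```

Any backend that can fill in these seven values could then publish through the same path, which is the property both options below rely on.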

In today's Dynamo repo, these features are maintained by the Dynamo community as one large patch (container/deps/vllm/vllm_v0.8.4-dynamo-kv-disagg-patch.patch). To me this is not a good idea, because a patch that is not merged into the VLLM repo is hard to maintain. I do understand the concern, though: an inference framework probably should not accept code that is too intrusive. The same applies to both SGLANG and VLLM.

We would really like to contribute the missing piece that makes Planner run on SGLANG, so I am starting this thread to discuss and explore ideas with the community. Some options I can think of:

  1. Put Dynamo into SGLANG. Implement a new class, say "DynamoPlannerMetrics", in SGLANG, initialize an instance, call the corresponding APIs in multiple places to collect the metrics, and eventually send them out using the Dynamo API. This is similar to how VLLM is supported, but we would keep the intrusion to a minimum.

  2. Use SGLANG's existing metrics interface and enhance it to match the Dynamo Planner requirements, then add support in Dynamo Planner for collecting the metrics through that standard SGLANG interface.
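
As a rough illustration of option 2, the sketch below parses Prometheus-style text (the format SGLANG's metrics endpoint is assumed to emit) and maps it onto the seven Planner fields. The metric names (`sglang:num_running_reqs`, `sglang:num_queue_reqs`, `sglang:token_usage`, `sglang:cache_hit_rate`) are assumptions that should be checked against the SGLANG version in use. The total slot and total KV block counts are assumed to come from configuration rather than from the metrics, and `kv_active_blocks` is only an estimate derived from the usage fraction. These gaps are the kind of enhancement option 2 would need on the SGLANG side.

```python
from typing import Dict

# Assumed SGLANG metric names; verify them against your /metrics output.
RUNNING = "sglang:num_running_reqs"
QUEUED = "sglang:num_queue_reqs"
TOKEN_USAGE = "sglang:token_usage"
CACHE_HIT = "sglang:cache_hit_rate"


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition into {metric_name: value}.

    Simplifications: labels are dropped (the last series seen for a name
    wins) and samples are assumed to carry no trailing timestamp.
    """
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, raw_value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        try:
            values[name] = float(raw_value)
        except ValueError:
            continue  # ignore lines this simple parser cannot read
    return values


def to_planner_metrics(
    values: Dict[str, float], total_slots: int, total_kv_blocks: int
) -> Dict[str, float]:
    """Map scraped values onto the seven fields of the VLLM publish() call."""
    usage = values.get(TOKEN_USAGE, 0.0)  # assumed to be a fraction in [0, 1]
    return {
        "request_active_slots": int(values.get(RUNNING, 0)),
        "request_total_slots": total_slots,
        "kv_active_blocks": round(usage * total_kv_blocks),  # estimate only
        "kv_total_blocks": total_kv_blocks,
        "num_requests_waiting": int(values.get(QUEUED, 0)),
        "gpu_cache_usage_perc": usage,
        "gpu_prefix_cache_hit_rate": values.get(CACHE_HIT, 0.0),
    }


SAMPLE = """\
# TYPE sglang:num_running_reqs gauge
sglang:num_running_reqs{model_name="m"} 3.0
sglang:num_queue_reqs{model_name="m"} 2.0
sglang:token_usage{model_name="m"} 0.25
sglang:cache_hit_rate{model_name="m"} 0.5
"""

if __name__ == "__main__":
    print(to_planner_metrics(parse_prometheus_text(SAMPLE), 8, 1000))
```

With this approach nothing Dynamo-specific has to live inside SGLANG; the trade-off is that the Planner only sees what the metrics interface exposes, at whatever interval it is scraped.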

I personally prefer option 2, and would like to know what the community is thinking for the roadmap.

Thanks!

Describe the problem you're encountering

Support SGLANG in Dynamo Planner

Describe alternatives you've tried

No response

Labels

enhancement (New feature or request)
