Description
Feature request
From reading the code, Dynamo Planner has well-abstracted interfaces for collecting metrics from different inference framework backends. The server side is implemented in metrics_aggregator.rs, and the client side uses its Python bindings to publish the metrics. The key part of the current vLLM implementation is below:
self.metrics_publisher.publish(
    metrics.request_active_slots,
    metrics.request_total_slots,
    metrics.kv_active_blocks,
    metrics.kv_total_blocks,
    metrics.num_requests_waiting,
    metrics.gpu_cache_usage_perc,
    metrics.gpu_prefix_cache_hit_rate,
)
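To make the shape of that call concrete, here is a minimal sketch of the seven values as a typed record, plus a stand-in publisher. `PlannerMetrics`, `RecordingPublisher`, and `publish_metrics` are hypothetical names I made up for illustration and are not part of Dynamo's Python bindings; only the field names and their order come from the snippet above, and the field types are my guess.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class PlannerMetrics:
    """Hypothetical record; field names and order mirror the publish() call."""
    request_active_slots: int
    request_total_slots: int
    kv_active_blocks: int
    kv_total_blocks: int
    num_requests_waiting: int
    gpu_cache_usage_perc: float       # assumed to be a fraction or percentage
    gpu_prefix_cache_hit_rate: float  # assumed to be a fraction or percentage


class RecordingPublisher:
    """Hypothetical stand-in for the real publisher: records calls only."""

    def __init__(self):
        self.calls = []

    def publish(self, *values):
        self.calls.append(values)


def publish_metrics(publisher, metrics: PlannerMetrics) -> None:
    # astuple() yields the fields in declaration order, which matches the
    # positional argument order used in the vLLM snippet.
    publisher.publish(*astuple(metrics))
```

Any backend that can fill in such a record could publish it the same way, which is why the interface looks backend-agnostic to me.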
In today's Dynamo repo, these features are maintained by the Dynamo community as a large patch (container/deps/vllm/vllm_v0.8.4-dynamo-kv-disagg-patch.patch). To me this is not a good idea, because the patch is hard to maintain unless it is merged into the vLLM repo. I do understand the concern, though: an inference framework probably should not accept code that is too intrusive. The same concern applies to SGLang as to vLLM.
We would really like to contribute the missing piece that lets Planner run on SGLang, so I am starting this thread to discuss and explore ideas with the community. Some options I can think of:
1. Put Dynamo into SGLang. Implement a new class, say "DynamoPlannerMetrics", in SGLang; initialize an instance, call its APIs in multiple places to collect the metrics, and eventually send them out using the Dynamo API. This is similar to how vLLM is supported, but we would aim to keep the intrusion to a minimum.
2. Use SGLang's existing metrics interface and enhance it to match the Dynamo Planner requirements, then add features in Dynamo Planner to collect the metrics through the standard SGLang interface.
I personally prefer option 2, and would like to know what the community thinks about the roadmap.
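To illustrate what the Dynamo side of option 2 might look like, here is a minimal sketch of an adapter that turns a backend stats snapshot into the seven positional values the vLLM snippet passes to `publish`. Everything here is an assumption: `FIELD_MAP` and `to_planner_args` are hypothetical, and the source key names on the right-hand side are placeholders, not confirmed SGLang metric names.

```python
from typing import Mapping

# Planner field -> key in the backend's stats snapshot.
# The right-hand names are placeholders (assumptions), NOT real SGLang metrics.
FIELD_MAP = {
    "request_active_slots": "running_requests",
    "request_total_slots": "max_running_requests",
    "kv_active_blocks": "used_kv_blocks",
    "kv_total_blocks": "total_kv_blocks",
    "num_requests_waiting": "queued_requests",
    "gpu_cache_usage_perc": "kv_cache_usage",
    "gpu_prefix_cache_hit_rate": "prefix_cache_hit_rate",
}


def to_planner_args(stats: Mapping[str, float], field_map=FIELD_MAP) -> tuple:
    """Return the stats in the positional order used by the publish() call."""
    missing = [src for src in field_map.values() if src not in stats]
    if missing:
        # Fail loudly so a gap in the backend's metrics is visible early.
        raise KeyError(f"backend stats missing keys: {missing}")
    # Dicts preserve insertion order, so this follows FIELD_MAP's order.
    return tuple(stats[src] for src in field_map.values())
```

With something like this, the call site would be `publisher.publish(*to_planner_args(stats))`, and any metric that SGLang does not expose yet would show up as a missing key, which is exactly the list of enhancements option 2 would need.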
Thanks!
Describe the problem you're encountering
Support SGLang in Dynamo Planner
Describe alternatives you've tried
No response