Closed
Labels: enhancement (New feature or request)
Description
Feature request
It would be nice to have a mock worker API that does not actually use a GPU, so one could run at-scale tests without having at-scale GPUs.
- Need to simulate / predict the KvEvents based on some eviction policy for the worker (feat: vllm mock workers, Rusty skeleton #1033)
- Model the memory + compute complexity based on the active radix cache size and the active request sizes (feat: Add TTFT and ITL Interpolation to Profiling Script #1159), done by @tedzhouhk
- Integrate the above modeling into the mocker, and simulate the vllm scheduling behaviors more closely. Hook it up as an `AsyncEngine`, or a component that can be registered as a dynamo endpoint. (feat: vllm mocker enhancement #1236)
- Publishing of `RouterEvents` and `ForwardPassMetrics` over NATS, reusing the existing publishers when possible.
- Verify functionality and perform preliminary (sanity check) benchmarking on the KV router.
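
To make the first item concrete, here is a minimal sketch of how a mocker could predict KV events from an eviction policy without touching a GPU. It assumes LRU eviction over fixed-size blocks identified by a hash. `KvEvent`, `MockKvCache`, and the `"stored"` / `"removed"` kinds are hypothetical names for illustration only; they are not the actual `KvEvents` schema or the implementation in #1033.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KvEvent:
    """Hypothetical event record; not Dynamo's actual KvEvents schema."""
    kind: str        # "stored" or "removed"
    block_hash: int


class MockKvCache:
    """Fixed-capacity block cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity_blocks: int) -> None:
        if capacity_blocks <= 0:
            raise ValueError("capacity_blocks must be positive")
        self.capacity = capacity_blocks
        # Keys are block hashes, ordered from least to most recently used.
        self._blocks: "OrderedDict[int, None]" = OrderedDict()

    def access(self, block_hashes: List[int]) -> List[KvEvent]:
        """Touch the blocks of one request and return the events it would emit."""
        events: List[KvEvent] = []
        for h in block_hashes:
            if h in self._blocks:
                # Cache hit: refresh recency, nothing to publish.
                self._blocks.move_to_end(h)
                continue
            if len(self._blocks) >= self.capacity:
                # Cache full: evict the least recently used block first.
                evicted, _ = self._blocks.popitem(last=False)
                events.append(KvEvent("removed", evicted))
            self._blocks[h] = None
            events.append(KvEvent("stored", h))
        return events


cache = MockKvCache(capacity_blocks=2)
first = cache.access([1, 2])   # two misses, two "stored" events
second = cache.access([1, 3])  # hit on 1, then 3 evicts block 2
```

In this sketch a cache hit emits nothing, and a miss on a full cache emits a `"removed"` event followed by a `"stored"` event. A real mocker would presumably hand these events to the existing publishers rather than return them as a list.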
Describe the problem you're encountering
N/A
Describe alternatives you've tried
A mock worker exists here:
https://github.com/ai-dynamo/dynamo/blob/main/components/metrics/src/bin/mock_worker.rs
However, it seems to generate random metrics for the purpose of testing metrics handling, and does not seem to perform any modeling.
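
For contrast with random metrics, a modeled metric could be as simple as interpolating latency from a few profiled points. The sketch below is only an illustration of that idea: the profiling numbers are made up, and `interpolate` and `predicted_ttft_ms` are hypothetical helpers, not functions from the Dynamo codebase or from the profiling script.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation over sorted xs, clamped to the profiled range."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching (x, y) points")
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # index of the first profiled point strictly above x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up profiling points: prefill length in tokens -> time to first token in ms.
PREFILL_TOKENS = [128, 512, 2048]
TTFT_MS = [10.0, 30.0, 120.0]


def predicted_ttft_ms(new_prefill_tokens: int) -> float:
    """Deterministic TTFT estimate for a request, instead of a random draw."""
    return interpolate(new_prefill_tokens, PREFILL_TOKENS, TTFT_MS)


ttft = predicted_ttft_ms(320)  # halfway between the 128 and 512 token points
```

The same helper could be reused for ITL as a function of the active cache size, given profiled points for that relationship. The mocker would then sleep for the predicted time instead of running a forward pass.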