[FEATURE]: a mock worker API #995

### Feature request

It would be nice to have a mock worker API that does not actually use a GPU, so that one could run at-scale tests without having GPUs at scale.

- [x] Simulate / predict the KvEvents based on some [eviction policy](https://docs.vllm.ai/en/latest/design/automatic_prefix_caching.html) for the worker (#1033)
- [x] Model the memory + compute complexity based on the active radix cache size and the active request sizes (#1159), done by @tedzhouhk
- [x] Integrate the above modeling into the mocker, and simulate the vLLM scheduling behavior more closely. Hook it up as an `AsyncEngine`, or as a component that can be registered as a dynamo endpoint (#1236)
- [x] Publish `RouterEvents` and `ForwardPassMetrics` over NATS, reusing the existing publishers when possible.
- [x] Verify functionality and perform preliminary (sanity check) benchmarking on the KV router.
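
To make the first task concrete, here is a minimal Python sketch of how a GPU-free mock worker could predict KV events from an eviction policy. It assumes a plain LRU policy over fixed-size blocks and chain-hashed prefixes. The class name `MockKvCache`, its parameters, and the `("stored", hash)` / `("removed", hash)` event tuples are hypothetical names for illustration only, not the actual Dynamo `KvEvents` schema or the vLLM implementation.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class MockKvCache:
    """Toy prefix cache (hypothetical): fixed block budget, LRU eviction."""

    num_blocks: int                 # total blocks the pretend GPU can hold
    block_size: int = 4             # tokens per block
    blocks: OrderedDict = field(default_factory=OrderedDict)  # hash -> None, oldest first
    events: list = field(default_factory=list)                # emitted (kind, hash) tuples

    def _block_hashes(self, tokens):
        # Chain each full block's hash with its parent so that requests
        # sharing a prefix map to the same block hashes. A trailing
        # partial block is ignored.
        hashes, parent = [], 0
        full = len(tokens) - len(tokens) % self.block_size
        for i in range(0, full, self.block_size):
            parent = hash((parent, tuple(tokens[i:i + self.block_size])))
            hashes.append(parent)
        return hashes

    def process(self, tokens):
        """Simulate prefill for one request; return the number of reused blocks."""
        hits = 0
        for h in self._block_hashes(tokens):
            if h in self.blocks:
                self.blocks.move_to_end(h)      # mark as most recently used
                hits += 1
                continue
            if len(self.blocks) >= self.num_blocks:
                evicted, _ = self.blocks.popitem(last=False)  # drop the LRU block
                self.events.append(("removed", evicted))
            self.blocks[h] = None
            self.events.append(("stored", h))
        return hits


cache = MockKvCache(num_blocks=2, block_size=2)
first = cache.process([1, 2, 3, 4])       # cold cache: two blocks stored
second = cache.process([1, 2, 3, 4, 5])   # same prefix: both full blocks reused
third = cache.process([9, 9])             # new block forces one eviction
print(first, second, third, [kind for kind, _ in cache.events])
```

In a fuller mocker, the `events` list would be handed to a publisher rather than kept in memory, and the hit count could feed a timing model for simulated prefill latency. Both are left out here because their real interfaces are not described in this issue.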

### Describe the problem you're encountering

N/A

### Describe alternatives you've tried

A mock worker exists here:
https://github.com/ai-dynamo/dynamo/blob/main/components/metrics/src/bin/mock_worker.rs
However, it seems to generate random metrics for the purpose of testing metrics handling, and does not seem to perform any modeling.
