feat: update active blocks in chunks only when necessary #1848
Walkthrough
The changes refactor the token handling pipeline to support batch token pushes instead of single-token pushes. Methods in the router, scheduler, and sequence management layers are updated to accept and process slices of tokens, with corresponding updates to the block management logic and related enum variants. Tests and the multi-worker infrastructure are also adapted.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant KvRouter
participant KvScheduler
participant ActiveSequences
Client->>KvRouter: push(request_id, &[tokens])
KvRouter->>KvScheduler: push(request_id, &[tokens])
KvScheduler->>ActiveSequences: push(request_id, &[tokens])
ActiveSequences-->>KvScheduler: update blocks, manage sequences
KvScheduler-->>KvRouter: ack
KvRouter-->>Client: ack
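The call chain in the diagram can be sketched as below: each layer takes a slice of tokens and forwards it unchanged, so a single token becomes a one-element slice rather than a separate code path. This is a minimal sketch under stated assumptions; the structs are hypothetical stripped-down stand-ins for `KvRouter`, `KvScheduler`, and `ActiveSequences`, and the `u32` token type, `&str` request id, and per-request `osl` counter are illustrative choices, not the PR's actual fields or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-worker sequence tracker.
/// It only records how many output tokens each request has produced.
#[derive(Default)]
struct ActiveSequences {
    osl: HashMap<String, usize>,
}

impl ActiveSequences {
    /// Accepts a whole batch of tokens in one call instead of one token at a time.
    fn push(&mut self, request_id: &str, tokens: &[u32]) {
        *self.osl.entry(request_id.to_string()).or_insert(0) += tokens.len();
    }
}

/// Hypothetical scheduler layer: owns the sequence tracker.
#[derive(Default)]
struct KvScheduler {
    sequences: ActiveSequences,
}

impl KvScheduler {
    fn push(&mut self, request_id: &str, tokens: &[u32]) {
        // Forward the slice unchanged; no per-token loop at this layer.
        self.sequences.push(request_id, tokens);
    }
}

/// Hypothetical router layer: owns the scheduler.
#[derive(Default)]
struct KvRouter {
    scheduler: KvScheduler,
}

impl KvRouter {
    fn push(&mut self, request_id: &str, tokens: &[u32]) {
        self.scheduler.push(request_id, tokens);
    }
}

fn main() {
    let mut router = KvRouter::default();
    // One call carries three tokens...
    router.push("req-1", &[11, 12, 13]);
    // ...and a single token is just a one-element slice.
    router.push("req-1", &[14]);
    let osl = router.scheduler.sequences.osl["req-1"];
    println!("req-1 osl = {}", osl);
}
```

Passing `&[u32]` keeps the API borrow-only, so callers can hand over a sub-slice of an existing buffer without copying.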
Overview:
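Only update the active blocks when `(isl + osl - 1) / block_size` changes, where isl is the input sequence length, osl is the current output sequence length, and `/` is integer division. A change in this value means that a generation block has been committed and a new generation block allocated.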
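The update condition can be checked with a short sketch: compute the index of the block holding the last token before and after the new tokens arrive, and update only if it differs. The helper names `last_block_index` and `needs_block_update`, and the example values `isl = 30`, `block_size = 16`, are hypothetical and chosen for illustration; they are not taken from the PR or its benchmarks. The sketch also assumes the sequence always holds at least one token, so the `- 1` cannot underflow.

```rust
/// Index of the block holding the last token: (isl + osl - 1) / block_size.
/// Assumes isl + osl >= 1, otherwise the subtraction would underflow.
fn last_block_index(isl: usize, osl: usize, block_size: usize) -> usize {
    debug_assert!(isl + osl >= 1, "sequence must contain at least one token");
    (isl + osl - 1) / block_size
}

/// True when appending `new_tokens` output tokens moves the last token into a
/// new block, i.e. a generation block was committed and a new one allocated.
fn needs_block_update(isl: usize, osl: usize, new_tokens: usize, block_size: usize) -> bool {
    last_block_index(isl, osl, block_size) != last_block_index(isl, osl + new_tokens, block_size)
}

fn main() {
    let (isl, block_size) = (30, 16);
    // The prompt fills tokens 0..=29: block 0 fully, block 1 with 14 of 16 slots.
    for osl in 0..6 {
        let update = needs_block_update(isl, osl, 1, block_size);
        println!("osl {} -> {}: update = {}", osl, osl + 1, update);
    }
}
```

With these values, only the step from osl 2 to 3 reports an update, because that token (index 32) is the first one in block 2; every other step leaves the active blocks untouched. The same check works for a batch of several tokens by passing a larger `new_tokens`.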
Benchmarks
Not sure what happened at a hit rate of 0.7, but overall the results look good, particularly for the pure load balancer.
