-
-
Notifications
You must be signed in to change notification settings - Fork 9.8k
[Core] Minimize number of dict lookup in _maybe_evict_cached_block #21281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request optimizes _maybe_evict_cached_block
by reducing dictionary lookups, which was identified as a performance bottleneck. The changes are logically sound, achieving the intended performance improvement by using .get()
and .pop()
to minimize lookups. The new implementation is also more robust by handling cases where a block ID might not be present in the cache, avoiding potential KeyError
exceptions. The code is correct under the assumption of single-threaded access, which is consistent with the vLLM scheduler's design. Overall, this is a solid improvement.
Resolve #21141 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, otherwise lgtm
vllm/v1/core/block_pool.py
Outdated
if len(blocks_by_id) == 0: | ||
self.cached_block_hash_to_block.pop(block_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not del?
if len(blocks_by_id) == 0: | |
self.cached_block_hash_to_block.pop(block_hash) | |
if blocks_by_id: | |
del cached_block_hash_to_block.pop[block_hash] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For del, it makes a lot sense, as we don't rely on the returned values;
- But for the bool check, I thought
len(blocks_by_id) == 0
would be alightly more efficient? I could imagine the bool stack would be bool -> len(blocks_by_id) == 0. So maybe the original approach would avoid one less function call.
However, the difference of the second bullet might be very minimum, so updating the commit per you comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the bool check is more a style thing, only a suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good. Only two comments to address.
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
return False | ||
block.reset_hash() | ||
blocks_by_id.pop(block.block_id, None) | ||
if blocks_by_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong, fixed in #21357
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about this :( had meant to suggest not blocks_by_id
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting this didn’t trigger any unit test failure 🤣? I think we should fix that @Jialin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take some blame as well, didn't notice the issue in the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, thx for adding the tests.
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: qizixi <qizixi@meta.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: shuw <shuw@nvidia.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: x22x22 <wadeking@qq.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…llm-project#21281) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
With #21222, we noticed that _maybe_evict_cached_block became the new bottleneck for allocate_new_block.

And we notice that, we do duplicated key lookup against the same dict 4 times, and we could easily cut it down into 2.
Test Plan
Compare trace with and without the change on top of #21222
Test Result
Quite significant improvements 0.13ms -> 0.03ms
After

Before

(Optional) Documentation Update