[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small #21217
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Instead, they only run fastcheck CI, a small and essential subset of tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add a ready label to the PR or enable auto-merge. 🚀
Code Review

This pull request deprecates and removes the BlockSparse Attention feature and the Phi3-Small model, which relied on it. The changes are extensive, touching many files across the attention backends, model registry, and testing infrastructure. My review confirms that the removal is clean and consistent. All references to blocksparse_params, the block-sparse attention implementation, and the Phi3SmallForCausalLM model have been correctly eliminated. The related tests and documentation have also been updated accordingly. The changes look good to me.
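For downstream integrations, the visible effect is that attention backend constructors no longer accept a block-sparse parameter. The following is a minimal sketch of the shape of that change; the class names are illustrative, not the exact vLLM signatures:

```python
from typing import Any, Optional


# Before this PR: backends accepted an optional dict of block-sparse
# settings, threaded through from the model (only Phi3-Small used it).
class MyAttentionImpl:
    def __init__(
        self,
        num_heads: int,
        head_size: int,
        scale: float,
        blocksparse_params: Optional[dict[str, Any]] = None,  # removed upstream
    ) -> None:
        ...


# After this PR: the parameter is gone. Subclasses and call sites that
# still pass blocksparse_params will fail with a TypeError and must
# drop the argument, as the downstream fork PRs referenced below do.
class MyAttentionImplV2:
    def __init__(
        self,
        num_heads: int,
        head_size: int,
        scale: float,
    ) -> None:
        ...
```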
Kernels test failure is related to this PR.
Upstream PR vllm-project/vllm#21217 changed attention APIs. This PR adjusts our attention implementation to the new API. Signed-off-by: Konrad Zawora <kzawora@habana.ai>
This PR removes block-sparse attention and support for Phi3-Small, the only model that used it.
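One way to confirm the removal against an installed build is to check the model registry. A minimal sketch, assuming vLLM's public ModelRegistry helper is available as in current releases:

```python
from vllm import ModelRegistry

# After this PR, Phi3SmallForCausalLM should no longer appear among the
# supported architectures, and loading such a checkpoint will error out.
archs = ModelRegistry.get_supported_archs()
print("Phi3SmallForCausalLM" in archs)  # expected: False
```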