feat: Enable vLLM cudagraphs #498
Conversation
Force-pushed from de91c35 to e84ff82
@jiemingz can you also add a timing plot to the MR description showing the benefits of enabling CUDA graphs vs. not?
@jiemingz Unit test failure here: the eager key is missing.
Force-pushed from 3c08662 to 7bb9f3d
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Force-pushed from 2610899 to e545a48
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
Addresses: !186
Generation throughput shows a ~3% speedup for Llama-8B on 4 nodes.
What does this PR do?
Enables CUDA graphs for vLLM generation, yielding a ~3% generation throughput speedup for Llama-8B on 4 nodes.
Issues
List issues that this PR closes:
Usage
# Add a code snippet demonstrating how to use this
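As a placeholder for the usage snippet above, here is a minimal sketch of toggling CUDA graphs for vLLM. It assumes vLLM's standard `enforce_eager` engine argument (when True, vLLM skips CUDA graph capture and runs eagerly); the helper name `build_vllm_kwargs` is hypothetical and not part of this PR.

```python
def build_vllm_kwargs(enable_cudagraphs: bool) -> dict:
    """Translate a cudagraphs toggle into vLLM engine kwargs.

    vLLM's `enforce_eager` flag disables CUDA graph capture when True,
    so enabling cudagraphs means passing enforce_eager=False.
    """
    return {"enforce_eager": not enable_cudagraphs}


# The kwargs would then be forwarded to the engine, e.g.:
#   from vllm import LLM
#   llm = LLM(model="meta-llama/Meta-Llama-3-8B",
#             **build_vllm_kwargs(enable_cudagraphs=True))
print(build_vllm_kwargs(True))   # {'enforce_eager': False}
print(build_vllm_kwargs(False))  # {'enforce_eager': True}
```

Keeping the toggle in one helper makes it easy to plumb a single config flag through to the engine without scattering `enforce_eager` logic across call sites.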
Before your PR is "Ready for review"
Pre-checks:
Additional Information