Remove cuda graph batch size adjustment for dp attention #2484

ispobock · 2024-12-14T15:04:13Z

Motivation

CUDA graphs now take less CUDA memory with the Triton attention backend, so a CUDA graph batch size of 128 works fine for dp attention and the dp-attention-specific adjustment can be removed.
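For readers unfamiliar with what a "cuda graph batch size adjustment" looks like, here is a minimal sketch. It is not the code touched by this PR: the function name `capture_batch_sizes`, the parameter `dp_attention_cap`, the candidate list, and the cap value of 96 are all hypothetical and chosen only for illustration. The one value taken from the description above is the batch size of 128.

```python
from typing import List, Optional


def capture_batch_sizes(max_bs: int = 128,
                        dp_attention_cap: Optional[int] = None) -> List[int]:
    """Return the batch sizes for which a CUDA graph would be captured.

    Hypothetical helper: `dp_attention_cap` models a mode-specific
    adjustment that lowers the maximum batch size to save memory.
    """
    if dp_attention_cap is not None:
        # The kind of adjustment this PR removes: clamp the maximum.
        max_bs = min(max_bs, dp_attention_cap)
    # Illustrative candidates: a few small sizes, then multiples of 8.
    candidates = [1, 2, 4] + list(range(8, 161, 8))
    return [bs for bs in candidates if bs <= max_bs]


# With an (assumed) cap, graphs for the larger batch sizes are never captured.
with_cap = capture_batch_sizes(dp_attention_cap=96)
# Without the cap, capture goes all the way up to 128.
without_cap = capture_batch_sizes()
```

Each captured batch size costs some GPU memory, which is why a cap can be useful when memory is tight. Once the graphs themselves are cheaper, the cap only limits how large a batch can be replayed from a graph.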

…#2484)

remove cuda graph bs adjust

42ea9d0

ispobock requested review from merrymercy, Ying1123, hnyls2002, zhyncs and ByronHsu as code owners December 14, 2024 15:04

ispobock enabled auto-merge (squash) December 14, 2024 15:16

zhyncs approved these changes Dec 14, 2024

ispobock merged commit 0ba2c58 into sgl-project:main Dec 14, 2024
15 checks passed

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Remove cuda graph batch size adjustment for dp attention (sgl-project…

559f535

…#2484)
