Conversation

zhyncs
Member

@zhyncs zhyncs commented Feb 17, 2025

Motivation

This version should be used only with --enable-flashinfer-mla.

For other LLM engines, if you refer to this PR, please include "Adapted from https://github.com/sgl-project/sglang/pull/3643/files", thank you :-)

ref #3550

Modifications

Checklist

@zhyncs
Member Author

zhyncs commented Feb 17, 2025

@zhyncs zhyncs self-assigned this Feb 17, 2025
@zhyncs zhyncs merged commit 714f3e6 into main Feb 17, 2025
21 of 22 checks passed
@zhyncs zhyncs deleted the zhyncs/prefix branch February 17, 2025 18:06
@zhyncs zhyncs mentioned this pull request Feb 17, 2025
```
@@ -1004,6 +1055,26 @@ def call_begin_forward(
    custom_mask=custom_mask,
    non_blocking=True,
)
elif (
    global_config.enable_flashinfer_mla
    and not global_server_args_dict["disable_radix_cache"]
```
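For context, the condition above gates which prefill path is taken when flashinfer MLA is enabled together with the radix cache. A minimal sketch of this kind of flag-based dispatch (function and return names are illustrative, not sglang's actual API):

```python
# Hypothetical sketch of flag-based prefill path selection.
# Names are illustrative only, not sglang's actual API.
def choose_prefill_path(enable_flashinfer_mla: bool, disable_radix_cache: bool) -> str:
    if enable_flashinfer_mla and not disable_radix_cache:
        # MLA with radix-cache prefix matching: take the paged prefill path
        # so cached prefix pages can be reused.
        return "paged_prefill"
    elif enable_flashinfer_mla:
        # No prefix cache: the whole sequence is new tokens.
        return "ragged_prefill"
    return "default_prefill"
```

The real code branches on `global_config.enable_flashinfer_mla` and `global_server_args_dict["disable_radix_cache"]` in the same spirit.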


@zhyncs A quick question: if I want to make MTP work with the flashinfer backend, does this code block have to run during the target-verify stage? Because the incremental prefill computation has to use the absorb trick with flashinfer?

Member Author


The full paged version is a temporary solution. I will soon support ragged prefill + paged prefill + paged decoding. I've been quite busy these days, so I haven't updated it yet.
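The "absorb trick" referenced above folds the key up-projection into the query, so attention scores can be computed directly against the compressed latent KV cache without ever materializing full keys. A minimal numpy sketch of the equivalence (all names and shapes are made up for illustration, not sglang's actual code):

```python
import numpy as np

# Illustrative sketch of the MLA weight-absorption idea: score the query
# against the compressed latent cache instead of decompressed keys.
rng = np.random.default_rng(0)
seq_len, d_latent, d_head = 5, 4, 8

q = rng.standard_normal(d_head)                  # one decode-step query head
c_kv = rng.standard_normal((seq_len, d_latent))  # compressed latent KV cache
w_uk = rng.standard_normal((d_latent, d_head))   # key up-projection (latent -> key)

# Naive path: decompress keys, then score against the query.
k = c_kv @ w_uk                                  # (seq_len, d_head)
scores_naive = k @ q                             # (seq_len,)

# Absorbed path: fold w_uk into the query, score directly on the latent cache.
q_absorbed = w_uk @ q                            # (d_latent,)
scores_absorbed = c_kv @ q_absorbed              # (seq_len,)

assert np.allclose(scores_naive, scores_absorbed)
```

The equivalence is plain matrix associativity, (C W) q = C (W q); the practical win is that the decompressed keys are never materialized, so attention runs over the small latent dimension.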
