update sgl-kernel for EP: kernel part #8514
Conversation
Summary of Changes
Hello @ch-wan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a work-in-progress (WIP) aimed at updating the Mixture of Experts (MoE) kernel, specifically the `moe_align_block_size` function and its underlying implementations. The core changes adjust array indexing and buffer sizing in the Python, C++, and CUDA components to resolve potential off-by-one errors or to align with a revised indexing scheme for expert IDs, which is critical for accurate token routing and padding in MoE operations.
Highlights

- **MoE Kernel Indexing Adjustments**: I've made consistent modifications across the Python, C++, and CUDA kernels to adjust array indices (e.g., `topk_ids[i] + 1`, `e - 1`, `left - 2`) when processing expert IDs and token counts within the `moe_align_block_size` logic. This likely addresses off-by-one errors or aligns with a new indexing scheme.
- **Buffer Size Correction**: The `cumsum_buffer` in the Python layer (`fused_moe.py`) has been expanded from `(num_experts + 1,)` to `(num_experts + 2,)`, indicating a need for an additional slot to correctly handle cumulative sums in the MoE alignment process; a simplified sketch of the resulting Python-side allocation follows this list.
- **Parameter Alignment for Native Kernels**: The `num_experts` parameter passed from the Python `moe_align_block_size` function to the underlying `sgl_moe_align_block_size` kernel is now `num_experts + 1`. This suggests a change in how the number of experts is interpreted or indexed by the native C++/CUDA implementations.
- **Debugging Aid**: A `print(num_experts)` statement was temporarily added in the Python `moe_align_block_size` function, likely for debugging purposes during the development of these changes.
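To make the Python-side highlights concrete, here is a minimal, hypothetical sketch of what the wrapper might look like after this change. It is inferred only from the points above; the real `moe_align_block_size` in `fused_moe.py` takes additional arguments (block size, output buffers, etc.), and the call into the native kernel is only indicated in a comment.

```python
import torch


def moe_align_block_size_sketch(topk_ids: torch.Tensor, num_experts: int):
    """Hypothetical sketch of the Python wrapper after this PR (not the real signature).

    Assumptions drawn from the highlights above:
      * filtered experts carry id -1, and the native kernels shift ids by +1
        internally (topk_ids[i] + 1), so ids land in [0, num_experts]
      * the cumulative-sum buffer therefore needs num_experts + 2 slots
      * the native kernel is told about num_experts + 1 buckets (the extra
        bucket collecting tokens whose expert was filtered out)
    """
    num_tokens_post_pad = torch.empty((1,), dtype=torch.int32, device=topk_ids.device)

    # One extra slot compared to the previous (num_experts + 1,) allocation.
    cumsum_buffer = torch.empty(
        (num_experts + 2,), dtype=torch.int32, device=topk_ids.device
    )

    # The real code would invoke the sgl-kernel op here, roughly:
    # sgl_moe_align_block_size(topk_ids, num_experts + 1, ..., cumsum_buffer, ...)
    return cumsum_buffer, num_tokens_post_pad
```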
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request updates the Mixture-of-Experts (MoE) kernels for expert parallelism (EP). The changes consistently apply a `+1` shift to expert IDs across the Python, C++, and CUDA files, likely to switch to a 1-based indexing scheme internally. The logic appears consistent and correct within the scope of this change.

I've identified a leftover debug `print` statement in a Python file, which I've marked as `high` severity because it could cause issues in a production environment. Please ensure it is removed. Additionally, I've raised concerns about potential out-of-bounds access in the C++ and CUDA code due to the `-1` and `-2` adjustments to expert IDs, which need careful review.
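To make the out-of-bounds concern easier to follow, here is a toy illustration of the index ranges involved; the variable names are hypothetical and do not correspond to the actual CUDA/C++ code.

```python
# Toy illustration only: shows why a "- 1" style adjustment is safe for real
# experts but not for the filtered bucket after the +1 shift.
num_experts = 8
raw_topk_ids = [-1, 0, 3, 7]                 # -1 marks a token whose expert was filtered out in EP
shifted_ids = [e + 1 for e in raw_topk_ids]  # -> [0, 1, 4, 8], valid for a (num_experts + 2)-slot buffer

for e in shifted_ids:
    assert 0 <= e <= num_experts             # within bounds after the shift
    # An "e - 1" adjustment stays in range only when e >= 1, i.e. for real experts;
    # applied to the filtered bucket (shifted id 0), it would produce index -1.
```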
@@ -579,7 +579,7 @@ def moe_align_block_size(
    num_tokens_post_pad = torch.empty((1), dtype=torch.int32, device=topk_ids.device)

    cumsum_buffer = torch.empty(
Can we add a comment here?
LGTM.
…buffer (#8526)
Co-authored-by: Ke Bao <ispobaoke@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Motivation
In EP, we set the expert IDs of filtered experts to -1. We update sgl-kernel to handle this case.
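For context, here is a minimal sketch of where the -1 IDs come from; the masking below is illustrative only (the actual filtering lives in the EP routing logic, not in this PR), and `local_experts` is an assumed example.

```python
import torch

# Illustrative example: on each EP rank, experts owned by other ranks are
# removed from topk_ids by writing -1 in their place.
topk_ids = torch.tensor([[2, 5], [7, 1]])   # experts selected per token
local_experts = torch.tensor([0, 1, 2, 3])  # experts resident on this rank (assumed)

mask = torch.isin(topk_ids, local_experts)
filtered_topk_ids = torch.where(mask, topk_ids, torch.full_like(topk_ids, -1))
# tensor([[ 2, -1],
#         [-1,  1]])

# The updated sgl-kernel must tolerate these -1 entries, e.g. by shifting all
# IDs by +1 so filtered tokens land in a dedicated bucket instead of indexing
# out of bounds.
```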
Modifications
Accuracy Test
Benchmark & Profiling
Checklist