
Conversation

@ispobock (Collaborator) commented Jun 22, 2025

Motivation

Fuse the sorted_token_ids padding into the moe_align_block_size kernel. The fused path is faster only for some shapes, so it is exposed as an optional flag.
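
For context, moe_align_block_size writes token indices into a sorted_token_ids buffer whose per-expert segments are rounded up to a multiple of block_size; any slot that does not end up holding a real token index carries the sentinel value numel (the number of entries in topk_ids, i.e. num_tokens * topk), which downstream MoE kernels treat as "no token in this slot". Before this change the sentinel was typically written by a separate fill pass over the buffer before the kernel launch; with the fusion the kernel writes it itself, and only up to total_tokens_post_pad. A minimal standalone sketch of that fill step, reusing the names from the kernel excerpt quoted later in this thread (illustrative only, not the merged code):

    #include <cstdint>

    // Sketch of the padding step that this PR folds into moe_align_block_size.
    // Slots of sorted_token_ids in [0, *total_tokens_post_pad) that are not
    // overwritten with a real token index keep the sentinel `numel`.
    __global__ void pad_sorted_token_ids(int32_t* __restrict__ sorted_token_ids,
                                         const int32_t* __restrict__ total_tokens_post_pad,
                                         int32_t numel) {
      const int32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
      const int32_t stride = gridDim.x * blockDim.x;
      const int32_t fill_val = numel;                // sentinel: padded slot
      const int32_t total = *total_tokens_post_pad;  // block-aligned output length
      for (int32_t idx = tid; idx < total; idx += stride) {  // grid-stride loop
        sorted_token_ids[idx] = fill_val;
      }
    }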

Benchmark (kernel time; lower is better)

     num_tokens  num_experts  topk         SGL  SGL Fusion       Triton
0           1.0          8.0   1.0   19.088000   17.216001    53.504001
1           1.0          8.0   2.0   19.072000   17.535999    32.768000
2           1.0          8.0   4.0   19.152001   17.503999    31.872001
3           1.0          8.0   8.0   20.032000   17.792000    31.840000
4           1.0         32.0   1.0   22.175999   19.584000    30.128000
5           1.0         32.0   2.0   22.175999   19.648001    39.391998
6           1.0         32.0   4.0   22.560000   19.808000    36.880001
7           1.0         32.0   8.0   22.143999   20.032000    29.440001
8           1.0         64.0   1.0   26.528001   23.903999    32.928001
9           1.0         64.0   2.0   26.559999   23.952000    34.111999
10          1.0         64.0   4.0   26.591999   24.032000    31.744000
11          1.0         64.0   8.0   26.464000   24.160000    40.592000
12          1.0        128.0   1.0   26.144000   23.328001    36.192000
13          1.0        128.0   2.0   26.144000   23.328001    36.224000
14          1.0        128.0   4.0   26.144000   23.328001    36.224000
15          1.0        128.0   8.0   26.112000   23.456000    36.320001
16          1.0        256.0   1.0   30.144000   27.488001    52.320000
17          1.0        256.0   2.0   30.080000   27.488001    52.288000
18          1.0        256.0   4.0   30.112000   27.424000    52.576002
19          1.0        256.0   8.0   30.112000   27.551999    52.416001
20          8.0          8.0   1.0   20.032000   17.824000    37.440000
21          8.0          8.0   2.0   20.160001   17.920000    41.120000
22          8.0          8.0   4.0   18.719999   17.999999    39.264001
23          8.0          8.0   8.0   20.447999   18.464001    40.064000
24          8.0         32.0   1.0   22.143999   20.064000    30.080000
25          8.0         32.0   2.0   22.431999   20.640001    30.832000
26          8.0         32.0   4.0   22.048000   21.215999    31.647999
27          8.0         32.0   8.0   22.496000   22.368001    30.784000
28          8.0         64.0   1.0   26.464000   24.160000    38.047999
29          8.0         64.0   2.0   26.432000   24.576001    32.095999
30          8.0         64.0   4.0   26.800000   25.024001    32.320000
31          8.0         64.0   8.0   26.432000   25.696000    37.535999
32          8.0        128.0   1.0   26.144000   23.488000    42.879999
33          8.0        128.0   2.0   26.112000   23.488000    36.416002
34          8.0        128.0   4.0   26.079999   23.456000    36.800001
35          8.0        128.0   8.0   26.079999   23.776000    35.999998
36          8.0        256.0   1.0   30.112000   27.551999    52.448001
37          8.0        256.0   2.0   30.144000   27.616000    52.512001
38          8.0        256.0   4.0   30.208001   27.680000    52.671999
39          8.0        256.0   8.0   30.208001   27.840000    52.848000
40         16.0          8.0   1.0   20.144001   18.015999    33.583999
41         16.0          8.0   2.0   18.784000   17.856000    33.824001
42         16.0          8.0   4.0   20.447999   18.464001    42.943999
43         16.0          8.0   8.0   21.376001   19.328000    31.168001
44         16.0         32.0   1.0   22.080000   20.640001    33.264000
45         16.0         32.0   2.0   22.399999   21.120001    50.175998
46         16.0         32.0   4.0   22.752000   22.336001    39.168000
47         16.0         32.0   8.0   23.647999   23.552001    45.616001
48         16.0         64.0   1.0   26.400000   24.512000    38.208000
49         16.0         64.0   2.0   26.848000   25.024001    39.808001
50         16.0         64.0   4.0   26.848000   25.823999    40.447999
51         16.0         64.0   8.0   26.880000   27.264001    40.128000
52         16.0        128.0   1.0   26.144000   23.488000    40.688001
53         16.0        128.0   2.0   26.048001   23.488000   100.096002
54         16.0        128.0   4.0   26.079999   23.680000    36.704000
55         16.0        128.0   8.0   26.176000   23.968000    36.736000
56         16.0        256.0   1.0   30.176001   27.616000    52.639998
57         16.0        256.0   2.0   30.144000   27.680000    52.671999
58         16.0        256.0   4.0   30.239999   27.744001    52.832000
59         16.0        256.0   8.0   30.272000   28.255999    52.880000
60         32.0          8.0   1.0   20.000000   17.983999    33.248000
61         32.0          8.0   2.0   20.479999   18.464001    36.336001
62         32.0          8.0   4.0   21.376001   19.360000    40.959999
63         32.0          8.0   8.0   23.232000   21.152001    39.872002
64         32.0         32.0   1.0   22.080000   20.927999    42.176001
65         32.0         32.0   2.0   22.655999   22.368001    41.023999
66         32.0         32.0   4.0   23.391999   23.552001    38.079999
67         32.0         32.0   8.0   25.184000   25.440000    40.991999
68         32.0         64.0   1.0   26.815999   24.992000    39.680000
69         32.0         64.0   2.0   26.848000   25.632000    43.712001
70         32.0         64.0   4.0   27.136000   26.944000    37.023999
71         32.0         64.0   8.0   27.840000   28.608000    41.919999
72         32.0        128.0   1.0   26.079999   23.552001    39.999999
73         32.0        128.0   2.0   26.112000   23.680000    41.919999
74         32.0        128.0   4.0   26.176000   24.000000    38.624000
75         32.0        128.0   8.0   26.432000   24.576001    40.304001
76         32.0        256.0   1.0   30.144000   27.616000    52.671999
77         32.0        256.0   2.0   30.208001   27.872000    52.800000
78         32.0        256.0   4.0   30.304000   28.287999    52.992001
79         32.0        256.0   8.0   30.432001   29.056000    51.328000
80         64.0          8.0   1.0   19.168001   18.464001    40.431999
81         64.0          8.0   2.0   21.344000   19.328000    29.983999
82         64.0          8.0   4.0   22.112001   21.183999    36.063999
83         64.0          8.0   8.0   25.599999   24.288001    43.600000
84         64.0         32.0   1.0   22.528000   22.560000    39.903998
85         64.0         32.0   2.0   23.424000   23.520000    39.391998
86         64.0         32.0   4.0   25.632000   25.408000    41.632000
87         64.0         32.0   8.0   29.216001   29.408000    37.824001
88         64.0         64.0   1.0   26.848000   25.664000    44.351999
89         64.0         64.0   2.0   26.944000   27.008001    39.519999
90         64.0         64.0   4.0   27.872000   28.511999    44.799998
91         64.0         64.0   8.0   30.176001   31.488001    41.023999
92         64.0        128.0   1.0   26.079999   23.712000    43.216001
93         64.0        128.0   2.0   26.240001   24.032000    42.495999
94         64.0        128.0   4.0   26.335999   24.607999    41.760001
95         64.0        128.0   8.0   26.176000   24.448000    40.911999
96         64.0        256.0   1.0   30.208001   27.840000    52.639998
97         64.0        256.0   2.0   30.272000   28.255999    52.832000
98         64.0        256.0   4.0   30.463999   28.991999    51.376000
99         64.0        256.0   8.0   30.336000   29.376000    52.255999
100       128.0          8.0   1.0   20.096000   19.360000    40.479999
101       128.0          8.0   2.0   22.096001   21.183999    40.576000
102       128.0          8.0   4.0   25.632000   24.288001    41.232001
103       128.0          8.0   8.0   22.143999   19.616000    52.960001
104       128.0         32.0   1.0   23.424000   23.488000    38.304001
105       128.0         32.0   2.0   25.567999   25.408000    39.104000
106       128.0         32.0   4.0   29.216001   29.376000    41.568000
107       128.0         32.0   8.0   22.976000   20.671999    38.816001
108       128.0         64.0   1.0   27.071999   27.232001    41.248001
109       128.0         64.0   2.0   28.063999   28.576000    37.360001
110       128.0         64.0   4.0   30.080000   31.231999    41.728001
111       128.0         64.0   8.0   24.160000   22.272000    45.536000
112       128.0        128.0   1.0   26.208000   24.032000    40.240001
113       128.0        128.0   2.0   26.368000   24.416000    37.471998
114       128.0        128.0   4.0   26.176000   24.351999    46.064001
115       128.0        128.0   8.0   26.144000   24.416000    40.640000
116       128.0        256.0   1.0   30.304000   28.255999    52.896000
117       128.0        256.0   2.0   30.432001   29.088000    51.456001
118       128.0        256.0   4.0   30.368000   29.536000    52.512001
119       128.0        256.0   8.0   30.304000   29.727999    54.880001
120       256.0          8.0   1.0   23.264000   21.152001    39.136000
121       256.0          8.0   2.0   25.599999   24.256000    43.200001
122       256.0          8.0   4.0   22.112001   19.808000    52.735999
123       256.0          8.0   8.0   24.064001   20.992000    80.256000
124       256.0         32.0   1.0   25.599999   25.408000    41.056000
125       256.0         32.0   2.0   29.247999   29.376000    39.584000
126       256.0         32.0   4.0   22.976000   20.640001    40.544000
127       256.0         32.0   8.0   23.615999   21.376001    43.391999
128       256.0         64.0   1.0   28.063999   28.511999    37.551999
129       256.0         64.0   2.0   30.192001   31.424001    44.383999
130       256.0         64.0   4.0   24.192000   22.175999    44.704001
131       256.0         64.0   8.0   24.831999   22.816001    44.080000
132       256.0        128.0   1.0   26.400000   24.639999    38.400002
133       256.0        128.0   2.0   26.144000   24.416000    39.296001
134       256.0        128.0   4.0   26.176000   24.383999    43.008000
135       256.0        128.0   8.0   27.136000   25.376000    43.935999
136       256.0        256.0   1.0   30.400001   29.088000    51.520001
137       256.0        256.0   2.0   30.400001   29.408000    52.512001
138       256.0        256.0   4.0   30.304000   29.727999    54.880001
139       256.0        256.0   8.0   31.392001   30.912001    56.352001
140       512.0          8.0   1.0   25.599999   24.256000    40.256001
141       512.0          8.0   2.0   22.143999   19.808000    52.960001
142       512.0          8.0   4.0   24.000000   20.927999    80.192000
143       512.0          8.0   8.0   24.224000   21.792000   134.240001
144       512.0         32.0   1.0   29.247999   29.376000    77.264000
145       512.0         32.0   2.0   22.944000   20.703999    39.584000
146       512.0         32.0   4.0   23.615999   21.312000    43.520000
147       512.0         32.0   8.0   24.416000   22.336001    57.760000
148       512.0         64.0   1.0   30.080000   31.424001    36.800001
149       512.0         64.0   2.0   24.224000   22.240000    40.736001
150       512.0         64.0   4.0   24.863999   22.784000    40.511999
151       512.0         64.0   8.0   25.632000   24.448000    47.743998
152       512.0        128.0   1.0   26.144000   24.448000    38.975999
153       512.0        128.0   2.0   26.112000   24.448000    40.544000
154       512.0        128.0   4.0   27.136000   25.504000    44.064000
155       512.0        128.0   8.0   27.744001   26.815999    49.024001
156       512.0        256.0   1.0   30.368000   29.376000   453.280002
157       512.0        256.0   2.0   30.272000   29.727999    54.752000
158       512.0        256.0   4.0   31.360000   30.912001    56.288000
159       512.0        256.0   8.0   32.063998   31.968001    61.600000
160      1024.0          8.0   1.0   22.080000   19.648001   266.975999
161      1024.0          8.0   2.0  100.447997   20.896001    80.352001
162      1024.0          8.0   4.0   24.256000   22.240000   134.207994
163      1024.0          8.0   8.0   25.728000   23.903999   239.168003
164      1024.0         32.0   1.0   22.944000   20.671999    44.831999
165      1024.0         32.0   2.0   23.615999   21.407999    43.264002
166      1024.0         32.0   4.0   24.448000   22.911999    57.663999
167      1024.0         32.0   8.0   26.880000   25.200000    84.863998
168      1024.0         64.0   1.0   24.224000   22.175999    37.471998
169      1024.0         64.0   2.0   24.831999   22.848001    40.544000
170      1024.0         64.0   4.0   25.567999   24.416000    48.287999
171      1024.0         64.0   8.0   28.448001   26.848000    61.760001
172      1024.0        128.0   1.0   26.176000   24.351999    41.824002
173      1024.0        128.0   2.0   27.136000   25.472000    44.447999
174      1024.0        128.0   4.0   27.744001   26.303999    48.287999
175      1024.0        128.0   8.0   29.727999   28.384000    55.360001
176      1024.0        256.0   1.0   30.272000   29.792000    54.880001
177      1024.0        256.0   2.0   31.360000   30.975999    56.240000
178      1024.0        256.0   4.0   32.063998   31.936001    61.087999
179      1024.0        256.0   8.0   34.336001   34.784000    66.656001
180      2048.0          8.0   1.0   24.064001   20.992000    80.192000
181      2048.0          8.0   2.0   24.224000   21.792000   133.599997
182      2048.0          8.0   4.0   25.632000   24.544001   239.904001
183      2048.0          8.0   8.0   30.816000   29.920001   443.807989
184      2048.0         32.0   1.0   23.615999   21.376001    43.520000
185      2048.0         32.0   2.0   24.416000   22.816001    57.696000
186      2048.0         32.0   4.0   26.848000   25.248000    84.895998
187      2048.0         32.0   8.0   32.224000   31.679999   138.016000
188      2048.0         64.0   1.0   24.800001   22.752000    43.327998
189      2048.0         64.0   2.0   25.632000   24.432000    47.936000
190      2048.0         64.0   4.0   28.448001   26.944000    61.728001
191      2048.0         64.0   8.0   34.432001   34.015998    88.639997
192      2048.0        128.0   1.0   27.104000   25.408000    44.415999
193      2048.0        128.0   2.0   27.744001   26.815999    47.711998
194      2048.0        128.0   4.0   29.664000   28.960001    55.264000
195      2048.0        128.0   8.0   34.623999   33.888001    69.904003
196      2048.0        256.0   1.0   31.424001   30.912001    56.256000
197      2048.0        256.0   2.0   32.063998   32.448001    61.632000
198      2048.0        256.0   4.0   34.240000   34.816001    67.616001
199      2048.0        256.0   8.0   39.616000   39.519999    77.119999
200      4096.0          8.0   1.0   24.224000   22.272000   134.143993
201      4096.0          8.0   2.0   25.696000   24.480000   239.232004
202      4096.0          8.0   4.0   30.816000   29.952001   443.839997
203      4096.0          8.0   8.0   41.023999   41.248001   859.647989
204      4096.0         32.0   1.0   24.448000   22.848001    57.728000
205      4096.0         32.0   2.0   26.944000   25.248000    84.895998
206      4096.0         32.0   4.0   32.288000   31.072000   137.968004
207      4096.0         32.0   8.0   42.656001   43.327998   246.143997
208      4096.0         64.0   1.0   25.632000   24.448000    47.584001
209      4096.0         64.0   2.0   28.448001   27.488001    61.760001
210      4096.0         64.0   4.0   34.432001   33.952001    88.544004
211      4096.0         64.0   8.0   45.504000   46.432000   142.655998
212      4096.0        128.0   1.0   27.712001   26.784001    47.743998
213      4096.0        128.0   2.0   29.696001   28.368000    55.167999
214      4096.0        128.0   4.0   34.688000   33.952001    69.888003
215      4096.0        128.0   8.0   44.576000   45.279998    96.128002
216      4096.0        256.0   1.0   32.095999   31.936001    61.888002
217      4096.0        256.0   2.0   34.304000   34.175999    66.880003
218      4096.0        256.0   4.0   39.551999   40.192001    76.895997
219      4096.0        256.0   8.0   46.208002   47.936000    92.160001
220      8192.0          8.0   1.0   25.728000   24.448000   239.040002
221      8192.0          8.0   2.0   30.784000   29.312000   443.455994
222      8192.0          8.0   4.0   41.184001   41.375998   859.887987
223      8192.0          8.0   8.0   60.640000   62.752001  1693.856001
224      8192.0         32.0   1.0   26.848000   25.216000    84.831998
225      8192.0         32.0   2.0   32.256000   31.647999   138.592005
226      8192.0         32.0   4.0   42.528000   42.608000   246.079996
227      8192.0         32.0   8.0   63.391998   66.656001   454.255998
228      8192.0         64.0   1.0   28.480001   26.815999    61.760001
229      8192.0         64.0   2.0   34.368001   33.984002    88.032000
230      8192.0         64.0   4.0   45.472000   46.464000   141.984001
231      8192.0         64.0   8.0   68.351999   71.967997   249.599993
232      8192.0        128.0   1.0   29.696001   28.384000    55.248000
233      8192.0        128.0   2.0   34.655999   34.527998    69.856003
234      8192.0        128.0   4.0   44.512000   45.855999    96.767999
235      8192.0        128.0   8.0   63.776001   67.648001   150.544003
236      8192.0        256.0   1.0   34.304000   34.208000    67.648001
237      8192.0        256.0   2.0   39.551999   40.160000    76.991998
238      8192.0        256.0   4.0   46.271998   47.520000    91.232002
239      8192.0        256.0   8.0   64.960003   68.944003   120.320000

@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @ispobock, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a performance optimization to the moe_align_block_size kernel by integrating the padding of the sorted_token_ids buffer directly into the kernel's execution. This change is exposed as an optional parameter, allowing for conditional activation based on workload characteristics. The modifications span the CUDA kernel, C++ bindings, Python API, and are supported by updated benchmarks and comprehensive tests.

Highlights

  • Performance Optimization: I've fused the initialization (padding) of the sorted_token_ids buffer directly into the moe_align_block_size CUDA kernel. This eliminates a separate memory fill operation, potentially improving performance for certain workload shapes, as demonstrated by the included benchmarks.
  • API Extension: The moe_align_block_size function now accepts an optional boolean argument, pad_sorted_token_ids, which controls whether the fused padding behavior is enabled. This lets callers pick whichever execution path is faster for their shapes (see the illustrative signature sketch after this list).
  • Benchmarking and Testing Enhancements: The benchmarking script has been updated to include a new 'SGL Fusion' category, showcasing the performance benefits of this change. The test suite has also been expanded to thoroughly validate the correctness of the fused operation across various configurations, including the new padding behavior.
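
For reference, a hypothetical sketch of how the C++ binding might look with the new flag appended. The existing parameter names and order below are assumptions based on the usual moe_align_block_size interface; only the trailing pad_sorted_token_ids flag is the addition described in this pull request.

    #include <torch/extension.h>

    // Illustrative declaration only (not copied from sgl-kernel).
    void moe_align_block_size(
        torch::Tensor topk_ids,             // [num_tokens, topk] expert indices
        int64_t num_experts,
        int64_t block_size,
        torch::Tensor sorted_token_ids,     // output, padded per expert to block_size
        torch::Tensor expert_ids,           // output, one expert id per block
        torch::Tensor num_tokens_post_pad,  // output, single int32
        torch::Tensor token_cnts_buffer,    // assumed workspace buffer
        torch::Tensor cumsum_buffer,        // assumed workspace buffer
        bool pad_sorted_token_ids);         // new: fill the padding inside the kernel

With the flag off, behavior is unchanged and the caller is still expected to initialize sorted_token_ids itself before the launch.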

@gemini-code-assist (Contributor, bot) left a comment

Code Review

The changes introduce an optimization to fuse the padding of sorted_token_ids into the moe_align_block_size kernel. The benchmark results demonstrate performance improvements. I suggested a refactoring to reduce code duplication in the benchmark and improve test robustness by initializing expert_ids_cuda and expert_ids_triton with a sentinel value.

@ispobock changed the title from "Fuse pad sorted_token_ids to moe_align_block_size kernel" to "Fuse sorted_token_ids padding to moe_align_block_size kernel" on Jun 22, 2025

    if (pad_sorted_token_ids) {
      int32_t fill_val = static_cast<int32_t>(numel);
      int32_t total = *total_tokens_post_pad;
      for (int32_t idx = tid; idx < total; idx += stride) {
        sorted_token_ids[idx] = fill_val;  // pad slots with the "no token" sentinel
      }
    }
Collaborator

We can consider using a vectorized write here.

Collaborator Author

Yes, a vectorized write can be used here. Updated.
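
A minimal sketch of what such a vectorized write could look like (assumed, not the exact code that was merged): pack four copies of the fill value into an int4 so each thread issues one 128-bit store per iteration, with a scalar loop for the tail. This assumes sorted_token_ids is 16-byte aligned, which holds for the start of a freshly allocated torch tensor.

    #include <cstdint>

    // Vectorized variant of the padding fill: 4 x int32 per store.
    __global__ void pad_sorted_token_ids_vec(int32_t* __restrict__ sorted_token_ids,
                                             const int32_t* __restrict__ total_tokens_post_pad,
                                             int32_t numel) {
      const int32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
      const int32_t stride = gridDim.x * blockDim.x;
      const int32_t fill_val = numel;
      const int32_t total = *total_tokens_post_pad;

      const int32_t vec_total = total / 4;  // number of full int4 chunks
      int4* out_vec = reinterpret_cast<int4*>(sorted_token_ids);
      const int4 fill_vec = make_int4(fill_val, fill_val, fill_val, fill_val);

      for (int32_t idx = tid; idx < vec_total; idx += stride) {
        out_vec[idx] = fill_vec;  // one 16-byte store covers 4 elements
      }
      for (int32_t idx = vec_total * 4 + tid; idx < total; idx += stride) {
        sorted_token_ids[idx] = fill_val;  // scalar tail (at most 3 elements)
      }
    }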

@ispobock (Collaborator, Author) commented Jun 23, 2025

Benchmark after the vectorized write (kernel time; lower is better):
     num_tokens  num_experts  topk        SGL  SGL Fusion       Triton
0           1.0          8.0   1.0  18.208001   16.160000    28.144000
1           1.0          8.0   2.0  18.304000   16.224001    32.896001
2           1.0          8.0   4.0  18.271999   16.287999    28.448001
3           1.0          8.0   8.0  17.920000   16.319999    32.512002
4           1.0         32.0   1.0  20.800000   18.848000    31.583998
5           1.0         32.0   2.0  20.864001   18.848000    32.607999
6           1.0         32.0   4.0  20.896001   18.848000    38.096000
7           1.0         32.0   8.0  20.800000   19.007999    31.136001
8           1.0         64.0   1.0  25.184000   23.232000    39.264001
9           1.0         64.0   2.0  25.184000   23.232000    36.352001
10          1.0         64.0   4.0  25.184000   23.296000    34.240000
11          1.0         64.0   8.0  25.216000   23.344001    31.296000
12          1.0        128.0   1.0  24.288001   22.272000    35.071999
13          1.0        128.0   2.0  24.224000   22.304000    34.527998
14          1.0        128.0   4.0  24.288001   22.272000    34.559999
15          1.0        128.0   8.0  24.224000   22.320000    34.816001
16          1.0        256.0   1.0  29.088000   26.720000    51.040001
17          1.0        256.0   2.0  29.088000   26.720000    51.167998
18          1.0        256.0   4.0  29.088000   26.784001    51.199999
19          1.0        256.0   8.0  29.088000   26.784001    51.263999
20          8.0          8.0   1.0  18.015999   16.256001    37.471998
21          8.0          8.0   2.0  18.015999   16.319999    38.479999
22          8.0          8.0   4.0  17.920000   16.256001    46.176001
23          8.0          8.0   8.0  18.208001   16.736001    40.672000
24          8.0         32.0   1.0  20.768000   19.007999    39.712001
25          8.0         32.0   2.0  20.736000   19.072000    39.744001
26          8.0         32.0   4.0  20.544000   19.296000    30.176001
27          8.0         32.0   8.0  21.040000   19.967999    32.000002
28          8.0         64.0   1.0  25.184000   23.328001    31.199999
29          8.0         64.0   2.0  25.152000   23.456000    37.120000
30          8.0         64.0   4.0  25.119999   23.552001    37.856001
31          8.0         64.0   8.0  25.168000   23.744000    38.943999
32          8.0        128.0   1.0  24.192000   22.336001    34.559999
33          8.0        128.0   2.0  24.224000   22.272000    39.568000
34          8.0        128.0   4.0  24.192000   22.272000    39.728001
35          8.0        128.0   8.0  24.256000   22.528000    41.792002
36          8.0        256.0   1.0  29.088000   26.784001    50.944000
37          8.0        256.0   2.0  29.120000   26.815999    51.328000
38          8.0        256.0   4.0  29.120000   26.848000    51.456001
39          8.0        256.0   8.0  29.152000   27.039999    51.520001
40         16.0          8.0   1.0  17.888000   16.287999    32.575998
41         16.0          8.0   2.0  17.952001   16.256001    33.888001
42         16.0          8.0   4.0  18.368000   16.704001    34.143999
43         16.0          8.0   8.0  19.264000   17.600000    32.336000
44         16.0         32.0   1.0  20.736000   19.168001    40.192001
45         16.0         32.0   2.0  20.640001   19.264000    41.664001
46         16.0         32.0   4.0  21.056000   19.872000    39.200000
47         16.0         32.0   8.0  21.904000   20.896001    38.463999
48         16.0         64.0   1.0  25.152000   23.424000    39.903998
49         16.0         64.0   2.0  25.152000   23.552001    38.559999
50         16.0         64.0   4.0  25.087999   23.744000    48.319999
51         16.0         64.0   8.0  25.696000   24.480000    46.144001
52         16.0        128.0   1.0  24.288001   22.368001    47.648001
53         16.0        128.0   2.0  24.160000   22.304000    46.176001
54         16.0        128.0   4.0  24.224000   22.528000    38.752001
55         16.0        128.0   8.0  24.351999   22.784000    41.503999
56         16.0        256.0   1.0  29.120000   26.815999    51.295999
57         16.0        256.0   2.0  29.088000   26.848000    51.424000
58         16.0        256.0   4.0  29.152000   27.071999    51.584002
59         16.0        256.0   8.0  29.216001   27.551999    51.328000
60         32.0          8.0   1.0  17.888000   16.256001    40.399998
61         32.0          8.0   2.0  18.336000   16.704001    32.703999
62         32.0          8.0   4.0  19.264000   17.632000    37.408002
63         32.0          8.0   8.0  20.959999   19.487999    33.856001
64         32.0         32.0   1.0  20.576000   19.168001    44.672001
65         32.0         32.0   2.0  21.056000   19.840000    39.744001
66         32.0         32.0   4.0  21.919999   20.864001   132.192001
67         32.0         32.0   8.0  23.680000   22.720000    46.432000
68         32.0         64.0   1.0  25.184000   23.584001    46.624001
69         32.0         64.0   2.0  25.087999   23.776000    45.984000
70         32.0         64.0   4.0  25.599999   24.432000    33.248000
71         32.0         64.0   8.0  26.591999   25.728000    31.776000
72         32.0        128.0   1.0  24.192000   22.272000    35.456002
73         32.0        128.0   2.0  24.256000   22.528000    35.680000
74         32.0        128.0   4.0  24.192000   22.848001    36.304001
75         32.0        128.0   8.0  24.416000   23.360001    38.880002
76         32.0        256.0   1.0  29.120000   26.848000    51.424000
77         32.0        256.0   2.0  29.120000   27.039999    51.552001
78         32.0        256.0   4.0  29.247999   27.648000    51.663999
79         32.0        256.0   8.0  29.408000   28.416000    50.175998
80         64.0          8.0   1.0  18.368000   16.704001    31.936001
81         64.0          8.0   2.0  19.168001   17.568000    29.088000
82         64.0          8.0   4.0  21.152001   19.487999    32.224000
83         64.0          8.0   8.0  24.928000   23.167999    38.911998
84         64.0         32.0   1.0  20.992000   19.904001    34.400001
85         64.0         32.0   2.0  21.904000   20.896001    42.752001
86         64.0         32.0   4.0  23.712000   22.688000    39.808001
87         64.0         32.0   8.0  27.616000   26.528001    41.600000
88         64.0         64.0   1.0  25.152000   23.871999    42.911999
89         64.0         64.0   2.0  25.664000   24.464000    41.696001
90         64.0         64.0   4.0  26.528001   25.760001    36.047999
91         64.0         64.0   8.0  28.543999   28.287999    38.688000
92         64.0        128.0   1.0  24.256000   22.528000    35.999998
93         64.0        128.0   2.0  24.224000   22.879999    43.072000
94         64.0        128.0   4.0  24.416000   23.296000    36.928002
95         64.0        128.0   8.0  24.256000   23.328001    37.567999
96         64.0        256.0   1.0  29.152000   27.039999    51.360000
97         64.0        256.0   2.0  29.247999   27.584000    51.647998
98         64.0        256.0   4.0  29.376000   28.416000    50.144002
99         64.0        256.0   8.0  29.279999   28.767999    51.328000
100       128.0          8.0   1.0  19.231999   17.632000    40.384000
101       128.0          8.0   2.0  20.992000   19.487999    34.432001
102       128.0          8.0   4.0  24.831999   23.200000    38.240001
103       128.0          8.0   8.0  20.416001   18.495999    51.167998
104       128.0         32.0   1.0  21.888001   20.864001    42.304002
105       128.0         32.0   2.0  23.680000   22.688000    40.383998
106       128.0         32.0   4.0  27.584000   26.591999    42.463999
107       128.0         32.0   8.0  21.280000   19.520000    41.216001
108       128.0         64.0   1.0  25.536001   24.448000    38.288001
109       128.0         64.0   2.0  26.559999   25.760001    38.208000
110       128.0         64.0   4.0  28.543999   28.255999    33.296000
111       128.0         64.0   8.0  23.456000   21.792000    37.184000
112       128.0        128.0   1.0  24.224000   22.879999    40.256001
113       128.0        128.0   2.0  24.351999   23.296000    36.704000
114       128.0        128.0   4.0  24.256000   23.232000    37.951998
115       128.0        128.0   8.0  24.256000   23.391999    41.696001
116       128.0        256.0   1.0  29.247999   27.551999    51.552001
117       128.0        256.0   2.0  29.440001   28.384000    49.727999
118       128.0        256.0   4.0  29.279999   28.767999    51.199999
119       128.0        256.0   8.0  29.216001   29.023999    53.408001
120       256.0          8.0   1.0  21.152001   19.487999    42.752001
121       256.0          8.0   2.0  24.928000   23.200000    38.592000
122       256.0          8.0   4.0  20.384001   18.592000    51.072001
123       256.0          8.0   8.0  21.760000   19.743999    78.496002
124       256.0         32.0   1.0  23.680000   22.720000    40.224001
125       256.0         32.0   2.0  27.712001   26.559999    36.800001
126       256.0         32.0   4.0  21.280000   19.455999    43.680001
127       256.0         32.0   8.0  21.984000   20.128001    41.696001
128       256.0         64.0   1.0  26.559999   25.823999    42.240001
129       256.0         64.0   2.0  28.543999   28.160000    35.487998
130       256.0         64.0   4.0  23.456000   21.728000    38.368002
131       256.0         64.0   8.0  24.000000   22.368001    39.872002
132       256.0        128.0   1.0  24.383999   23.391999    39.503999
133       256.0        128.0   2.0  24.288001   23.264000    38.079999
134       256.0        128.0   4.0  24.256000   23.407999    44.576000
135       256.0        128.0   8.0  25.280001   24.383999    42.784002
136       256.0        256.0   1.0  29.440001   28.224001    50.175998
137       256.0        256.0   2.0  29.279999   28.864000    51.199999
138       256.0        256.0   4.0  29.216001   29.152000    53.536002
139       256.0        256.0   8.0  30.400001   30.336000    54.912001
140       512.0          8.0   1.0  24.896000   23.184000    38.495999
141       512.0          8.0   2.0  20.416001   18.495999    51.135998
142       512.0          8.0   4.0  21.792000   19.680001    78.496002
143       512.0          8.0   8.0  22.688000   20.864001   131.568000
144       512.0         32.0   1.0  27.584000   26.591999    34.400001
145       512.0         32.0   2.0  21.248000   19.455999    35.712000
146       512.0         32.0   4.0  21.952000   20.256000    52.271999
147       512.0         32.0   8.0  23.296000   21.536000    55.167999
148       512.0         64.0   1.0  28.511999   28.255999    34.015998
149       512.0         64.0   2.0  23.520000   21.792000    36.768001
150       512.0         64.0   4.0  24.000000   22.464000    40.192001
151       512.0         64.0   8.0  25.440000   23.840001    47.871999
152       512.0        128.0   1.0  24.288001   23.391999    37.951998
153       512.0        128.0   2.0  24.256000   23.391999    40.640000
154       512.0        128.0   4.0  25.248000   24.288001    42.463999
155       512.0        128.0   8.0  27.232001   25.823999    46.912000
156       512.0        256.0   1.0  29.279999   28.832000    51.199999
157       512.0        256.0   2.0  29.216001   29.088000    97.759999
158       512.0        256.0   4.0  30.288000   30.336000    91.008000
159       512.0        256.0   8.0  31.583998   31.552002   140.464000
160      1024.0          8.0   1.0  20.416001   18.495999    95.551997
161      1024.0          8.0   2.0  21.728000   19.808000   102.335997
162      1024.0          8.0   4.0  22.784000   20.800000   131.487995
163      1024.0          8.0   8.0  25.728000   24.127999   237.535998
164      1024.0         32.0   1.0  21.280000   19.455999    45.440000
165      1024.0         32.0   2.0  22.016000   20.064000    42.144001
166      1024.0         32.0   4.0  23.296000   21.504000    55.215999
167      1024.0         32.0   8.0  26.912000   25.343999    83.296001
168      1024.0         64.0   1.0  23.424000   21.760000    37.632000
169      1024.0         64.0   2.0  24.000000   22.431999    40.800001
170      1024.0         64.0   4.0  25.504000   23.808001    47.936000
171      1024.0         64.0   8.0  29.216001   26.784001    60.031999
172      1024.0        128.0   1.0  24.256000   23.391999    39.487999
173      1024.0        128.0   2.0  25.248000   24.320001    42.847998
174      1024.0        128.0   4.0  27.200000   25.823999    47.040001
175      1024.0        128.0   8.0  29.568000   28.192000    54.111999
176      1024.0        256.0   1.0  29.216001   29.120000    53.408001
177      1024.0        256.0   2.0  30.400001   30.336000    54.976001
178      1024.0        256.0   4.0  31.615999   31.488001    59.519999
179      1024.0        256.0   8.0  34.240000   34.047998    94.655998
180      2048.0          8.0   1.0  21.663999   19.648001    78.560002
181      2048.0          8.0   2.0  22.752000   20.896001   131.615996
182      2048.0          8.0   4.0  25.728000   24.095999   237.760007
183      2048.0          8.0   8.0  30.944001   29.312000   442.624003
184      2048.0         32.0   1.0  22.016000   20.256000    43.680001
185      2048.0         32.0   2.0  23.296000   21.663999    55.264000
186      2048.0         32.0   4.0  26.880000   25.583999    83.296001
187      2048.0         32.0   8.0  32.256000   30.784000   136.864007
188      2048.0         64.0   1.0  24.000000   22.464000    40.128000
189      2048.0         64.0   2.0  25.472000   23.840001    47.871999
190      2048.0         64.0   4.0  29.232000   26.752001    60.031999
191      2048.0         64.0   8.0  34.304000   33.183999    86.751997
192      2048.0        128.0   1.0  25.280001   24.351999    42.911999
193      2048.0        128.0   2.0  27.232001   25.855999    47.008000
194      2048.0        128.0   4.0  29.536000   28.192000    54.175999
195      2048.0        128.0   8.0  34.495998   33.728000    68.000004
196      2048.0        256.0   1.0  30.368000   30.239999    54.912001
197      2048.0        256.0   2.0  31.552002   31.520002    59.680000
198      2048.0        256.0   4.0  34.224000   34.143999    65.888003
199      2048.0        256.0   8.0  39.584000   39.487999    75.935997
200      4096.0          8.0   1.0  22.752000   20.864001   131.551996
201      4096.0          8.0   2.0  25.776001   24.095999   237.471998
202      4096.0          8.0   4.0  30.912001   29.247999   442.591995
203      4096.0          8.0   8.0  40.320002   40.640000   858.399987
204      4096.0         32.0   1.0  23.264000   21.504000    55.167999
205      4096.0         32.0   2.0  26.912000   25.599999    83.296001
206      4096.0         32.0   4.0  32.256000   30.880000   136.864007
207      4096.0         32.0   8.0  41.824002   41.760001   244.080000
208      4096.0         64.0   1.0  25.440000   23.871999    47.040001
209      4096.0         64.0   2.0  29.184001   26.720000    59.999999
210      4096.0         64.0   4.0  34.336001   33.280000    86.719997
211      4096.0         64.0   8.0  44.608001   44.992000   141.343996
212      4096.0        128.0   1.0  27.200000   25.888000    47.807999
213      4096.0        128.0   2.0  29.600000   28.255999    53.280000
214      4096.0        128.0   4.0  34.495998   33.760000    68.095997
215      4096.0        128.0   8.0  43.680001   45.024000    94.591998
216      4096.0        256.0   1.0  31.615999   31.456001    59.776001
217      4096.0        256.0   2.0  34.208000   34.047998    66.079997
218      4096.0        256.0   4.0  39.664000   39.519999    75.967997
219      4096.0        256.0   8.0  45.343999   47.456000    89.968000
220      8192.0          8.0   1.0  25.728000   24.032000   237.535998
221      8192.0          8.0   2.0  30.912001   29.376000   442.719996
222      8192.0          8.0   4.0  40.352002   40.064000   859.103978
223      8192.0          8.0   8.0  59.872001   61.632000  1693.552017
224      8192.0         32.0   1.0  26.912000   25.567999    83.264001
225      8192.0         32.0   2.0  32.320000   30.816000   136.831999
226      8192.0         32.0   4.0  41.712001   42.399999   244.112000
227      8192.0         32.0   8.0  62.592000   65.279998   453.664005
228      8192.0         64.0   1.0  29.184001   26.688000    60.031999
229      8192.0         64.0   2.0  34.368001   33.055998    86.687997
230      8192.0         64.0   4.0  44.608001   44.831999   140.640005
231      8192.0         64.0   8.0  67.616001   71.231999   248.159997
232      8192.0        128.0   1.0  29.568000   28.224001    54.143999
233      8192.0        128.0   2.0  34.559999   33.696000    68.095997
234      8192.0        128.0   4.0  43.680001   44.512000    95.519997
235      8192.0        128.0   8.0  62.784001   66.255998   149.407998
236      8192.0        256.0   1.0  34.240000   34.047998    65.712001
237      8192.0        256.0   2.0  39.551999   39.519999    75.871997
238      8192.0        256.0   4.0  45.375999   47.552001    90.112001
239      8192.0        256.0   8.0  64.287998   68.960004   118.975997

@BBuf (Collaborator) left a comment

LGTM.

@zhyncs merged commit 57ab776 into sgl-project:main on Jun 25, 2025
57 of 65 checks passed
chenxijun1029 pushed a commit to chenxijun1029/sglang that referenced this pull request Jul 17, 2025
pi314ever pushed a commit to pi314ever/sglang that referenced this pull request Jul 17, 2025
* Use seq_len_fill_value in the cuda graph runners (sgl-project#7233)

* support custom weight loader for model runner (sgl-project#7122)

Co-authored-by: kavioyu <kavioyu@tencent.com>

* Fix AMD speculative decoding (sgl-project#7252)

* [Refactor] OAI Server components (sgl-project#7167)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* OAI Server Skeleton & Core Utility Endpoints (sgl-project#7179)

* [amd] Opt dsv3 moe (sgl-project#7160)

Co-authored-by: wunhuang <wunhuang@amd.com>

* update ci node for xeon (sgl-project#7265)

* feat: mtp support dp-attention (sgl-project#6081)

Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>

* support qwen2 running on ascend npu device (sgl-project#7022)

Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>

* Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. (sgl-project#7164)

* bugfix(tool call ebnf): Fix EBNF generation for optional function parameters (sgl-project#7283)

* Fix AWQ Dequant and Weight Loading of deepseek v2 (sgl-project#6842)

* fix: resolve b200 dsv3 mtp issue (sgl-project#7286)

* ci: Fix test_ebnf_generate_all_optional_function_params (sgl-project#7288)

* fix: only enable flash_attn test on sm80 sm90 (sgl-project#7289)

* [PD] Support get local ip from NIC for PD disaggregation (sgl-project#7237)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* [PD] Add custom memory pool option to support Mooncake PD with NVLink  (sgl-project#7264)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Upstreaming hicache bug fixes (sgl-project#7267)

* Update python API of activation, topk, norm and rope and remove vllm dependency (sgl-project#6614)

Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>

* Fix hicache benchmark script bug - some sampled input_request is [] (sgl-project#7300)

* chore: change logs from`INFO` to `DEBUG` for dp and add force quit for tokenizer manager (sgl-project#7251)

* update invalid link in doc (sgl-project#7297)

* Fix mini_lb for PD with long output: limit chunk size of decode response (sgl-project#7301)

Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>

* Fix profiler error when there are idle passes (sgl-project#7003)

* [pd] optimize dockerfile for  pd disaggregation (sgl-project#7319)

Co-authored-by: zhyncs <me@zhyncs.com>

* Merge PDLB (Prefill-Decode Load Balancer) into SGLang Router (sgl-project#7096)

* Add more refactored openai test & in CI (sgl-project#7284)

* fix: resolve blackwell deepep image issue (sgl-project#7331)

* add seed in CPU UTs to avoid flaky failure (sgl-project#7333)

* Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (sgl-project#7099)

* Reintroduce tiny fix sampler error when prob is not contiguous (sgl-project#7354)

* [Refactor] Clean up radix cache related API (sgl-project#7303)

Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* Put `_normalize_rid` before other normalization in `io_struct` (sgl-project#7363)

* [PD] Transfer hidden states for mtp when disaggregation (sgl-project#7242)

* [Bugfix][PD] Set conclude state before clear when failure happens (sgl-project#7362)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* docs: update installation (sgl-project#7366)

* [Docker] optimize dockerfile  remove deepep and blackwell merge it to… (sgl-project#7343)

Co-authored-by: Yineng Zhang <me@zhyncs.com>

* Clean unused import for mimo mtp model (sgl-project#7370)

* [Bugfix]Fix hang bug using dp attention with HiRadixCache (sgl-project#7159)

Signed-off-by: huanglong <huanglong@linux.alibaba.com>

* [Doc] add embedding rerank doc (sgl-project#7364)

* Fix judgment condition for enabling Deepseek V3/R1 shared expert fusion optimization (sgl-project#7371)

* Feat/refactor embedding server (sgl-project#7322)

* Purge VerlEngine (sgl-project#7326)

Signed-off-by: Ata Fatahi <immrata@gmail.com>

* support return logprobs for pipeline (sgl-project#7356)

Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>

* [PD] Optimize custom mem pool usage and bump mooncake version (sgl-project#7393)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Support THUDM/GLM-4-0414 (GLM-Z1) Glm4ForCausalLM architecture. (sgl-project#5485)

* Refine OpenAI serving entrypoint to remove batch requests (sgl-project#7372)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>

* [Feature] Comprehensive Hybrid Parallelism Support (sgl-project#6389)

* [DeepSeekNextN] fix: residual of head norm can be None (sgl-project#7398)

* [OAI refactor] Add rerank and score serving (sgl-project#7399)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* [OAI Server Refactor] [ChatCompletions & Completions] Implement UsageInfo Processor (sgl-project#7360)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* Fix All-Gather under world size one (sgl-project#7219)

* Optimize DP attn scheduling for speculative decoding (sgl-project#7285)

* Update usage_processor.py (sgl-project#7402)

* Fix 7285 Merge Conflicts (sgl-project#7403)

* chore: upgrade mooncake-transfer-engine 0.3.4 (sgl-project#7401)

* [OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State (sgl-project#7329)

Signed-off-by: keru <rukeyang@gmail.com>

* Remove batches api in docs & example (sgl-project#7400)

* [BugFix]: fix EmbeddingReqInput single input error (sgl-project#7396)

* [BugFix]fix qwen25 invoke function call streaming responses with curly braces as the starting indicator (sgl-project#7394)

* fix overlap pagecount (sgl-project#6984)

Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* fix: Fix CI test_function_call_parser.py (sgl-project#7425)

* Fix CPU offloading for MLA memory pool (sgl-project#7409)

* [fix] PD disaggregation when enable mtp and tp!=dp (sgl-project#7420)

* feat(oai refactor): Replace `openai_api` with `entrypoints/openai`  (sgl-project#7351)

Co-authored-by: Jin Pan <jpan236@wisc.edu>

* Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support (sgl-project#7412)

* refactor(test): reorganize OpenAI test file structure (sgl-project#7408)

* [minor] simplify the `TokenToKVPoolAllocator` (sgl-project#7414)

* Tiny add logging for GC  (sgl-project#7406)

* FlashInfer NVFP4 MoE with EP & 2-stream shared expert (sgl-project#7327)

Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: alcanderian <alcanderian@gmail.com>

* Remove copy after bmm (sgl-project#7441)

* Fix torch compile run (sgl-project#7391)

Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>

* [misc] Add PD service discovery support in router (sgl-project#7361)

* add fused moe config for qwen3 in triton3.3.1 (sgl-project#7445)

* Fix CUDA Graph Check under Deepep with DP FFN (sgl-project#7451)

* Update hyperparameter_tuning.md (sgl-project#7454)

* feat: integrate deepgemm into EPMoE (sgl-project#6821)

Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* Solve docker build failed in the virtual machine (sgl-project#7290)

Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>

* Fix a bug in BatchTokenIDOut & Misc style and dependency updates (sgl-project#7457)

* [CI] Upgrade mooncake to 0.3.4.post1 to fix 8 gpu tests (sgl-project#7472)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Fix prefill OOM due to wrong token calculation when page > 1  (sgl-project#7397)

* feat(func_call): Add more check in `BaseFormatDetector.parse_streaming_increment` (sgl-project#7479)

* Fix dtype for idle input in spec decoding (sgl-project#7456)

* update mooncake in dockerfile (sgl-project#7480)

* kvcache io kernels and test case (sgl-project#7382)

* [perf] slightly imporve DeepSeek-R1-FP4 TP8 (sgl-project#7481)

* Quick fix for DeepGemm requant to also cover MTP. (sgl-project#7378)

* Support weight loading without mmap (sgl-project#7469)

* ci: Revert openai_server related tests in AMD suites (sgl-project#7449)

* Perormance: Enable cuda graph for dp idle batch (sgl-project#7269)

Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>

* bugfix: Prevent global mutation of conv.stop_str across requests (sgl-project#7347)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* Fix RequestValidationError response format (sgl-project#7487)

* Fix MTP with Deepseek R1 Fp4 (sgl-project#7376)

* chore: bump sgl-kernel v0.2.0 (sgl-project#7490)

* chore: bump v0.4.8 (sgl-project#7493)

* [AMD] add aiter fused moe in DeepEP path (sgl-project#7268)

* enable aiter_biased_grouped_topk kernel (sgl-project#7423)

* [PD Disaggregation] replace transfer with batch transfer for better performance (sgl-project#7236)

* Remove cumsum_buffer initilization (sgl-project#7439)

* [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (sgl-project#7422)

* Support multi-thread model weight loading (sgl-project#7277)

* [PD] NIXL: Register kv args in advance and cleanup finished requests (sgl-project#6717)

* fix: Add `--model` as an alias for `--model-path` in server_args (sgl-project#7505)

* misc: Improvement to serving_chat.py and add more ut (sgl-project#7489)

* Fuse sorted_token_ids padding to moe_align_block_size kernel (sgl-project#7437)

* [OAI] patch origin request_id logic (sgl-project#7508)

* [PD][Spec] Fix hidden state transfer for spec decode (sgl-project#7516)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* EPLB support for MTP (sgl-project#7510)

* clean duplicate code (sgl-project#7512)

* [ci] add router benchmark script and CI (sgl-project#7498)

* fix: force synchronization between TP workers when update_weights (sgl-project#6626)

Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>

* [CPU] [BF16] Call fused_experts_cpu, weight_packed_linear and bmm_cpu kernel in DeepSeek model (sgl-project#6641)

Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [CI] Upgrade mooncake to v0.3.4.post2 to fix potential slice failed bug (sgl-project#7522)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* npu fused op (sgl-project#7386)

Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>

* feat: send kvmetrics from sglang scheduler (sgl-project#6721)

* [PD] Add different TP sizes support for no-MLA models (sgl-project#6793)

Co-authored-by: shangmingc <csmthu@gmail.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>

* enable aiter fp8 blockscale quant (sgl-project#7520)

* take aiter get_rope back (sgl-project#7521)

* Fix typo of flash_cache (sgl-project#7513)

* feat: add return hidden_states at async generation (sgl-project#7507)

* minor: 'role' must be system/assistant/tool, but case insensitive for now (sgl-project#7499)

* Fix FP8 KV Cache Support in FA3 Backend (sgl-project#7148)

* Fix gathered_buffer issues in tbo (sgl-project#7531)

* [PD] Raise error for incompatible mooncake version and some minor fixes (sgl-project#7527)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* [CMake] Fix sgl-kernel CMakeLists for Blackwell (sgl-project#7543)

* Add Tencent HunYuanMoEV1 model support (sgl-project#7549)

* Update seed in CPU UTs to avoid flaky failure with single test (sgl-project#7544)

* chore: improve ci bug reporting (sgl-project#7542)

* chore: remove vlm unnecessary import (sgl-project#7541)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* chore: bump v0.4.8.post1 (sgl-project#7559)

* [PD][NIXL] Set is_sorted=False to fix NIXL_ERR_NOT_FOUND (sgl-project#7330)

* [Fix] incorrect assert in EPLB (sgl-project#7575)

* Updates Gemma3n MLP layer to adapt latest transformers version (sgl-project#7573)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Fix MTP error when enabling two-batch overlap  (sgl-project#7569)

* Add e2e test for multi instance multi stage memory release/resume occupuation (sgl-project#7208)

Signed-off-by: Ata Fatahi <immrata@gmail.com>

* [CI] Add CI Testing for Prefill-Decode Disaggregation with Router (sgl-project#7540)

* Updates transformers and timm dependencies (sgl-project#7577)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* feat: support compatibility between MTP and two-batch-overlap (sgl-project#7225)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* Move multimodal processors into a separate folder (sgl-project#7581)

* Fix broken CI TestVILAServer (sgl-project#7610)

* [router] add centralized configuration module for sgl-router (sgl-project#7588)

* Fix: Minicpm (sgl-project#7612)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Hybrid kv cache for LLaMA4 (sgl-project#6563)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>

* [CPU] add optimizations for INT8 and FP8 DeepSeek (sgl-project#6769)

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>

* Tiny add logs for expert location updater (sgl-project#7308)

* Fix flakiness in LoRA batch test. (sgl-project#7552)

* [BUG] fix local_rank in initialize_dp_attention (sgl-project#7584)

* Support dynamic LoRA loading / unloading in engine/server API (sgl-project#7446)

* [PD] Respect sampling_params.max_new_tokens when PD disaggregation is activated (sgl-project#7598)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* fix unit tests (sgl-project#7618)

* Let ep_scatter support arbitrary strides / ue8m0 format (sgl-project#7309)

* Let EP prefill support new DeepGEMM (sgl-project#7310)

* docs: add gb200 nvl72 and a16z grant (sgl-project#7620)

* oai: Adds support for OpenAI chat completions API in bench_serving (sgl-project#7036)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* [bugfix] Remove PR comment posting from Rust benchmark workflow (sgl-project#7625)

* [Minor] clean up multimodal processor and tokenizer manager (sgl-project#7624)

* Add dsv3 fused a gemm to sgl-kernel (sgl-project#7630)

* Add @mickqian as the CODEOWNERS of multimodal (sgl-project#7636)

* Fix stream reasoning parser and Adds Kimi reasoning parser  (sgl-project#7432)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Fix sgl-router startup crash (sgl-project#7619)

* [bugfix] fix runtime dropping panic in editable (sgl-project#7628)

* Move files related to EPLB (sgl-project#7580)

* [misc] reduce weird rope_scaling_factor warning (sgl-project#7176)

* [AMD] Add unit-test-sgl-kernel-amd to AMD CI (sgl-project#7539)

* Update CODEOWNERS (sgl-project#7640)

* [EAGLE] remove a wrong adjustment for page_size > 1 & topk > 1 in server_args.py (sgl-project#7643)

* [CPU] add c++ kernel to bind CPU cores and memory node (sgl-project#7524)

* Improve streaming, log_level, memory report, weight loading, and benchmark script (sgl-project#7632)

Co-authored-by: Kan Wu <wukanustc@gmail.com>

* Add dsv3 router gemm kernel (sgl-project#7627)

* chore: upgrade flashinfer v0.2.7 jit (sgl-project#7663)

* [doc] update lws doc for pd (sgl-project#7318)

* Fix: sync prepare_fp8_layer_for_marlin with latest vllm changes (sgl-project#7648)

* Add small requirements for benchmark/parse_result tools (sgl-project#7671)

* [CPU] remove process_group from inputs of shm_allreduce and shm_allgather (sgl-project#7486)

* chore: bump sgl-kernel v0.2.1 (sgl-project#7675)

* support llama4 eagle3  (sgl-project#6985)

Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: Shenggui Li <somerlee.9@gmail.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>

* Refactor mm processors and Enable mixed modality processing (sgl-project#7629)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* upgrade sgl kernel to 0.2.1 for main (sgl-project#7676)

* add description for llama4 eagle3 (sgl-project#7688)

* fix(model loader): use safe_open to prevent file handle leaks. (sgl-project#7684)

* chore: upgrade flashinfer v0.2.7.post1 (sgl-project#7698)

* Improve error handling for requests with unloaded LoRA path(s) (sgl-project#7642)

* Apply dsv3_fused_a_gemm kernel (sgl-project#7635)

* Fix GPTQMarlinMoE (sgl-project#7697)

* [1/n] apply wna16marlin kernel in moe weight only quantization (sgl-project#7683)

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>

* Apply dsv3 router gemm kernel for deepseek-r1 fp4 (sgl-project#7677)

* [AMD] Temporarily disable test_no_overlap_scheduler and test_vision_chunked_prefill (sgl-project#7717)

* [RL] add --skip-warmup (sgl-project#7416)

* [RL] support update_weights_from_distributed with different group and multiple weights (sgl-project#7292)

* [router] add --log-level to sgl-router (sgl-project#6512)

* [b200] support trt-llm allreduce fuse rms_norm_add kernel (sgl-project#7621)

* [CPU] Bind threads and numa node for each TP rank (sgl-project#6549)

Co-authored-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>

* Support non-contiguous query input for extend/decode attention (sgl-project#7462)

* Support updating weights at once by stopping all requests (sgl-project#6698)

Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>

* Fix num_tokens_pre_allocated in disaggregation log (sgl-project#7714)

* [CPU] [sgl-kernel] set dispatch key of initialize to CatchAll (sgl-project#7734)

* [CPU] fix all_reduce and all_gather (sgl-project#6770)

Co-authored-by: blzheng <beilei.zheng@intel.com>

* fix awq and dsv3 fused gemm compatibility (sgl-project#7735)

* [CI][Router] Fix bench_one_batch_server for pd router test (sgl-project#7731)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Add CUTLASS FP8 Blockscale MoE kernel for Hopper architecture (sgl-project#7278)

Co-authored-by: HydraQYH <QYH820@Outlook.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>

* fix dsv3 fused proj check  (sgl-project#7738)

* Ascend attention backend(PA&MLA) (sgl-project#7722)

Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>

* [fix] fix dsv3_router_gemm filter (sgl-project#7750)

* [CPU] refine CPU integration code (sgl-project#7647)

* [CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (sgl-project#6771)

* support qwen3 dense model dp attention (sgl-project#7681)

* [optimize] add two stream norm for qwen3 (sgl-project#7740)

Co-authored-by: ispobock <ispobaoke@gmail.com>

* feat: use D2D instead of H2H in pp (sgl-project#7673)

Co-authored-by: alpha-baby <fujianhao1997@qq.com>

* [Bug] add flashinfer bool check for fusedmoe in Qwen moe models (sgl-project#7723)

* [fix] put cpu in the first priority in get_device() (sgl-project#7752)

* [optimize] fuse renormalize into moe_topk_softmax (sgl-project#7744)

Co-authored-by: ispobock <ispobaoke@gmail.com>

* chore: bump sgl-kernel 0.2.2 (sgl-project#7755)

* fix CI: update native api ipynb (sgl-project#7754)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* fuse renormalize into moe topk softmax kernel python code (sgl-project#7751)

Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>

* Remove type conversion and fix id map in topk (sgl-project#7759)

* Add V2-lite model test (sgl-project#7390)

Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>

* refactor llama4 dp attention logic (sgl-project#7729)

* fix(docs): fix the broken link in `docs/references/production_metrics.md` (sgl-project#7741)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* [fix] update bench_speculative.py for compatibility (sgl-project#7764)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (sgl-project#7748)

* [RL] Add --nccl-port to prevent port conflict (sgl-project#7418)

* [RL] add pause and continue generation for async rl training (sgl-project#7419)

* [Fix] Alloc return type error (sgl-project#7778)

Signed-off-by: Capronir <839972205@qq.com>

* [feat] Support EAGLE3 for Qwen (sgl-project#7745)

Co-authored-by: 纬杭 <ximing.wxm@antgroup.com>
Co-authored-by: zyksir <zyksir@outlook.com>

* saving hidden_states.clone() (sgl-project#7705)

* [1/n]: add cutlass W4A8 moe kernel for hopper architecture (sgl-project#7772)

Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>

* add model: qwen2-audio (sgl-project#7596)

* Optimize Hopper CUTLASS FP8 Blockwise Grouped GEMM Kernel in Small K Scenario (sgl-project#7782)

* Embedding parallel by attn_tp (sgl-project#7623)

* fix: fix apply_shuffle_mul_sum (sgl-project#7444)

* chore: bump sgl-kernel v0.2.3 (sgl-project#7784)

* fix: use nvidia-nccl-cu12 2.27.5 (sgl-project#7787)

* DP Attention with Auto DeepEP Dispatch (sgl-project#7222)

* chore: upgrade sgl-kernel v0.2.3 (sgl-project#7786)

* Fix incorrect spec_num_draft_tokens in draft_extend (sgl-project#7757)

* [fix] fix misusing of is_cuda (sgl-project#7790)

* Add treemask mode to build_eagle_tree & release sgl-kernel 0.2.3 (sgl-project#7756)

Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com>

* chore: bump sgl-kernel v0.2.4 (sgl-project#7800)

* ci: fix port args (sgl-project#7792)

* Fix CI test OOM issue. (sgl-project#7799)

* chore: upgrade sgl-kernel v0.2.4 (sgl-project#7801)

* chore: bump v0.4.9 (sgl-project#7802)

* fix merge conflict issue

* fix hpu attention nonetype issue

* fix alignment

* fix alignment2

* CI failure fixes

* fix attention-backend choices

---------

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Signed-off-by: keru <rukeyang@gmail.com>
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Capronir <839972205@qq.com>
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Signed-off-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: KavioYu <67678385+yukavio@users.noreply.github.com>
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
Co-authored-by: u4lr451 <u4lr451@gmail.com>
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
Co-authored-by: Yijie Zhu <762412795@qq.com>
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
Co-authored-by: Charles Chen <pychen96@gmail.com>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: linzhuo <15313137931lz@gmail.com>
Co-authored-by: ch-tiger1 <tiger@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>
Co-authored-by: Atream <80757050+Atream@users.noreply.github.com>
Co-authored-by: Li Hui <lambert80.ios@gmail.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: woodx <124784234+woodx9@users.noreply.github.com>
Co-authored-by: Ata Fatahi <immrata@gmail.com>
Co-authored-by: strgrb <zhangkaihong.zkh@antgroup.com>
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
Co-authored-by: Wenbo Yang <solrex@users.noreply.github.com>
Co-authored-by: Chang Su <csu272@usc.edu>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Keyang Ru <rukeyang@gmail.com>
Co-authored-by: ehuaa <ehuamail@163.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Ke Bao <ISPObaoke@163.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yuhong Guo <guoyuhong1985@outlook.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: Alex Sun <alex.s@amd.com>
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com>
Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: xianzhiT <xianzhitang@tencent.com>
Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com>
Co-authored-by: DangKai <dangkai4u@outlook.com>
Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: ll819214 <18801269230@163.com>
Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>
Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com>
Co-authored-by: Hongbo Xu <1320612015@qq.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
Co-authored-by: eigen <52445717+yyihuang@users.noreply.github.com>
Co-authored-by: mlmz <54172054+minleminzui@users.noreply.github.com>
Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
Co-authored-by: Meng, Peng <pengmeng@tencent.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: tarinkk <129432511+tarinkk@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Sheng Qi <shengqi2018@pku.edu.cn>
Co-authored-by: finetune <82650881+finetunej@users.noreply.github.com>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: narutolhy <582909902@qq.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: Shenggui Li <somerlee.9@gmail.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Simon_CQK <cqk0100@gmail.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>
Co-authored-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>
Co-authored-by: Albert <albert.zty@antgroup.com>
Co-authored-by: Ziming Huang <1520787127@qq.com>
Co-authored-by: ayrnb <70835312+ayrnb@users.noreply.github.com>
Co-authored-by: HydraQYH <QYH820@Outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: TianyuZhang1214 <tianyuzhang1214@163.com>
Co-authored-by: alpha-baby <fujianhao1997@qq.com>
Co-authored-by: Yuchen Cheng <rudeigerc@gmail.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Caproni <40862361+Capronir@users.noreply.github.com>
Co-authored-by: Ximingwang-09 <72070413+Ximingwang-09@users.noreply.github.com>
Co-authored-by: 纬杭 <ximing.wxm@antgroup.com>
Co-authored-by: zyksir <zyksir@outlook.com>
Co-authored-by: SijiaYang <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>
Co-authored-by: Leng Yue <lengyue@lengyue.me>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
Co-authored-by: Gang Chen <13298548+MoonBall@users.noreply.github.com>
Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com>
Co-authored-by: jay <jthakur@habana.ai>