Closed
Description
- Support normal DeepEP buffer @liz-badada [Feature] Integrate DeepEP into SGLang #4232
- Support DeepEP with async transfer @fzyzcjy Support async in DeepEP #4610
- Support low-latency DeepEP buffer
- Single-node TP @liz-badada [Feature] Support DeepEP Low Latency #4767
- MaskedDeepGeMM is implemented by @laixinn @sleepcoo
- Improved by @yuleil [DeepEP] Reduce routed scaling overhead #5277
- Multi-node TP @liz-badada [Fix] DeepEP Compatibility with Low Latency #5068
- Support PD disaggregation @ch-wan Integrating PD disaggregation with DP attention and DeepEP #5435
- Integrate pplx-kernels @ruizhang1230 [Feature] integrate pplx-kernels #5010
- Optimize permutation overhead
- Implement Triton kernels @xutizhou Optimize Permute Kernel in DeepEP #4643
- Fuse permutation with GroupedGeMM
- Extend parallelism paradigm
- Extend DeepEP to a general TP paradigm @ch-wan @tarinkk Support (1 <= dp < tp) in the dp attention in DeepEP #4770
- Support tp_size < ep_size
- Overlap two batches @fzyzcjy Support overlapping two batches #4068
- Integrate contiguous DeepGeMM @sleepcoo @xutizhou DeepEP normal support deepgemm-contiguous #5626
- Record expert distribution @yuhsuan-t Add endpoints to dump selected expert ids #4435
- Overlap communication with shared experts’ computation @liz-badada [Feature] Overlap DeepEP Combine and Shared Experts inside same batch #5829
- Integrate EPLB @fzyzcjy EPLB #5295
Others
- The DeepSeek team plans to release a permutation kernel shortly. We may need to check their update: How to use the output of DeepEP as the input of DeepGemm deepseek-ai/DeepGEMM#57 (comment)