veRL Megatron-core Development Tracking
This issue tracks the development of verl + mcore (Megatron-core).
The milestone target is to enable training DeepSeek-V3 on veRL (#708); the longer-term goal is to continuously improve the training experience with the mcore backend.
Progress and TODO
Recent
- update mcore version to 0.11 (#392: Update megatron-lm to core_r0.11.0)
- use the mcore `GPTModel` API instead of the huggingface workaround, with sequence packing (#706: Use Mcore GPTModel); see the sketch after this list
- support context parallel (#970: [Mcore] context parallel)
- support loading mcore dist_checkpointing (#1030: [mcore] option to use dist checkpoint)
- support Megatron 0.11.0 and vLLM 0.8.2 (#851: Support Megatron 0.11.0 and vLLM 0.8.2, update images to use latest vllm and Megatron)
- support qwen2moe training (#1139: [mcore] qwen2moe support)
- support Moonlight-16B-A3B training (WIP) (#1284: [mcore] moonlight (small model with deepseekv3 arch))
- support Qwen2.5-VL training (#1286: [megatron] feat: qwen2.5vl)
- support EP (expert parallel) (#1467: [megatron] support megatron expert parallel)
Further
- FP8 training (see the sketch after this list)
- training-efficiency optimizations
- support the SGLang inference engine
- support the TensorRT-LLM (trtllm) inference engine
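FP8 training is not yet wired into verl per this roadmap; the sketch below only illustrates how it is typically switched on through Megatron-core's `TransformerConfig` fields (requires TransformerEngine and FP8-capable GPUs). The model shape values are placeholders, and field defaults may vary across mcore versions.

```python
# Hedged sketch: FP8-related TransformerConfig fields in Megatron-core ~0.11.
from megatron.core.transformer.transformer_config import TransformerConfig

fp8_config = TransformerConfig(
    num_layers=24,                  # placeholder model shape
    hidden_size=2048,
    num_attention_heads=16,
    fp8="hybrid",                   # e4m3 in forward, e5m2 for gradients
    fp8_margin=0,
    fp8_amax_history_len=1024,      # history window for amax tracking
    fp8_amax_compute_algo="max",
)
```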