The verl team would like to invite the community to help build the infrastructure needed to efficiently scale up to models of DeepSeek scale, and to develop recipes that reproduce DeepSeek-R1 results for the broader open-source and research community. You're encouraged to join the Slack channel for discussions on the sub-topics below.
Evals
Tasks: https://github.com/volcengine/verl/pull/777/files
- add an evaluation script to reproduce ds-r1 results on several benchmarks from the ds-r1 checkpoint:
  - GPQA Diamond (English)
  - LiveCodeBench (code)
  - SWE-bench Verified (code)
  - CNMO 2024 (math)
Notes:
- refer to examples/data_preprocess for data preprocessing examples (see the preprocessing sketch after these notes)
- refer to trainer/main_generation.py for generation (evaluation code is still missing)
- if possible, upload the preprocessed dataset to Hugging Face
- start with ds distilled models for quick verification before scaling up to 671b
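For reference, here is a minimal preprocessing sketch in the style of the scripts under examples/data_preprocess: it writes a parquet file with a chat-style prompt and a rule-based reward_model field. The dataset id, column names, and output path are placeholders (not part of this issue); check the existing scripts for the exact schema verl expects.

```python
# Hypothetical preprocessing sketch; dataset id and column names are placeholders.
import argparse
import os

import datasets


def make_map_fn(split):
    def process(example, idx):
        return {
            "data_source": "gpqa_diamond",  # tag consumed by the rule-based reward fn
            "prompt": [{"role": "user", "content": example["question"]}],
            "ability": "science",
            "reward_model": {"style": "rule", "ground_truth": example["answer"]},
            "extra_info": {"split": split, "index": idx},
        }

    return process


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default="~/data/gpqa_diamond")
    args = parser.parse_args()
    local_dir = os.path.expanduser(args.local_dir)
    os.makedirs(local_dir, exist_ok=True)

    # Placeholder dataset id; substitute the actual benchmark being preprocessed.
    ds = datasets.load_dataset("some_org/gpqa_diamond", split="test")
    ds = ds.map(make_map_fn("test"), with_indices=True)
    ds.to_parquet(os.path.join(local_dir, "test.parquet"))
```

Starting from the distilled models (last note above) keeps this loop cheap while the scoring logic is being validated.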
Training engine
Tasks:
- verify GPTModel with mcore v0.11; there is an experimental integration of the critic and actor models with the GPTModel class (see the smoke-test sketch after this task list)
- verify the checkpoint manager with GPTModel
- Investigate the convergence issue of seq packing when micro bsz > 1 @GAIR-NLP
- Mcore ds-v3 perf tuning: run ds-v3 with mcore on a range of seqlens and GPU counts, and tune n-d parallelism / memory configs
- Mcore ds-v3 convergence verification: to ensure the correctness of the mcore ds-v3 implementation, run a smaller version of ds with the same model architecture:
  - obtain a medium-sized MoE pretrained ckpt
  - compare mcore convergence against FSDP with the medium-sized MoE
- Verify mcore expert parallelism correctness ([megatron] support megatron expert parallel #1467)
- Optimize ds-v3 long-context throughput with the FSDP backend for faster FSDP experiment iterations (e.g. Liger alignment loss, recompute strategy, ring attention)
- Run 671b (Add DeepSeek 671B GRPO example #1771)
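As a starting point for the GPTModel verification task above, here is a minimal single-GPU smoke-test sketch. It assumes mcore v0.11's GPTModel / TransformerConfig / get_gpt_layer_local_spec APIs; the tiny shapes, vocab size, and launch command are illustrative only, and the real verification should go through verl's actor/critic workers and checkpoint manager.

```python
# Minimal mcore GPTModel smoke test (launch with: torchrun --nproc_per_node=1 smoke_test.py).
# Shapes and vocab size below are arbitrary placeholders.
import torch
from megatron.core import parallel_state
from megatron.core.models.gpt import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec
from megatron.core.tensor_parallel.random import model_parallel_cuda_manual_seed
from megatron.core.transformer.transformer_config import TransformerConfig

torch.distributed.init_process_group(backend="nccl")
parallel_state.initialize_model_parallel(tensor_model_parallel_size=1, pipeline_model_parallel_size=1)
model_parallel_cuda_manual_seed(1234)

config = TransformerConfig(num_layers=2, hidden_size=128, num_attention_heads=4)
model = GPTModel(
    config=config,
    transformer_layer_spec=get_gpt_layer_local_spec(),
    vocab_size=32000,
    max_sequence_length=2048,
).cuda()

batch, seq = 2, 16
input_ids = torch.randint(0, 32000, (batch, seq), device="cuda")
position_ids = torch.arange(seq, device="cuda").unsqueeze(0).expand(batch, -1)
# Megatron mask convention: True marks positions that must NOT be attended to (future tokens).
attention_mask = torch.triu(
    torch.ones((batch, 1, seq, seq), dtype=torch.bool, device="cuda"), diagonal=1
)
logits = model(input_ids, position_ids, attention_mask)
print("logits:", logits.shape)  # expect (batch, seq, vocab) with post_process=True and no labels
```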
Notes:
- To make the perf tuning more faithful, it's better to develop a benchmark script that includes the alignment losses (CE loss, CE + entropy loss); a sketch follows these notes
- The GAIR NLP team is looking into GPTModel sequence packing support when micro_bsz > 1
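A possible shape for that benchmark script, as a hedged sketch: time a forward/backward pass that includes CE plus an entropy term, sweeping the sequence length. The model id, batch/seq sizes, and entropy coefficient are placeholders; a real version would run through verl's FSDP/mcore workers so the measured configs match training.

```python
# Hypothetical micro-benchmark: tokens/sec for forward+backward including CE and entropy losses.
# Model id, batch/seq sizes, and the entropy coefficient are placeholders.
import time

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.bfloat16
).cuda()
model.gradient_checkpointing_enable()
model.train()


def step(batch_size: int, seqlen: int, entropy_coeff: float = 1e-3) -> float:
    input_ids = torch.randint(0, model.config.vocab_size, (batch_size, seqlen), device="cuda")
    torch.cuda.synchronize()
    t0 = time.time()
    logits = model(input_ids).logits
    logp = F.log_softmax(logits[:, :-1].float(), dim=-1)
    ce = F.nll_loss(logp.flatten(0, 1), input_ids[:, 1:].flatten())  # next-token CE
    entropy = -(logp.exp() * logp).sum(-1).mean()                    # mean per-token entropy
    (ce - entropy_coeff * entropy).backward()                        # CE minus entropy bonus
    torch.cuda.synchronize()
    return batch_size * seqlen / (time.time() - t0)  # tokens/sec for this step


for seqlen in (4096, 8192, 16384):
    model.zero_grad(set_to_none=True)
    print(f"seqlen={seqlen}: {step(1, seqlen):.0f} tok/s")
```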
Data & recipe
Tasks:
Math and scientific sources:
- Further improve the dataset used for math RL, maybe based on DAPO 17k and other open source ones
Code:
- Build a baseline code recipe in the verl main repo, using small models such as Llama or Qwen-7B
- Curate a dataset for code RL training, starting with existing open-source ones (a rule-based reward sketch follows the notes below)
Notes:
- Provide reproducible commands and logs in https://verl.readthedocs.io/en/latest/experiment/ppo.html
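For the code recipe above, a rule-based reward along the following lines is one option, shown here as a hedged sketch: extract the model's code block and run the problem's tests in a subprocess. The function name, its signature, and the assumption that ground_truth holds an assert-based test script are illustrative, not verl's fixed reward-manager interface.

```python
# Hypothetical rule-based reward for code RL; interface and test format are assumptions.
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # triple backtick, spelled out to keep this snippet fence-safe
CODE_RE = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def compute_score(solution_str: str, ground_truth: str, timeout: float = 10.0) -> float:
    """Return 1.0 if the extracted code passes the assert-based tests, else 0.0."""
    match = CODE_RE.search(solution_str)
    if match is None:
        return 0.0
    program = match.group(1) + "\n\n" + ground_truth  # append the test script
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```

A production version would need proper sandboxing and resource limits rather than a bare subprocess.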
Inference Engine
Tasks:
- verify multi-node TP inference (see the sketch below)
- support multi-node EP/PP inference
- sharding manager support with mcore v0.11 + latest version of inference engines
Related TODOs: #825
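For the multi-node TP verification, something like the following vLLM-over-Ray sanity check is one possible starting point; the model id and parallel sizes are placeholders, and in verl the same path should ultimately be exercised through the rollout sharding manager rather than a standalone script.

```python
# Hypothetical multi-node tensor-parallel sanity check with vLLM on a Ray cluster.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1",       # or a distilled checkpoint for a quick check
    tensor_parallel_size=16,                # e.g. spans two 8-GPU nodes
    distributed_executor_backend="ray",     # requires a Ray cluster covering both nodes
    trust_remote_code=True,
)
out = llm.generate(["1 + 1 = "], SamplingParams(max_tokens=16, temperature=0.0))
print(out[0].outputs[0].text)
```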