Themes
We have categorized our roadmap into 8 themes: Broad Model Support, Regular Update, More RL Algorithms Support, Dataset Coverage, Plugin Support, Scaling Up RL, More LLM Infrastructure Support, and Wide Hardware Coverage.
Broad Model Support
To add a new model to veRL, the model should satisfy the following requirements:
- The model is supported by both vLLM and Hugging Face transformers; you can then directly use the `dummy_hf` load format to run it.
- [Optional for DTensor] For the FSDP backend, implement the `dtensor_weight_loader` for the model to transfer actor weights from the FSDP checkpoint to the vLLM model (a hedged sketch follows this list). See the FSDP document for more information.
- For the Megatron backend, implement a `ParallelModel` similar to modeling_llama_megatron.py, implement the corresponding checkpoint_utils to load checkpoints from Hugging Face, and implement the `megatron_weight_loader` to transfer actor weights from the `ParallelModel` directly to the vLLM model. See the Megatron-LM document for more information.
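The FSDP weight-transfer step is easiest to see in code. Below is a minimal, hedged sketch of a `dtensor_weight_loader`-style function, assuming actor weights arrive as DTensors and that parameter names already match the vLLM module; the function name and the name-matching are illustrative assumptions, not veRL's actual API (real loaders also remap Hugging Face names and handle fused weights).

```python
from typing import Dict

import torch
import torch.nn as nn
from torch.distributed._tensor import DTensor


def my_dtensor_weight_loader(actor_weights: Dict[str, torch.Tensor],
                             vllm_model: nn.Module) -> None:
    """Copy FSDP (DTensor) actor weights into a colocated vLLM model."""
    params = dict(vllm_model.named_parameters())
    for name, loaded_weight in actor_weights.items():
        # Materialize each sharded DTensor as a full local tensor first.
        if isinstance(loaded_weight, DTensor):
            loaded_weight = loaded_weight.full_tensor()
        param = params[name]  # illustrative: real loaders remap HF -> vLLM names
        # vLLM parameters typically carry a weight_loader that applies TP sharding.
        weight_loader = getattr(param, "weight_loader", None)
        if weight_loader is not None:
            weight_loader(param, loaded_weight)
        else:
            param.data.copy_(loaded_weight)
```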
Regular Update
- Use `position_ids` to support remove padding (data packing) in transformers models (transformers >= v4.45): [misc] feat: spport rmpad/data-packing in FSDP with transformers #91 (see the sketch after this list)
- Upgrade the vLLM version to the latest -> integrate the SPMD version of vLLM
- Upgrade Ray to the latest version (test colocating multiple `resource_pool`s): [misc] fix: weak reference of WorkerDict in RayTrainer #65
  - A Megatron example for multiple WorkerGroups on the same `resource_pool`
- Megatron-LM/MCore upgrade and GPTModel support: [RFC] Megatron-LM and MCore maintaining issues for veRL #15
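To make the remove-padding item concrete, here is a hedged sketch of data packing; the helper name and shapes are assumptions for illustration, not veRL's implementation. Valid tokens of a padded batch are packed into a single row, and `position_ids` restart at 0 at each sequence boundary so the attention implementation can keep the packed sequences apart.

```python
import torch


def pack_batch(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Pack a padded (batch, seqlen) batch into one row of valid tokens."""
    seqlens = attention_mask.sum(dim=1)          # valid tokens per sample
    flat_ids = input_ids[attention_mask.bool()]  # (total_tokens,)
    # position_ids restart at 0 at every sequence boundary.
    position_ids = torch.cat([torch.arange(n) for n in seqlens.tolist()])
    return flat_ids.unsqueeze(0), position_ids.unsqueeze(0)


ids = torch.tensor([[5, 6, 7, 0, 0], [8, 9, 0, 0, 0]])
mask = (ids != 0).long()
packed_ids, pos = pack_batch(ids, mask)
# packed_ids -> [[5, 6, 7, 8, 9]], pos -> [[0, 1, 2, 0, 1]]
```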
More RL Algorithms Support
Make sure the algorithms can converge on some math datasets (e.g., GSM8k)
- GRPO (see the sketch after this list)
- Online DPO
- Safe-RLHF (Multiple rm)
- ReMax
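Of these, GRPO's core idea is compact enough to sketch here (a hedged illustration, not veRL's code): sample a group of responses per prompt and normalize each reward against its own group's mean and standard deviation, yielding advantages without a learned value function.

```python
import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)


# e.g. exact-match rewards for 4 sampled answers to one GSM8k question
r = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(r))  # positive for correct answers, negative otherwise
```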
Dataset Coverage
- APPS (Code Generation)
- codecontests (Code Generation)
- TACO (Code Generation)
- Math-Shepherd (Math)
- competition_math (Math)
Plugin Support
- Integrate SandBox and its corresponding datasets for Code Generation tasks
Scaling up RL
- Context parallelism
  - DeepSpeed Ulysses: [misc][Long Context] feat: support ulysses for long context training #109 (see the sketch after this list)
  - Ring Attention
- Integrate Ray Compiled Graphs (aDAGs) to speed up data transfer
- Support FSDP HybridShard
- Aggressive offload techniques for all models
- Support vLLM rollout with a larger TP size than the actor model
- Support pipeline parallelism in rollout generation (in vLLM or other LLM serving infra)
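For the DeepSpeed Ulysses item, here is a hedged conceptual sketch of the central re-sharding step (the tensor layout is an assumption for illustration, not veRL's or DeepSpeed's actual code): each sequence-parallel rank trades its sequence slice for a head slice via all-to-all, so attention runs over the full sequence on a subset of heads.

```python
# Run under torchrun with world_size == the sequence-parallel size.
import torch
import torch.distributed as dist


def seq_to_head_shard(x: torch.Tensor, sp_group=None) -> torch.Tensor:
    """x: (seq_local, num_heads, head_dim) -> (seq_full, heads_local, head_dim)."""
    sp = dist.get_world_size(sp_group)
    seq_local, n_heads, d = x.shape
    # Split heads into sp chunks; all-to-all exchanges (seq chunk, head chunk) pairs.
    inp = x.reshape(seq_local, sp, n_heads // sp, d).transpose(0, 1).contiguous()
    out = torch.empty_like(inp)
    dist.all_to_all_single(out, inp, group=sp_group)
    # Concatenate the received sequence chunks back into the full sequence.
    return out.reshape(sp * seq_local, n_heads // sp, d)
```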
More LLM Infrastructure Support
LLM Training Infrastructure
- Support TorchTitan for TP + PP parallelism
- Support VeScale for Auto-Parallelism training
LLM Serving Infrastructure
At present, our project supports vLLM using the SPMD execution paradigm: we have eliminated the standalone single-controller process (the `LLMEngine`) by integrating its functionality directly into the worker processes, making the system SPMD.
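As an illustration of what this looks like from user code, here is a hedged sketch; the `external_launcher` executor backend is tied to the newer vLLM line referenced below (vllm >= 0.7), and the model name is an arbitrary placeholder. Every rank launched by torchrun runs the same script and holds one TP shard of the engine, with no separate controller process.

```python
# Launch with e.g.: torchrun --nproc-per-node=2 spmd_rollout.py
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # matches --nproc-per-node
    distributed_executor_backend="external_launcher",
)
# All ranks call generate() collectively; there is no central LLMEngine process.
outputs = llm.generate(["1+1=?"], SamplingParams(max_tokens=16))
```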
- Basic Tutorial: Adding a New LLM Inference/Serving Backend #21
- Support SGLang (offline + SPMD) for rollout generation. Reference: [Feature] several features for veRL integration sgl-project/sglang#2736
- Support vLLM-SPMD version: [testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version) #209
- Support TensorRT-LLM for rollout generation
Wide Hardware Coverage
Supporting a new hardware type in our project involves the following requirements:
- Ray compatibility: the hardware type must be supported by the Ray framework, so that it can be recognized and managed through the `ray.util.placement_group` functionality (see the sketch after this list).
- LLM infra and transformers support: to leverage the new hardware effectively, both the LLM infrastructure (e.g., vLLM, torch, Megatron-LM) and the transformers library must provide native support for the hardware type.
- CUDA kernel replacement: we need to replace the CUDA kernels currently used in FSDP and Megatron-LM with the corresponding kernels specific to the new hardware.
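The Ray requirement can be shown with a small sketch. The `ray.util.placement_group` API is real; the "NPU" resource name is a stand-in for however the new accelerator registers itself with Ray.

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

# A real device plugin would register the accelerator automatically;
# "NPU" is an illustrative custom resource name.
ray.init(resources={"NPU": 8})

# Reserve bundles so colocated workers land on the accelerators.
pg = placement_group([{"CPU": 1, "NPU": 1}] * 4, strategy="PACK")
ray.get(pg.ready())  # blocks until all bundles are reserved


@ray.remote(resources={"NPU": 1})
def worker():
    return "scheduled onto one NPU slot"


strategy = PlacementGroupSchedulingStrategy(placement_group=pg)
print(ray.get(worker.options(scheduling_strategy=strategy).remote()))
```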
- Support Ascend NPUs
  - vLLM Ascend support: [Feature]: vllm support for Ascend NPU vllm-project/vllm#6728
  - Megatron-LM -> MindSpeed
- Low-end NVIDIA GPUs (e.g., Volta, Tesla series)
  - For Megatron-LM, implement a no-rmpad and no-flash-attention version of `ParallelModel`: Is non-RmPad version model and RmPad verison mdoel interchangeable? #20