Themes
We have categorized our roadmap into 8 themes: Broad Model Support, Regular Update, More RL Algorithms Support, Dataset Coverage, Plugin Support, Scaling Up RL, More LLM Infrastructure Support, and Wide Hardware Coverage.
Broad Model Support
To add a new model to veRL, the model should satisfy the following requirements:
- The model is supported by both vLLM and Hugging Face transformers; you can then directly use the `dummy_hf` load format to run it.
- [Optional for DTensor] For the FSDP backend, implement the `dtensor_weight_loader` for the model to transfer actor weights from the FSDP checkpoint to the vLLM model (a hedged sketch follows this list). See the FSDP document for more information.
- For the Megatron backend, implement a `ParallelModel` similar to modeling_llama_megatron.py, implement the corresponding checkpoint_utils to load checkpoints from Hugging Face, and implement the `megatron_weight_loader` to transfer actor weights from the `ParallelModel` directly to the vLLM model. See the Megatron-LM document for more information.
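The FSDP weight-transfer step is easiest to see in code. Below is a minimal, hedged sketch of a `dtensor_weight_loader`-style function, assuming actor weights arrive as DTensors and that parameter names already match the vLLM module; the function name and the name-matching are illustrative assumptions, not veRL's actual API (real loaders also remap Hugging Face names and handle fused weights).

```python
from typing import Dict

import torch
import torch.nn as nn
from torch.distributed._tensor import DTensor


def my_dtensor_weight_loader(actor_weights: Dict[str, torch.Tensor],
                             vllm_model: nn.Module) -> None:
    """Copy FSDP (DTensor) actor weights into a colocated vLLM model."""
    params = dict(vllm_model.named_parameters())
    for name, loaded_weight in actor_weights.items():
        # Materialize each sharded DTensor as a full local tensor first.
        if isinstance(loaded_weight, DTensor):
            loaded_weight = loaded_weight.full_tensor()
        param = params[name]  # illustrative: real loaders remap HF -> vLLM names
        # vLLM parameters typically carry a weight_loader that applies TP sharding.
        weight_loader = getattr(param, "weight_loader", None)
        if weight_loader is not None:
            weight_loader(param, loaded_weight)
        else:
            param.data.copy_(loaded_weight)
```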
Regular Update
- Use `position_ids` to support remove padding (data packing) in transformers models (transformers >= v4.45): [misc] feat: spport rmpad/data-packing in FSDP with transformers #91 (see the sketch after this list)
- Upgrade the vLLM version to the latest -> integrate the SPMD version of vLLM
- Upgrade Ray to the latest version (test colocating multiple `resource_pool`s): [misc] fix: weak reference of WorkerDict in RayTrainer #65
  - A Megatron example for multiple WorkerGroups on the same `resource_pool`
- Megatron-LM/MCore upgrade and GPTModel support: [RFC] Megatron-LM and MCore maintaining issues for veRL #15
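To make the remove-padding item concrete, here is a hedged sketch of data packing; the helper name and shapes are assumptions for illustration, not veRL's implementation. Valid tokens of a padded batch are packed into a single row, and `position_ids` restart at 0 at each sequence boundary so the attention implementation can keep the packed sequences apart.

```python
import torch


def pack_batch(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Pack a padded (batch, seqlen) batch into one row of valid tokens."""
    seqlens = attention_mask.sum(dim=1)          # valid tokens per sample
    flat_ids = input_ids[attention_mask.bool()]  # (total_tokens,)
    # position_ids restart at 0 at every sequence boundary.
    position_ids = torch.cat([torch.arange(n) for n in seqlens.tolist()])
    return flat_ids.unsqueeze(0), position_ids.unsqueeze(0)


ids = torch.tensor([[5, 6, 7, 0, 0], [8, 9, 0, 0, 0]])
mask = (ids != 0).long()
packed_ids, pos = pack_batch(ids, mask)
# packed_ids -> [[5, 6, 7, 8, 9]], pos -> [[0, 1, 2, 0, 1]]
```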
More RL Algorithms Support
Make sure the algorithms can converge on some math datasets (e.g., GSM8k)
- GRPO (see the sketch after this list)
- Online DPO
- Safe-RLHF (Multiple rm)
- ReMax
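Of these, GRPO's core idea is compact enough to sketch here (a hedged illustration, not veRL's code): sample a group of responses per prompt and normalize each reward against its own group's mean and standard deviation, yielding advantages without a learned value function.

```python
import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)


# e.g. exact-match rewards for 4 sampled answers to one GSM8k question
r = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(r))  # positive for correct answers, negative otherwise
```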
Dataset Coverage
- APPS (Code Generation)
- codecontests (Code Generation)
- TACO (Code Generation)
- Math-Shepherd (Math)
- competition_math (Math)
Plugin Support
- Integrate SandBox and its corresponding datasets for Code Generation tasks
Scaling up RL
- Context parallelism
  - DeepSpeed Ulysses: [misc][Long Context] feat: support ulysses for long context training #109 (see the sketch after this list)
  - Ring Attention
- Integrate Ray Compiled Graphs (aDAGs) to speed up data transfer
- Support FSDP HybridShard
- Aggressive offload techniques for all models
- Support vLLM rollout with a larger TP size than the actor model
- Support pipeline parallelism in rollout generation (in vLLM or other LLM serving infra)
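For the DeepSpeed Ulysses item, here is a hedged conceptual sketch of the central re-sharding step (the tensor layout is an assumption for illustration, not veRL's or DeepSpeed's actual code): each sequence-parallel rank trades its sequence slice for a head slice via all-to-all, so attention runs over the full sequence on a subset of heads.

```python
# Run under torchrun with world_size == the sequence-parallel size.
import torch
import torch.distributed as dist


def seq_to_head_shard(x: torch.Tensor, sp_group=None) -> torch.Tensor:
    """x: (seq_local, num_heads, head_dim) -> (seq_full, heads_local, head_dim)."""
    sp = dist.get_world_size(sp_group)
    seq_local, n_heads, d = x.shape
    # Split heads into sp chunks; all-to-all exchanges (seq chunk, head chunk) pairs.
    inp = x.reshape(seq_local, sp, n_heads // sp, d).transpose(0, 1).contiguous()
    out = torch.empty_like(inp)
    dist.all_to_all_single(out, inp, group=sp_group)
    # Concatenate the received sequence chunks back into the full sequence.
    return out.reshape(sp * seq_local, n_heads // sp, d)
```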
More LLM Infrastructure Support
LLM Training Infrastructure
- Support TorchTitan for TP + PP parallelism
- Support VeScale for Auto-Parallelism training
LLM Serving Infrastructure
At present, our project supports vLLM using the SPMD execution paradigm: we have eliminated the standalone single-controller process (the `LLMEngine`) by integrating its functionality directly into the worker processes, making the system SPMD.
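As an illustration of what this looks like from user code, here is a hedged sketch; the `external_launcher` executor backend is tied to the newer vLLM line referenced below (vllm >= 0.7), and the model name is an arbitrary placeholder. Every rank launched by torchrun runs the same script and holds one TP shard of the engine, with no separate controller process.

```python
# Launch with e.g.: torchrun --nproc-per-node=2 spmd_rollout.py
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # matches --nproc-per-node
    distributed_executor_backend="external_launcher",
)
# All ranks call generate() collectively; there is no central LLMEngine process.
outputs = llm.generate(["1+1=?"], SamplingParams(max_tokens=16))
```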
- Basic Tutorial: Adding a New LLM Inference/Serving Backend #21
- Support SGLang (offline + SPMD) for rollout generation. Reference: [Feature] several features for veRL integration sgl-project/sglang#2736
- Support vLLM-SPMD version: [testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version) #209
- Support TensorRT-LLM for rollout generation
Wide Hardware Coverage
Supporting a new hardware type in our project involves the following requirements:
- Ray compatibility: the hardware type must be supported by the Ray framework, so that it can be recognized and managed through the `ray.util.placement_group` functionality (see the sketch after this list).
- LLM infra and transformers support: to leverage the new hardware effectively, both the LLM infrastructure (e.g., vLLM, torch, Megatron-LM) and the transformers library must provide native support for the hardware type.
- CUDA kernel replacement: we need to replace the CUDA kernels currently used in FSDP and Megatron-LM with the corresponding kernels specific to the new hardware.
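The Ray requirement can be shown with a small sketch. The `ray.util.placement_group` API is real; the "NPU" resource name is a stand-in for however the new accelerator registers itself with Ray.

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

# A real device plugin would register the accelerator automatically;
# "NPU" is an illustrative custom resource name.
ray.init(resources={"NPU": 8})

# Reserve bundles so colocated workers land on the accelerators.
pg = placement_group([{"CPU": 1, "NPU": 1}] * 4, strategy="PACK")
ray.get(pg.ready())  # blocks until all bundles are reserved


@ray.remote(resources={"NPU": 1})
def worker():
    return "scheduled onto one NPU slot"


strategy = PlacementGroupSchedulingStrategy(placement_group=pg)
print(ray.get(worker.options(scheduling_strategy=strategy).remote()))
```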
- Support Ascend NPUs
  - vLLM Ascend support: [Feature]: vllm support for Ascend NPU vllm-project/vllm#6728
  - Megatron-LM -> MindSpeed
- Low-end NVIDIA GPUs (e.g., Volta, Tesla series)
  - For Megatron-LM, implement a no-rmpad and no-flash-attention version of `ParallelModel`: Is non-RmPad version model and RmPad verison mdoel interchangeable? #20