Past roadmap discussions for reference: #710 #22
The most important thing for verl Q3 is to make it a modular foundational library for the community to extend, as a starting point but not the destination.
composable model engines
Finish up #1560 so that the parallelism strategy is implemented at the engine level, without exposing details to the worker (role) level. The FSDP/Megatron engines are expected to be created and run in a standalone fashion, and to be reused across different roles.
- fsdp actor, critic, ref (focus on fsdp2)
- megatron actor, critic, ref
- torchtitan integration (call for contribution)
- switch all recipe/examples from fsdp1 to fsdp2 by default (and remove ill-maintained ones)
Work in progress interface for comments #1977
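
To make the engine/worker split above concrete, here is a minimal sketch. Every name in it (`ModelEngine`, `ToyEngine`, `RoleWorker`) is hypothetical and is not the interface proposed in #1560 or #1977; the toy engine stands in for an FSDP or Megatron engine and holds a single scalar weight. The point is only that the role-level wrapper never sees how the engine shards or updates the model.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ModelEngine(ABC):
    """Hypothetical engine interface: the engine owns the parallelism strategy."""

    @abstractmethod
    def forward(self, batch: List[float]) -> List[float]:
        ...

    @abstractmethod
    def train_step(self, batch: List[float]) -> Dict[str, float]:
        ...


class ToyEngine(ModelEngine):
    """Stand-in for an FSDP/Megatron engine; the 'model' is one scalar weight."""

    def __init__(self, weight: float = 1.0, lr: float = 0.1) -> None:
        self.weight = weight
        self.lr = lr

    def forward(self, batch: List[float]) -> List[float]:
        return [self.weight * x for x in batch]

    def train_step(self, batch: List[float]) -> Dict[str, float]:
        # Toy objective: mean((w * x)^2), whose gradient is 2 * w * mean(x^2).
        grad = 2 * self.weight * sum(x * x for x in batch) / len(batch)
        self.weight -= self.lr * grad
        return {"grad": grad}


class RoleWorker:
    """Role-level wrapper (actor, critic, ref) that only talks to the engine API."""

    def __init__(self, role: str, engine: ModelEngine, trainable: bool) -> None:
        self.role = role
        self.engine = engine
        self.trainable = trainable

    def compute(self, batch: List[float]) -> List[float]:
        return self.engine.forward(batch)

    def update(self, batch: List[float]) -> Dict[str, float]:
        if not self.trainable:
            raise RuntimeError(f"role '{self.role}' is not trainable")
        return self.engine.train_step(batch)


# The same engine class backs two different roles.
actor = RoleWorker("actor", ToyEngine(weight=2.0), trainable=True)
ref = RoleWorker("ref", ToyEngine(weight=2.0), trainable=False)
```

Under this kind of split, swapping the engine implementation would not require touching the role code, which is the reuse property the roadmap item asks for.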
rollout workers
- optimize server mode rollout performance
- modular rollout workers: VllmRolloutWorker and SGLangRolloutWorker, exposing the same APIs
- support model with random init weight
- weight resharding: optimize tp x dp dispatch, and support receiving weight from separate resource groups
- Agent RL infrastructure [agentic RL] multi-turn rollout and agent loop development tracking #2618
Additional ongoing efforts:
- Multi-turn rollout & agentic RL Status & Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#131
- [roadmap] Rollout Module Development Progress & Roadmap #1882
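
As an illustration of "exposing the same APIs", the sketch below defines one abstract rollout interface and two toy backends. The method names (`update_weights`, `generate`) and the classes `ToyRolloutWorkerA`/`ToyRolloutWorkerB` are assumptions made for this example; the toy backends stand in for the planned VllmRolloutWorker and SGLangRolloutWorker and do not call vLLM or SGLang.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RolloutWorker(ABC):
    """Hypothetical shared interface for all rollout backends."""

    @abstractmethod
    def update_weights(self, weights: Dict[str, float]) -> None:
        ...

    @abstractmethod
    def generate(self, prompts: List[str], max_new_tokens: int) -> List[str]:
        ...


class ToyRolloutWorkerA(RolloutWorker):
    """Placeholder backend; a real one would wrap an inference engine."""

    def __init__(self) -> None:
        self.weight_version = 0

    def update_weights(self, weights: Dict[str, float]) -> None:
        self.weight_version += 1

    def generate(self, prompts: List[str], max_new_tokens: int) -> List[str]:
        return [p + " a" * max_new_tokens for p in prompts]


class ToyRolloutWorkerB(RolloutWorker):
    """Second placeholder backend with a different internal behaviour."""

    def __init__(self) -> None:
        self.weight_version = 0

    def update_weights(self, weights: Dict[str, float]) -> None:
        self.weight_version += 1

    def generate(self, prompts: List[str], max_new_tokens: int) -> List[str]:
        return [p + " b" * max_new_tokens for p in prompts]


def sync_and_generate(worker: RolloutWorker, weights: Dict[str, float],
                      prompts: List[str]) -> List[str]:
    """Trainer-side code depends only on the shared interface."""
    worker.update_weights(weights)
    return worker.generate(prompts, max_new_tokens=2)


outputs = {
    type(w).__name__: sync_and_generate(w, {"w": 0.5}, ["hi", "yo"])
    for w in (ToyRolloutWorkerA(), ToyRolloutWorkerB())
}
```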
async & disaggregated architecture
- one-step off async pipeline (WIP: [trainer, fsdp, vllm, recipe] feat: one step off async training recipe #2231), further performance optimization & profiling needed
- streaming/partial rollout (WIP: [rollout] feat: support reorder rollout for tackling long-tail generation problem #2200)
- performance tuning, and reference throughput benchmark across [model type, model size, seqlen, hardware, num accelerators, worker role] to achieve better disaggregated resource allocation
- fully-async pipeline
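
The scheduling idea behind a one-step-off pipeline can be sketched as follows. This is a toy simulation under stated assumptions, not the recipe in #2231: `rollout` and `train` are hypothetical placeholders, and the policy is reduced to an integer version counter. Generation for step t+1 is launched with the weights that exist before the update for step t, so every batch after the first is exactly one policy version stale.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def rollout(step: int, policy_version: int) -> Dict[str, int]:
    """Placeholder for generation; records which policy version produced it."""
    return {"step": step, "policy_version": policy_version}


def train(batch: Dict[str, int], policy_version: int) -> int:
    """Placeholder for a training update; returns the new policy version."""
    return policy_version + 1


def one_step_off(num_steps: int) -> List[Tuple[int, int]]:
    """Return (step, staleness) pairs, staleness measured in policy versions."""
    log = []
    version = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(rollout, 0, version)
        for step in range(num_steps):
            batch = pending.result()
            if step + 1 < num_steps:
                # Launch the next rollout with the weights *before* this
                # update, so it can overlap with training on `batch`.
                pending = pool.submit(rollout, step + 1, version)
            log.append((step, version - batch["policy_version"]))
            version = train(batch, version)
    return log


schedule = one_step_off(3)
```

A fully-async pipeline would relax the fixed one-version bound, which is why staleness is worth tracking explicitly even in a sketch.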
multi-turn, data, config infra
- better message infra for multi-turn messages, dense reward @SwordFaith
- better dataset schema for train & rollout. We need documentation too. TRL's documentation is good https://huggingface.co/docs/trl/en/dataset_formats @SwordFaith
- use tensordict and nested-tensor to remove padding and replace DataProto
- replace OmegaConf with read-only dataclasses for verl internal config passing, which also makes unit testing easier: [cfg] refactor: make actor config more modular #2379, https://github.com/volcengine/verl/pull/2147/files
- P1: distributed data pool from https://arxiv.org/pdf/2507.01663v1 [RFC] Add persistable replay buffer for large-scale rollout data storage #2539
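
For the read-only config item, a frozen dataclass is one straightforward way to get immutability plus validation at construction time. The sketch below is illustrative only: the class names and fields (`ActorConfig`, `OptimConfig`, `strategy`, `ppo_mini_batch_size`) are assumptions for this example and are not taken from #2379 or the linked PR.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class OptimConfig:
    lr: float = 1e-6
    weight_decay: float = 0.01


@dataclass(frozen=True)
class ActorConfig:
    strategy: str = "fsdp2"
    ppo_mini_batch_size: int = 256
    optim: OptimConfig = field(default_factory=OptimConfig)

    def __post_init__(self) -> None:
        # Validate once at construction; the object cannot change afterwards.
        if self.ppo_mini_batch_size <= 0:
            raise ValueError("ppo_mini_batch_size must be positive")


cfg = ActorConfig()

# In-place mutation is rejected.
try:
    cfg.strategy = "megatron"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Overrides produce a new object and leave the original untouched.
megatron_cfg = replace(cfg, strategy="megatron")
```

Because such configs are plain Python objects with defaults, a unit test can construct one directly without loading any YAML.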
streamline new model workflow
- document the workflow to add a new hf model to verl. With the latest vllm, there is no longer any need to add the weight loader mentioned in https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html
- better abstraction and registration system for multi-modal models. Currently, different multi-modal models have inconsistent config attributes (e.g. rope), freeze/unfreeze setup, input/output processing... (ideally this should be done at the huggingface transformers level, but that is not sufficient right now cc @NielsRogge) (RFC needed)
- verl needs a documentation page about the latest status of model support and per model related features (lora, sequence parallelism, megatron, etc)
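
One possible shape for the registration system mentioned above is a decorator-based registry of per-model adapters. This is a sketch only, since the actual design still needs an RFC: `register_model`, `MultiModalAdapter`, the toy model types, and the config keys are all hypothetical and do not correspond to any real model or to the transformers API.

```python
from typing import Callable, Dict, Tuple, Type


class MultiModalAdapter:
    """Hypothetical per-model adapter that normalizes model-specific quirks."""

    def rope_config(self, hf_config: dict) -> dict:
        raise NotImplementedError

    def frozen_prefixes(self) -> Tuple[str, ...]:
        # Parameter-name prefixes to freeze during RL training; none by default.
        return ()


_REGISTRY: Dict[str, Type[MultiModalAdapter]] = {}


def register_model(model_type: str) -> Callable[[type], type]:
    """Class decorator that registers an adapter under a model type string."""

    def decorator(cls: type) -> type:
        if model_type in _REGISTRY:
            raise ValueError(f"model type '{model_type}' already registered")
        _REGISTRY[model_type] = cls
        return cls

    return decorator


@register_model("toy_vlm_a")
class ToyVlmA(MultiModalAdapter):
    # This toy model keeps rope settings at the top level of its config.
    def rope_config(self, hf_config: dict) -> dict:
        return {"theta": hf_config["rope_theta"]}

    def frozen_prefixes(self) -> Tuple[str, ...]:
        return ("vision_tower.",)


@register_model("toy_vlm_b")
class ToyVlmB(MultiModalAdapter):
    # This toy model nests rope settings under a text sub-config.
    def rope_config(self, hf_config: dict) -> dict:
        return {"theta": hf_config["text_config"]["rope_theta"]}


def get_adapter(model_type: str) -> MultiModalAdapter:
    return _REGISTRY[model_type]()


theta_a = get_adapter("toy_vlm_a").rope_config({"rope_theta": 10000.0})
theta_b = get_adapter("toy_vlm_b").rope_config(
    {"text_config": {"rope_theta": 500000.0}}
)
```

Callers see one uniform `rope_config` result regardless of where each model stores the value, which is the inconsistency the roadmap item describes.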
high quality recipes and end2end optimizations
- retool recipe (code is ready, going through reviews)
- SOTA multimodal vlm RL recipe (call for contribution)
- enhance DAPO recipe with larger models, and provide scripts with high training throughput (many perf knobs are not turned on in the current script)
- we welcome more recipes from the community. If you're interested in contributing one, please open an RFC before opening any PR for recipes (e.g. [RFC] Verl recipe for image generation model #2136)
Additional existing ongoing features:
- [mcore] verl+megatron development tracking #1033
- Features that npu will focus on supporting in Q3 #2171
Many roadmap tasks in this doc were initiated by @vermouth1992 and @SwordFaith, and credit goes to them.