[roadmap] verl Q3 development

Past roadmap discussions for reference: https://github.com/volcengine/verl/issues/710 https://github.com/volcengine/verl/issues/22

The most important goal for verl in Q3 is to make it a modular **foundational library** for the community to extend: a **starting point** rather than the destination.

# composable model engines
Finish up https://github.com/volcengine/verl/discussions/1560 so that the parallelism strategy is implemented at the engine level without exposing its details to the worker (role) level. The fsdp/megatron engines are expected to be created and run standalone, and to be reused across different roles.
- [ ] fsdp actor, critic, ref (focus on fsdp2)
- [ ] megatron actor, critic, ref
- [ ] torchtitan integration (call for contribution)
- [ ] switch all recipes/examples from fsdp1 to fsdp2 by default (and remove ill-maintained ones)

Work-in-progress interface, open for comments: https://github.com/volcengine/verl/pull/1977
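To make the intended split concrete, here is a minimal sketch under assumed names: `TrainingEngine`, `ToyEngine`, `ActorRole`, and `RefRole` are hypothetical and are not the interface proposed in PR #1977. The toy engine fits a single scalar weight in one process; a real fsdp2 or megatron engine would hide sharding, process groups, and optimizer state behind the same methods, so the role classes would not need to change.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class TrainingEngine(ABC):
    """Hypothetical engine interface; parallelism details stay behind these methods."""

    @abstractmethod
    def forward(self, batch: list[float]) -> float:
        """Compute the loss without accumulating gradients."""

    @abstractmethod
    def forward_backward(self, batch: list[float]) -> float:
        """Compute the loss and accumulate gradients."""

    @abstractmethod
    def optimizer_step(self) -> None:
        """Apply accumulated gradients, then reset them."""


@dataclass
class ToyEngine(TrainingEngine):
    """Single-process stand-in for an fsdp2/megatron engine: one scalar weight, squared loss."""

    weight: float = 0.0
    lr: float = 0.1
    _grad: float = field(default=0.0, init=False)

    def forward(self, batch: list[float]) -> float:
        # loss = mean((w - x)^2)
        return sum((self.weight - x) ** 2 for x in batch) / len(batch)

    def forward_backward(self, batch: list[float]) -> float:
        # d/dw mean((w - x)^2) = mean(2 * (w - x))
        self._grad += sum(2 * (self.weight - x) for x in batch) / len(batch)
        return self.forward(batch)

    def optimizer_step(self) -> None:
        self.weight -= self.lr * self._grad
        self._grad = 0.0


class ActorRole:
    """Role logic only: it never sees sharding or process groups."""

    def __init__(self, engine: TrainingEngine):
        self.engine = engine

    def update(self, batch: list[float]) -> float:
        loss = self.engine.forward_backward(batch)
        self.engine.optimizer_step()
        return loss


class RefRole:
    """Reuses the same engine type in forward-only mode."""

    def __init__(self, engine: TrainingEngine):
        self.engine = engine

    def score(self, batch: list[float]) -> float:
        return self.engine.forward(batch)
```

Swapping `ToyEngine` for a distributed implementation would leave `ActorRole` and `RefRole` untouched, which is the property this roadmap item asks for.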

# rollout workers
- [ ] optimize server mode rollout performance
- [ ] modular rollout workers: VllmRolloutWorker and SGLangRolloutWorker, exposing the same APIs
- [ ] support models with randomly initialized weights
- [ ] weight resharding: optimize tp x dp dispatch, and support receiving weight from separate resource groups
- [ ] Agent RL infrastructure https://github.com/volcengine/verl/issues/2618 
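A minimal sketch of what "exposing the same APIs" could look like, assuming a structural `typing.Protocol`. The method names `update_weights` and `generate` and the `Toy*` classes are hypothetical placeholders: they do not call vLLM or SGLang and are not verl's actual worker API.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RolloutWorker(Protocol):
    """Hypothetical shared API for inference-backend rollout workers."""

    def update_weights(self, state_dict: dict[str, float], version: int) -> None: ...

    def generate(self, prompts: list[str]) -> list[str]: ...


class ToyVllmRolloutWorker:
    """Placeholder: a real worker would wrap a vLLM engine."""

    def __init__(self) -> None:
        self.version = -1

    def update_weights(self, state_dict: dict[str, float], version: int) -> None:
        self.version = version  # a real worker would load/reshard the weights here

    def generate(self, prompts: list[str]) -> list[str]:
        return [f"{p} [vllm v{self.version}]" for p in prompts]


class ToySGLangRolloutWorker:
    """Placeholder: a real worker would wrap an SGLang engine."""

    def __init__(self) -> None:
        self.version = -1

    def update_weights(self, state_dict: dict[str, float], version: int) -> None:
        self.version = version

    def generate(self, prompts: list[str]) -> list[str]:
        return [f"{p} [sglang v{self.version}]" for p in prompts]


def rollout_step(worker: RolloutWorker, weights: dict[str, float], version: int, prompts: list[str]) -> list[str]:
    """Trainer-side code that does not care which backend it talks to."""
    worker.update_weights(weights, version)
    return worker.generate(prompts)
```

Because the protocol is structural, each backend only has to implement the two methods; the trainer code stays identical.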

Additional ongoing efforts: 
- https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/131
- https://github.com/volcengine/verl/issues/1882

# async & disaggregated architecture
- [x] one-step-off async pipeline (WIP: https://github.com/volcengine/verl/pull/2231); further performance optimization and profiling are needed
- [ ] streaming/partial rollout (WIP: https://github.com/volcengine/verl/pull/2200)
- [ ] performance tuning, plus reference throughput benchmarks across [model type, model size, seqlen, hardware, num accelerators, worker role] to guide disaggregated resource allocation
- [ ] fully-async pipeline
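The scheduling idea behind a one-step-off pipeline can be sketched with a single background thread: rollout for batch k+1 is launched with the current weights before the trainer updates on batch k, so each training batch comes from a policy at most one version old. This is a simplified single-process illustration built on stand-in `generate` and `train` functions (both hypothetical); it is not the implementation in PR #2231, which runs rollout and training on separate resource groups and must also synchronize weights between them.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(prompts: list[str], weight_version: int) -> list[tuple[str, int]]:
    """Stand-in rollout: tag each sample with the weight version that produced it."""
    return [(p, weight_version) for p in prompts]


def train(samples: list[tuple[str, int]], weight_version: int) -> int:
    """Stand-in update: returns the new weight version."""
    return weight_version + 1


def one_step_off_loop(batches: list[list[str]]) -> list[int]:
    """Overlap rollout of batch k+1 with training on batch k; return per-step staleness."""
    version = 0
    staleness = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate, batches[0], version)
        for step in range(len(batches)):
            samples = pending.result()
            # Launch the next rollout with the current weights *before* updating them.
            if step + 1 < len(batches):
                pending = pool.submit(generate, batches[step + 1], version)
            # Staleness = how many updates the trainer is ahead of the policy that generated the samples.
            staleness.append(version - samples[0][1])
            version = train(samples, version)
    return staleness
```

For four batches the staleness is `[0, 1, 1, 1]`: after the first step, every batch is trained one version behind, which is the trade-off the "one-step-off" name refers to.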

# multi-turn, data, config infra
- [ ] better message infrastructure for multi-turn messages, and support for dense rewards @SwordFaith
- [ ] better dataset schema for training & rollout, with documentation. TRL's dataset format documentation is a good reference: https://huggingface.co/docs/trl/en/dataset_formats @SwordFaith
- [ ] use tensordict and nested tensors to remove padding and replace DataProto
- [ ] replace OmegaConf with read-only dataclasses for passing verl's internal configs (https://github.com/volcengine/verl/pull/2379 https://github.com/volcengine/verl/pull/2147/files), which also makes unit testing easier
- [ ] P1: distributed data pool, following https://arxiv.org/pdf/2507.01663v1 (https://github.com/volcengine/verl/issues/2539)
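Removing padding usually means storing only the real tokens of each sequence plus per-sequence boundary offsets. The pure-Python sketch below (the helper names `pack_sequences` and `unpack_sequences` are hypothetical) shows that layout with `cu_seqlens`-style cumulative offsets, which is conceptually similar to the values-plus-offsets representation of PyTorch's jagged-layout nested tensors; it does not use tensordict or `torch.nested` itself.

```python
def pack_sequences(padded: list[list[int]], attention_mask: list[list[int]]) -> tuple[list[int], list[int]]:
    """Drop pad positions (mask == 0) and concatenate.

    Sequence i occupies flat[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    flat: list[int] = []
    cu_seqlens = [0]
    for row, mask in zip(padded, attention_mask):
        tokens = [tok for tok, keep in zip(row, mask) if keep]
        flat.extend(tokens)
        cu_seqlens.append(cu_seqlens[-1] + len(tokens))
    return flat, cu_seqlens


def unpack_sequences(flat: list[int], cu_seqlens: list[int]) -> list[list[int]]:
    """Recover the per-sequence token lists from the packed layout."""
    return [flat[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


# Left-padded batch: 12 slots, only 6 real tokens.
padded = [[0, 5, 6, 7], [0, 0, 0, 8], [0, 0, 1, 2]]
mask = [[0, 1, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
flat, cu_seqlens = pack_sequences(padded, mask)
# flat == [5, 6, 7, 8, 1, 2], cu_seqlens == [0, 3, 4, 6]
```

Because the mask decides what is kept, the same code handles left and right padding, and compute scales with real tokens rather than batch size times max length.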
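A minimal sketch of the read-only dataclass idea, assuming nested `frozen=True` dataclasses built from a plain dict (for example, one produced by `OmegaConf.to_container`). The class and field names are illustrative and may not match verl's config schema, and `from_dict` is a hypothetical helper, not an API from the linked PRs. A side benefit for unit tests: a config can be constructed directly, e.g. `ActorConfig(clip_ratio=0.3)`, without loading YAML.

```python
from dataclasses import FrozenInstanceError, dataclass, field, fields, is_dataclass, replace
from typing import Any


@dataclass(frozen=True)
class OptimConfig:
    lr: float = 1e-6
    weight_decay: float = 0.0


@dataclass(frozen=True)
class ActorConfig:
    ppo_mini_batch_size: int = 256
    clip_ratio: float = 0.2
    optim: OptimConfig = field(default_factory=OptimConfig)


def from_dict(cls: type, data: dict[str, Any]) -> Any:
    """Hypothetical helper: build a (nested) frozen dataclass, rejecting unknown keys."""
    known = {f.name: f for f in fields(cls)}
    unknown = set(data) - set(known)
    if unknown:
        raise ValueError(f"unknown keys for {cls.__name__}: {sorted(unknown)}")
    kwargs = {}
    for name, value in data.items():
        field_type = known[name].type
        # Recurse into nested sections; assumes annotations are real classes, not strings.
        if is_dataclass(field_type) and isinstance(value, dict):
            value = from_dict(field_type, value)
        kwargs[name] = value
    return cls(**kwargs)


cfg = from_dict(ActorConfig, {"clip_ratio": 0.28, "optim": {"lr": 1e-5}})
try:
    cfg.clip_ratio = 0.1  # rejected: instances are read-only
except FrozenInstanceError:
    pass
override = replace(cfg, clip_ratio=0.3)  # overrides create a new object; cfg is unchanged
```

Rejecting unknown keys catches typos in overrides early, and immutability guarantees that no component silently mutates a config another component also holds.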

# streamline new model workflow
- [ ] document the workflow for adding a new HF model to verl. With the latest vLLM, it is no longer necessary to add the weight loader described in https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html
- [ ] better abstraction and registration system for multimodal models. Currently, different multimodal models have inconsistent config attributes (e.g. rope), freeze/unfreeze setups, and input/output processing. Ideally this would be handled at the Hugging Face transformers level, but that is not sufficient right now (cc @NielsRogge). (RFC needed)
- [ ] add a documentation page on the latest model support status and per-model features (LoRA, sequence parallelism, megatron, etc.)
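One possible shape for the registration system, assuming a decorator-based registry keyed by the Hugging Face `model_type` string. `register_multimodal`, `MultiModalSpec`, `get_multimodal_spec`, the `toy_vlm` model type, and its attribute names are all hypothetical; the point is that per-model quirks (rope config attribute, modules to freeze, input processing) live in one registered spec instead of being special-cased across the codebase.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MultiModalSpec:
    """Hypothetical per-model description of multimodal quirks."""

    rope_attr: str                      # name of the config attribute holding rope settings
    frozen_modules: tuple[str, ...]     # submodules to freeze during RL
    preprocess: Callable[[dict], dict]  # maps a raw sample to model inputs


_REGISTRY: dict[str, MultiModalSpec] = {}


def register_multimodal(model_type: str, *, rope_attr: str, frozen_modules: tuple[str, ...] = ()):
    """Decorator registering a preprocessing function together with its model's quirks."""

    def decorator(preprocess: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if model_type in _REGISTRY:
            raise ValueError(f"{model_type!r} is already registered")
        _REGISTRY[model_type] = MultiModalSpec(rope_attr, tuple(frozen_modules), preprocess)
        return preprocess

    return decorator


def get_multimodal_spec(model_type: str) -> MultiModalSpec:
    try:
        return _REGISTRY[model_type]
    except KeyError:
        raise KeyError(f"no multimodal spec registered for {model_type!r}") from None


# Hypothetical model; real models would register their own attribute names and modules.
@register_multimodal("toy_vlm", rope_attr="rope_scaling", frozen_modules=("vision_tower",))
def preprocess_toy_vlm(sample: dict) -> dict:
    return {"input_ids": sample["input_ids"], "pixel_values": sample.get("images", [])}
```

Generic training code would then call `get_multimodal_spec(config.model_type)` once and stay free of per-model branches.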

# high quality recipes and end2end optimizations
- [x] retool recipe (code is ready, going through reviews)
- [ ] SOTA multimodal vlm RL recipe (call for contribution)
- [ ] enhance the DAPO recipe with larger models, and provide high-throughput training scripts (many performance knobs are not enabled in the current script)
- We welcome more recipes from the community. If you are interested in contributing one, please open an RFC before opening any PR for recipes: https://github.com/volcengine/verl/issues/2136

Additional ongoing features:
- https://github.com/volcengine/verl/issues/1033
- https://github.com/volcengine/verl/discussions/2171

> Many roadmap tasks in this doc were initiated by, and are credited to, @vermouth1992 @SwordFaith

[roadmap] verl Q3 development #2388