# v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations
## Key changes
### PPO fixes and enhancements
- Fixed a bug in the `vf_loss` coefficient for PPO that was introduced in v0.4 #2016
- Improved numerical stability by clamping KL-divergence-related values #1779
### Checkpoints
- Switched the Megatron checkpointer to mcore's dist_checkpoint, which reduces peak memory usage and improves distributed model saving performance via `*.checkpoint.async_save=True`
- [BREAKING] Megatron's checkpoint directory layout is updated accordingly. Documentation
- [BREAKING] The checkpoint manager constructor now takes `checkpoint_config` as the keyword argument, replacing `checkpoint_contents` #2125
- The checkpoint merger for LoRA is fixed #1821 via `python -m verl.model_merger merge ...`. Documentation
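A minimal sketch of how these two changes might be used together, assuming a Hydra-style override entrypoint; the config paths and directory values below are placeholders, not guaranteed option names:

```shell
# Hypothetical sketch: enable async distributed checkpoint saving via
# Hydra-style command-line overrides (exact config paths may differ).
python -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.checkpoint.async_save=True \
    critic.checkpoint.async_save=True

# Merge a LoRA checkpoint back into a full HF model with the fixed merger;
# --local_dir and --target_dir values are placeholders for your run.
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir checkpoints/my_run/global_step_100/actor \
    --target_dir checkpoints/my_run/merged_hf_model
```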
### Experimental function calling & MCP interfaces
These features are experimental and subject to change in future releases.
- The chat completion scheduler now follows the OpenAI function-calling schema when used with an OpenAI-compatible server #1831
- SGLang rollout with MCP client #1948 Documentation
- SGLang multi-turn rollout code walk-through documentation
- Multi-turn interaction system with SGLang, enabling dynamic conversational feedback and iterative problem-solving scenarios #1630, the building block for SCoRe
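Since the scheduler now speaks the OpenAI function-calling schema, requests can carry a standard `tools` payload. A hypothetical sketch against a locally served OpenAI-compatible endpoint; the host/port, model name, and tool definition are placeholders, but the `tools` structure follows the standard OpenAI schema:

```shell
# Placeholder endpoint and model; the "tools" block is standard OpenAI
# function-calling schema, which the chat completion scheduler now follows.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```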
### New models and recipes
- New recipe/entropy to reproduce the paper "The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning" with the `Clip-Cov` and `KL-Cov` methods
- Megatron support for Qwen2.5-VL #1286
- Multi-turn SFT support for Qwen-3 #1889
- Enhanced kimi-vl with sequence parallelism #1899
### SGLang optimizations
- SGLang rollout memory usage is further optimized. Blog (requires SGLang v0.4.8 #2187)
- Async multi-turn rollout with multi-modal support is now available in SGLang #2014
### Other performance profiling & optimizations
- Nsight system profiling is available. Documentation
- FSDP prefetch can be enabled via `[actor|ref].fsdp_config.forward_prefetch=True` #1927
- The memory usage of entropy computation can be drastically reduced with fused kernels using `[actor|ref].entropy_checkpointing=True` and `[actor|ref].entropy_from_logits_with_chunking=True` #1927
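A hedged sketch of how these flags might be combined in one Hydra-style override set; the entrypoint and exact config paths are assumptions extrapolated from the flag names above:

```shell
# Hypothetical: enable FSDP forward prefetch plus the memory-efficient
# entropy kernels for both actor and ref in a single run.
python -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
    actor_rollout_ref.actor.entropy_checkpointing=True \
    actor_rollout_ref.actor.entropy_from_logits_with_chunking=True
```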
### Other breaking changes and deprecations
- See #1902
- vllm v0.6.3 support will be removed in the next release.
## What's Changed
- [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding by @ETOgaosion in #1834
- [rollout] feat: follow OpenAI tool calling schema in chat scheduler by @wuxibin89 in #1831
- [release] chore: bump version to v0.4 by @eric-haibin-lin in #1897
- Dockerfile.rocm update tensordict==0.6.2 by @vickytsang in #1898
- [feat] add validation shuffle by @mlpod in #1886
- [feat][BREAKING] Megatron: Support learning rate scheduler by @ETOgaosion in #1701
- fix errors in megatron_workers.py by @davidjsonn in #1906
- [tests] chore: add PR title check by @eric-haibin-lin in #1901
- fix qwen2vl grpo for vllm 0.9 and transformers 4.52 by @hiyouga in #1880
- [rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager by @rocke2020 in #1909
- [recipe] feat: char count by @vermouth1992 in #1908
- fix typos by @davidjsonn in #1912
- [trainer] refactor: refactor reward manager, advantage estimator by @eric-haibin-lin in #1916
- set CUDA and HIP VISIBLE DEVICES by @YangWang92 in #1914
- [ppo] feat: add critic valuehead model support for multi-modal PPO by @Yangruipis in #1839
- [bugfix] fix megatron model merger by @ShareLer in #1774
- revert HIP_VISIBLE_DEVICES in worker.py by @YangWang92 in #1920
- [worker] fix: do not break dynamic bsz in dp critic by @hiyouga in #1922
- [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking by @jybsuper in #1668
- [rollout] fix: fix async llm config passing by @eric-haibin-lin in #1933
- [sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 by @jybsuper in #1852
- fix sequence parallelism conflict in kimiVL by @ShareLer in #1899
- [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 by @jinqinn in #1836
- [rollout] feat: add async llm perf script by @wuxibin89 in #1930
- [megatron] feat: qwen2.5vl by @ISEEKYAN in #1286
- [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters by @thelongestusernameofall in #1821
- [hardware] fix: fix issue when sp>1 on ASCEND NPU by @as12138 in #1942
- [megatron] fix: rope_type typo in config_converter.py by @donpromax in #1944
- [training_utils] Add qwen3 multi-turn sft support by @SwordFaith in #1889
- [fsdp] fix: fsdp entropy metrics by @ETOgaosion in #1943
- [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy by @CurryRice233 in #1927
- [rollout] fix: set repetition_penalty=1.0 to AsyncLLM by @wuxibin89 in #1949
- [fsdp] feat: Memory efficient cross entropy with a linear layer fused by @Jianbing-D in #462
- [recipe] feat: qwen2.5vl 7b report and guide by @ISEEKYAN in #1969
- [ckpt] refactor: enhance FSDP checkpoint manager flexibility by @0x404 in #1350
- [env] fix: npu ray verion to 2.46.0 for CI problem by @wyz649296016 in #1987
- Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh by @none0663 in #1996
- [megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading by @none0663 in #1983
- [tests] chore: ppo workflow runs on volcengine machine learning platform by @htc070011 in #1979
- [megatron] fix: multiple key error when trying to override megatron tr… by @donpromax in #1990
- [megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk by @Yangruipis in #1995
- Stabilize loss calculations by clamping KL divergence values by @syo093c in #1779
- [ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError by @lxg2015 in #2010
- [algo] fix: `vf_loss` factor by @tongyx361 in #2016
- [data] fix: fix retool sft data source by @vermouth1992 in #2018
- [fsdp] fix: position_ids in qwen-vl by @ShareLer in #1947
- [hardware] refactor: refactor part of device management by @FightingZhen in #1974
- [trainer] fix: fix sft max_position_embeddings by @vermouth1992 in #2019
- [misc] fix: fix format by @vermouth1992 in #2023
- [megatron] fix: dpskv3 convert src and dst mixed up bug by @Yangruipis in #2029
- fix: TensorDict usage error by @zhihe-wang in #2046
- [hardware] feat: support qwen2_5_vl on ASCEND NPU by @as12138 in #1924
- [trainer] chore: Reducing the number of calls to the write by @RuixiangMa in #2043
- [Bug] fix `None` check in DataProto print_size() by @GHGmc2 in #2067
- [perf] feat: Add verl profiling support from Nvidia Nsight System by @davidmlw in #1820
- [data] fix: multimodal overlong prompt length filtering by @dirtyDan0 in #2063
- [sglang] fix: AsyncSglangServer use async wake_up/sleep by @feifeibear in #2062
- [training_utils] feat: Add project and experiment name to tensorboard log path by @Geaming2002 in #2080
- [trainer] fix: Fix trainer config for `val_only` by @hscspring in https://github.com/volcengine/verl/pull/20842083
- [megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text by @MaoChouHJM in #1999
- [vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs by @yyDing1 in #2068
- [misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb by @vermouth1992 in #2094
- [rollout] refactor: Add option for rollout_log_probs, and default as `False` by @GHGmc2 in #2072
- [tool] feat: Add Search Tool implemented with MCP by @AlecHenx in #1948
- [trainer] fix: make `reward_extra_info` optional in `reward_result` by @HollowMan6 in #2109
- [algo] feat: integrate Clip-Cov and KL-Cov methods by @Raf-Chen in #1830
- [rollout] fix: error in sgyang async mode by @chenhaiq in #2098
- [rollout] fix: fix rollout key not found by @ETOgaosion in #2116
- [recipe] feat: Move entropy reward to the entropy recipe by @Raf-Chen in #2118
- [cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig by @eric-haibin-lin in #2117
- [worker] feat: add support for dynamic batch size of multimodal data by @wang-zerui in #2049
- [fsdp] refactor: set actor's strategy as default for critic and ref by @0x404 in #2130
- [ray] feat: add a test to demonstrate how to perform p2p communication inside wor… by @vermouth1992 in #2131
- [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang by @kinza99 in #1630
- [tool] feat: Add memory limit configuration for sandbox fusion by @plutoZZZZ in #2105
- [sglang] feat: add multimodal input to multiturn async rollout by @nanjiangwill in #2014
- [fsdp] feat: support fsdp2 save hugging face model by @0x404 in #2138
- [rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True by @zyfzjsc988 in #2156
- [rollout] feat: Support Multi-stage Awake for SGLang by @hebiao064 in #1911
- [worker] feat: allow dist shared file-system initialization by @Cccei000 in #2154
- [model] feat: Add MiniCPM-o 2.6 support by @RanchiZhao in #1833
- [model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" by @hiyouga in #2176
- [misc] fix: fix timer importance error in split_placement by @FightingZhen in #2169
- [megatron,vllm] fix: megatron vllm async rollout server by @Yangruipis in #2122
- [model] feat: Add MiniCPM-o 2.6 support by @hiyouga in #2178
- [megatron] feat: Support of dist checkpoint by @ETOgaosion in #2125
- [data] fix: fix the type of parquet_files in SFTDataset by @xuuHuang in #2203
- [trainer] fix: add missing qwen2_moe flops counter by @ETOgaosion in #2190
- [trainer] fix: Add init.py to verl.trainer.config by @ultmaster in #2214
- [model] fix: make vlm patch forward compatible by @hiyouga in #2215
- [recipe] fix: parameter order in RayPRIMETrainer super().init() call by @xxnpark in #2172
- [misc] feat: support ValidationGenerationsLogger in vemlp_wandb by @chenhaiq in #2191
## New Contributors
Thank you all for joining this project!
@vickytsang @davidjsonn @rocke2020 @vwxyzjn @Yangruipis @SeungyounShin @donpromax @leopardracer @ZhiyuLi-Nvidia @LiyuanLucasLiu @Jianbing-D @wyz649296016 @htc070011 @syo093c @FightingZhen @zhihe-wang @KaiChen1998 @wizeng23 @RuixiangMa @davidmlw @feifeibear @hscspring @MaoChouHJM @AlecHenx @wang-zerui @kinza99 @nanjiangwill @zyfzjsc988 @Cccei000 @RanchiZhao @xuuHuang @ultmaster @xxnpark @jvmncs @xingyunjohn1
Full Changelog: v0.4.0...v0.4.1