# v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations
## Key changes
### PPO fixes and enhancements
- Fixed a bug in the `vf_loss` coefficient for PPO that was introduced in v0.4 #2016
- Improved numerical stability by clamping KL-divergence-related values #1779
### Checkpoints
- Switched the Megatron checkpointer to mcore's dist_checkpoint, which reduces peak memory usage and improves distributed model saving performance via `*.checkpoint.async_save=True`
- [BREAKING] Megatron's checkpoint directory layout is updated accordingly. Documentation
- [BREAKING] The checkpoint manager constructor now takes `checkpoint_config` as the keyword argument, replacing `checkpoint_contents` #2125
- The checkpoint merger for LoRA is fixed #1821 via `python -m verl.model_merger merge ...`. Documentation
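A minimal sketch of how these two changes might be used together, assuming a Hydra-style override entrypoint; the config paths and directory values below are placeholders, not guaranteed option names:

```shell
# Hypothetical sketch: enable async distributed checkpoint saving via
# Hydra-style command-line overrides (exact config paths may differ).
python -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.checkpoint.async_save=True \
    critic.checkpoint.async_save=True

# Merge a LoRA checkpoint back into a full HF model with the fixed merger;
# --local_dir and --target_dir values are placeholders for your run.
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir checkpoints/my_run/global_step_100/actor \
    --target_dir checkpoints/my_run/merged_hf_model
```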
### Experimental function calling & MCP interfaces
These features are experimental and subject to change in future releases.
- The chat completion scheduler now follows the OpenAI function-calling schema when used with an OpenAI-compatible server #1831
- SGLang rollout with MCP client #1948 Documentation
- SGLang multi-turn rollout code walk-through documentation
- Multi-turn interaction system with SGLang, enabling dynamic conversational feedback and iterative problem-solving scenarios #1630, the building block for SCoRe
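Since the scheduler now speaks the OpenAI function-calling schema, requests can carry a standard `tools` payload. A hypothetical sketch against a locally served OpenAI-compatible endpoint; the host/port, model name, and tool definition are placeholders, but the `tools` structure follows the standard OpenAI schema:

```shell
# Placeholder endpoint and model; the "tools" block is standard OpenAI
# function-calling schema, which the chat completion scheduler now follows.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```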
### New models and recipes
- New recipe/entropy to reproduce the paper "The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning" with the `Clip-Cov` and `KL-Cov` methods
- Megatron support for Qwen2.5-VL #1286
- Multi-turn SFT support for Qwen-3 #1889
- Enhanced kimi-vl with sequence parallelism #1899
### SGLang optimizations
- SGLang rollout memory usage is further optimized. Blog (requires SGLang v0.4.8 #2187)
- Async multi-turn rollout with multi-modal support is now available in SGLang #2014
### Other performance profiling & optimizations
- Nsight system profiling is available. Documentation
- FSDP prefetch can be enabled via `[actor|ref].fsdp_config.forward_prefetch=True` #1927
- The memory usage of entropy computation can be drastically reduced with fused kernels using `[actor|ref].entropy_checkpointing=True` and `[actor|ref].entropy_from_logits_with_chunking=True` #1927
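A hedged sketch of how these flags might be combined in one Hydra-style override set; the entrypoint and exact config paths are assumptions extrapolated from the flag names above:

```shell
# Hypothetical: enable FSDP forward prefetch plus the memory-efficient
# entropy kernels for both actor and ref in a single run.
python -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.fsdp_config.forward_prefetch=True \
    actor_rollout_ref.ref.fsdp_config.forward_prefetch=True \
    actor_rollout_ref.actor.entropy_checkpointing=True \
    actor_rollout_ref.actor.entropy_from_logits_with_chunking=True
```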
### Other breaking changes and deprecations
- See #1902
- vllm v0.6.3 support will be removed in the next release.
## What's Changed
- [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding by @ETOgaosion in #1834
- [rollout] feat: follow OpenAI tool calling schema in chat scheduler by @wuxibin89 in #1831
- [release] chore: bump version to v0.4 by @eric-haibin-lin in #1897
- Dockerfile.rocm update tensordict==0.6.2 by @vickytsang in #1898
- [feat] add validation shuffle by @mlpod in #1886
- [feat][BREAKING] Megatron: Support learning rate scheduler by @ETOgaosion in #1701
- fix errors in megatron_workers.py by @davidjsonn in #1906
- [tests] chore: add PR title check by @eric-haibin-lin in #1901
- fix qwen2vl grpo for vllm 0.9 and transformers 4.52 by @hiyouga in #1880
- [rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager by @rocke2020 in #1909
- [recipe] feat: char count by @vermouth1992 in #1908
- fix typos by @davidjsonn in #1912
- [trainer] refactor: refactor reward manager, advantage estimator by @eric-haibin-lin in #1916
- set CUDA and HIP VISIBLE DEVICES by @YangWang92 in #1914
- [ppo] feat: add critic valuehead model support for multi-modal PPO by @Yangruipis in #1839
- [bugfix] fix megatron model merger by @ShareLer in #1774
- revert HIP_VISIBLE_DEVICES in worker.py by @YangWang92 in #1920
- [worker] fix: do not break dynamic bsz in dp critic by @hiyouga in #1922
- [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking by @jybsuper in #1668
- [rollout] fix: fix async llm config passing by @eric-haibin-lin in #1933
- [sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 by @jybsuper in #1852
- fix sequence parallelism conflict in kimiVL by @ShareLer in #1899
- [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 by @jinqinn in #1836
- [rollout] feat: add async llm perf script by @wuxibin89 in #1930
- [megatron] feat: qwen2.5vl by @ISEEKYAN in #1286
- [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters by @thelongestusernameofall in #1821
- [hardware] fix: fix issue when sp>1 on ASCEND NPU by @as12138 in #1942
- [megatron] fix: rope_type typo in config_converter.py by @donpromax in #1944
- [training_utils] Add qwen3 multi-turn sft support by @SwordFaith in #1889
- [fsdp] fix: fsdp entropy metrics by @ETOgaosion in #1943
- [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy by @CurryRice233 in #1927
- [rollout] fix: set repetition_penalty=1.0 to AsyncLLM by @wuxibin89 in #1949
- [fsdp] feat: Memory efficient cross entropy with a linear layer fused by @Jianbing-D in #462
- [recipe] feat: qwen2.5vl 7b report and guide by @ISEEKYAN in #1969
- [ckpt] refactor: enhance FSDP checkpoint manager flexibility by @0x404 in #1350
- [env] fix: npu ray verion to 2.46.0 for CI problem by @wyz649296016 in #1987
- Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh by @none0663 in #1996
- [megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading by @none0663 in #1983
- [tests] chore: ppo workflow runs on volcengine machine learning platform by @htc070011 in #1979
- [megatron] fix: multiple key error when trying to override megatron tr… by @donpromax in #1990
- [megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk by @Yangruipis in #1995
- Stabilize loss calculations by clamping KL divergence values by @syo093c in #1779
- [ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError by @lxg2015 in #2010
- [algo] fix: `vf_loss` factor by @tongyx361 in #2016
- [data] fix: fix retool sft data source by @vermouth1992 in #2018
- [fsdp] fix: position_ids in qwen-vl by @ShareLer in #1947
- [hardware] refactor: refactor part of device management by @FightingZhen in #1974
- [trainer] fix: fix sft max_position_embeddings by @vermouth1992 in #2019
- [misc] fix: fix format by @vermouth1992 in #2023
- [megatron] fix: dpskv3 convert src and dst mixed up bug by @Yangruipis in #2029
- fix: TensorDict usage error by @zhihe-wang in #2046
- [hardware] feat: support qwen2_5_vl on ASCEND NPU by @as12138 in #1924
- [trainer] chore: Reducing the number of calls to the write by @RuixiangMa in #2043
- [Bug] fix `None` check in DataProto print_size() by @GHGmc2 in #2067
- [perf] feat: Add verl profiling support from Nvidia Nsight System by @davidmlw in #1820
- [data] fix: multimodal overlong prompt length filtering by @dirtyDan0 in #2063
- [sglang] fix: AsyncSglangServer use async wake_up/sleep by @feifeibear in #2062
- [training_utils] feat: Add project and experiment name to tensorboard log path by @Geaming2002 in #2080
- [trainer] fix: Fix trainer config for `val_only` by @hscspring in https://github.com/volcengine/verl/pull/20842083
- [megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text by @MaoChouHJM in #1999
- [vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs by @yyDing1 in #2068
- [misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb by @vermouth1992 in #2094
- [rollout] refactor: Add option for rollout_log_probs, and default as `False` by @GHGmc2 in #2072
- [tool] feat: Add Search Tool implemented with MCP by @AlecHenx in #1948
- [trainer] fix: make `reward_extra_info` optional in `reward_result` by @HollowMan6 in #2109
- [algo] feat: integrate Clip-Cov and KL-Cov methods by @Raf-Chen in #1830
- [rollout] fix: error in sgyang async mode by @chenhaiq in #2098
- [rollout] fix: fix rollout key not found by @ETOgaosion in #2116
- [recipe] feat: Move entropy reward to the entropy recipe by @Raf-Chen in #2118
- [cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig by @eric-haibin-lin in #2117
- [worker] feat: add support for dynamic batch size of multimodal data by @wang-zerui in #2049
- [fsdp] refactor: set actor's strategy as default for critic and ref by @0x404 in #2130
- [ray] feat: add a test to demonstrate how to perform p2p communication inside wor… by @vermouth1992 in #2131
- [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang by @kinza99 in #1630
- [tool] feat: Add memory limit configuration for sandbox fusion by @plutoZZZZ in #2105
- [sglang] feat: add multimodal input to multiturn async rollout by @nanjiangwill in #2014
- [fsdp] feat: support fsdp2 save hugging face model by @0x404 in #2138
- [rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True by @zyfzjsc988 in #2156
- [rollout] feat: Support Multi-stage Awake for SGLang by @hebiao064 in #1911
- [worker] feat: allow dist shared file-system initialization by @Cccei000 in #2154
- [model] feat: Add MiniCPM-o 2.6 support by @RanchiZhao in #1833
- [model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" by @hiyouga in #2176
- [misc] fix: fix timer importance error in split_placement by @FightingZhen in #2169
- [megatron,vllm] fix: megatron vllm async rollout server by @Yangruipis in #2122
- [model] feat: Add MiniCPM-o 2.6 support by @hiyouga in #2178
- [megatron] feat: Support of dist checkpoint by @ETOgaosion in #2125
- [data] fix: fix the type of parquet_files in SFTDataset by @xuuHuang in #2203
- [trainer] fix: add missing qwen2_moe flops counter by @ETOgaosion in #2190
- [trainer] fix: Add init.py to verl.trainer.config by @ultmaster in #2214
- [model] fix: make vlm patch forward compatible by @hiyouga in #2215
- [recipe] fix: parameter order in RayPRIMETrainer super().init() call by @xxnpark in #2172
- [misc] feat: support ValidationGenerationsLogger in vemlp_wandb by @chenhaiq in #2191
## New Contributors
Thank you all for joining this project!
@vickytsang @davidjsonn @rocke2020 @vwxyzjn @Yangruipis @SeungyounShin @donpromax @leopardracer @ZhiyuLi-Nvidia @LiyuanLucasLiu @Jianbing-D @wyz649296016 @htc070011 @syo093c @FightingZhen @zhihe-wang @KaiChen1998 @wizeng23 @RuixiangMa @davidmlw @feifeibear @hscspring @MaoChouHJM @AlecHenx @wang-zerui @kinza99 @nanjiangwill @zyfzjsc988 @Cccei000 @RanchiZhao @xuuHuang @ultmaster @xxnpark @jvmncs @xingyunjohn1
Full Changelog: v0.4.0...v0.4.1