
v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations

@eric-haibin-lin eric-haibin-lin released this 27 Jun 00:13
· 391 commits to main since this release

Key changes

PPO fixes and enhancements

  • Fixed a bug in the vf_loss coefficient for PPO that was introduced in v0.4 #2016
  • Improved numerical stability by clamping KL-divergence-related values #1779
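The KL-stability fix above follows a general pattern: bound the log-ratio before exponentiating it so the estimator cannot overflow. The sketch below illustrates that idea with a k3-style KL estimate; the function name and clamp value are made up for illustration and are not verl's actual code.

```python
import math

def stable_kl_penalty(logprob: float, ref_logprob: float, clamp: float = 20.0) -> float:
    """Illustrative k3-style KL estimate with the log-ratio clamped
    before exponentiation so exp() cannot overflow.

    A sketch of the general technique, not verl's implementation;
    the name and the clamp value of 20.0 are assumptions.
    """
    log_ratio = ref_logprob - logprob
    # Without clamping, exp(log_ratio) overflows for large log-ratios.
    log_ratio = max(-clamp, min(clamp, log_ratio))
    # k3 estimator: exp(x) - x - 1 >= 0, a low-variance KL estimate.
    return math.exp(log_ratio) - log_ratio - 1.0
```

Because exp(x) - x - 1 is non-negative, the clamped estimate stays finite and well-defined even for extreme log-probability gaps.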

Checkpoint-related changes

  • Switched the Megatron checkpointer to mcore's dist_checkpoint, which reduces peak memory usage; asynchronous distributed saving can additionally be enabled via *.checkpoint.async_save=True.
  • [BREAKING] Megatron's checkpoint directory layout is updated accordingly. Documentation
  • [BREAKING] Checkpoint manager constructor now takes checkpoint_config as the keyword to replace checkpoint_contents #2125
  • Checkpoint merger for LoRA is fixed #1821 via python -m verl.model_merger merge .... Documentation

Experimental function calling & MCP interfaces

These features are experimental and subject to change in future releases.

  • The chat completion scheduler now follows the OpenAI function-calling schema when talking to an OpenAI-compatible server #1831
  • SGLang rollout with MCP client #1948 Documentation
  • SGLang multi-turn rollout code walk-through documentation
  • Multi-turn interaction system with SGLang, enabling dynamic conversational feedback and iterative problem-solving scenarios #1630; this is the building block for SCoRe
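For reference, the schema the scheduler now accepts is the standard OpenAI tool-definition format. The sketch below builds a minimal request payload in that shape; the `get_weather` function, its parameters, and the model name are hypothetical, chosen only to show the structure.

```python
import json

# A minimal tool definition in the OpenAI function-calling schema.
# "get_weather" and its parameters are hypothetical; only the outer
# structure ("type": "function" wrapping a JSON-Schema "parameters"
# object) is what the schema prescribes.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Tools are passed as a list alongside the chat messages.
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
print(json.dumps(payload["tools"], indent=2))
```

When the model decides to call a tool, the response carries `tool_calls` entries whose arguments are a JSON string matching the declared parameter schema.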

New models and recipes

SGLang optimizations

  • SGLang rollout memory usage is further optimized. Blog (requires SGLang v0.4.8 #2187)
  • Async multi-turn rollout with multi-modal support is now available in SGLang #2014

Other performance profiling & optimizations

  • Nsight system profiling is available. Documentation
  • FSDP prefetch can be enabled via [actor|ref].fsdp_config.forward_prefetch=True #1927
  • The memory usage for entropy computation can be drastically reduced with fused kernels using [actor|ref].entropy_checkpointing=True and [actor|ref].entropy_from_logits_with_chunking=True #1927
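The chunked-entropy flag above rests on a simple observation: per-token entropy can be computed over row chunks of the logits, so the full (tokens x vocab) log-probability tensor never has to be materialized at once. The NumPy sketch below illustrates that idea; the function names and chunk size are assumptions for illustration, not verl's fused-kernel implementation.

```python
import numpy as np

def entropy_from_logits(logits: np.ndarray) -> np.ndarray:
    """Per-row entropy of softmax(logits), computed stably
    via the max-shifted log-sum-exp trick."""
    z = logits - logits.max(axis=-1, keepdims=True)
    logsumexp = np.log(np.exp(z).sum(axis=-1, keepdims=True))
    logp = z - logsumexp
    return -(np.exp(logp) * logp).sum(axis=-1)

def entropy_from_logits_chunked(logits: np.ndarray, chunk: int = 1024) -> np.ndarray:
    """Same result, but only a (chunk x vocab) slice of
    log-probabilities is live at any time -- the memory-saving
    idea behind entropy_from_logits_with_chunking (illustrative)."""
    return np.concatenate([
        entropy_from_logits(logits[i:i + chunk])
        for i in range(0, len(logits), chunk)
    ])
```

Chunking trades one large intermediate for a loop over small ones; the result is bit-for-bit the same entropy, with peak memory scaling in the chunk size rather than the sequence length.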

Other breaking changes and deprecations

  • See #1902
  • vllm v0.6.3 support will be removed in the next release.

What's Changed

  • [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding by @ETOgaosion in #1834
  • [rollout] feat: follow OpenAI tool calling schema in chat scheduler by @wuxibin89 in #1831
  • [release] chore: bump version to v0.4 by @eric-haibin-lin in #1897
  • Dockerfile.rocm update tensordict==0.6.2 by @vickytsang in #1898
  • [feat] add validation shuffle by @mlpod in #1886
  • [feat][BREAKING] Megatron: Support learning rate scheduler by @ETOgaosion in #1701
  • fix errors in megatron_workers.py by @davidjsonn in #1906
  • [tests] chore: add PR title check by @eric-haibin-lin in #1901
  • fix qwen2vl grpo for vllm 0.9 and transformers 4.52 by @hiyouga in #1880
  • [rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager by @rocke2020 in #1909
  • [recipe] feat: char count by @vermouth1992 in #1908
  • fix typos by @davidjsonn in #1912
  • [trainer] refactor: refactor reward manager, advantage estimator by @eric-haibin-lin in #1916
  • set CUDA and HIP VISIBLE DEVICES by @YangWang92 in #1914
  • [ppo] feat: add critic valuehead model support for multi-modal PPO by @Yangruipis in #1839
  • [bugfix] fix megatron model merger by @ShareLer in #1774
  • revert HIP_VISIBLE_DEVICES in worker.py by @YangWang92 in #1920
  • [worker] fix: do not break dynamic bsz in dp critic by @hiyouga in #1922
  • [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking by @jybsuper in #1668
  • [rollout] fix: fix async llm config passing by @eric-haibin-lin in #1933
  • [sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 by @jybsuper in #1852
  • fix sequence parallelism conflict in kimiVL by @ShareLer in #1899
  • [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 by @jinqinn in #1836
  • [rollout] feat: add async llm perf script by @wuxibin89 in #1930
  • [megatron] feat: qwen2.5vl by @ISEEKYAN in #1286
  • [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters by @thelongestusernameofall in #1821
  • [hardware] fix: fix issue when sp>1 on ASCEND NPU by @as12138 in #1942
  • [megatron] fix: rope_type typo in config_converter.py by @donpromax in #1944
  • [training_utils] Add qwen3 multi-turn sft support by @SwordFaith in #1889
  • [fsdp] fix: fsdp entropy metrics by @ETOgaosion in #1943
  • [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy by @CurryRice233 in #1927
  • [rollout] fix: set repetition_penalty=1.0 to AsyncLLM by @wuxibin89 in #1949
  • [fsdp] feat: Memory efficient cross entropy with a linear layer fused by @Jianbing-D in #462
  • [recipe] feat: qwen2.5vl 7b report and guide by @ISEEKYAN in #1969
  • [ckpt] refactor: enhance FSDP checkpoint manager flexibility by @0x404 in #1350
  • [env] fix: npu ray verion to 2.46.0 for CI problem by @wyz649296016 in #1987
  • Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh by @none0663 in #1996
  • [megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading by @none0663 in #1983
  • [tests] chore: ppo workflow runs on volcengine machine learning platform by @htc070011 in #1979
  • [megatron] fix: multiple key error when trying to override megatron tr… by @donpromax in #1990
  • [megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk by @Yangruipis in #1995
  • Stabilize loss calculations by clamping KL divergence values by @syo093c in #1779
  • [ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError by @lxg2015 in #2010
  • [algo] fix: vf_loss factor by @tongyx361 in #2016
  • [data] fix: fix retool sft data source by @vermouth1992 in #2018
  • [fsdp] fix: position_ids in qwen-vl by @ShareLer in #1947
  • [hardware] refactor: refactor part of device management by @FightingZhen in #1974
  • [trainer] fix: fix sft max_position_embeddings by @vermouth1992 in #2019
  • [misc] fix: fix format by @vermouth1992 in #2023
  • [megatron] fix: dpskv3 convert src and dst mixed up bug by @Yangruipis in #2029
  • fix: TensorDict usage error by @zhihe-wang in #2046
  • [hardware] feat: support qwen2_5_vl on ASCEND NPU by @as12138 in #1924
  • [trainer] chore: Reducing the number of calls to the write by @RuixiangMa in #2043
  • [Bug] fix None check in DataProto print_size() by @GHGmc2 in #2067
  • [perf] feat: Add verl profiling support from Nvidia Nsight System by @davidmlw in #1820
  • [data] fix: multimodal overlong prompt length filtering by @dirtyDan0 in #2063
  • [sglang] fix: AsyncSglangServer use async wake_up/sleep by @feifeibear in #2062
  • [training_utils] feat: Add project and experiment name to tensorboard log path by @Geaming2002 in #2080
  • [trainer] fix: Fix trainer config for val_only by @hscspring in https://github.com/volcengine/verl/pull/20842083
  • [megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text by @MaoChouHJM in #1999
  • [vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs by @yyDing1 in #2068
  • [misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb by @vermouth1992 in #2094
  • [rollout] refactor: Add option for rollout_log_probs, and default as False by @GHGmc2 in #2072
  • [tool] feat: Add Search Tool implemented with MCP by @AlecHenx in #1948
  • [trainer] fix: make reward_extra_info optional in reward_result by @HollowMan6 in #2109
  • [algo] feat: integrate Clip-Cov and KL-Cov methods by @Raf-Chen in #1830
  • [rollout] fix: error in sgyang async mode by @chenhaiq in #2098
  • [rollout] fix: fix rollout key not found by @ETOgaosion in #2116
  • [recipe] feat: Move entropy reward to the entropy recipe by @Raf-Chen in #2118
  • [cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig by @eric-haibin-lin in #2117
  • [worker] feat: add support for dynamic batch size of multimodal data by @wang-zerui in #2049
  • [fsdp] refactor: set actor's strategy as default for critic and ref by @0x404 in #2130
  • [ray] feat: add a test to demonstrate how to perform p2p communication inside wor… by @vermouth1992 in #2131
  • [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang by @kinza99 in #1630
  • [tool] feat: Add memory limit configuration for sandbox fusion by @plutoZZZZ in #2105
  • [sglang] feat: add multimodal input to multiturn async rollout by @nanjiangwill in #2014
  • [fsdp] feat: support fsdp2 save hugging face model by @0x404 in #2138
  • [rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True by @zyfzjsc988 in #2156
  • [rollout] feat: Support Multi-stage Awake for SGLang by @hebiao064 in #1911
  • [worker] feat: allow dist shared file-system initialization by @Cccei000 in #2154
  • [model] feat: Add MiniCPM-o 2.6 support by @RanchiZhao in #1833
  • [model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" by @hiyouga in #2176
  • [misc] fix: fix timer importance error in split_placement by @FightingZhen in #2169
  • [megatron,vllm] fix: megatron vllm async rollout server by @Yangruipis in #2122
  • [model] feat: Add MiniCPM-o 2.6 support by @hiyouga in #2178
  • [megatron] feat: Support of dist checkpoint by @ETOgaosion in #2125
  • [data] fix: fix the type of parquet_files in SFTDataset by @xuuHuang in #2203
  • [trainer] fix: add missing qwen2_moe flops counter by @ETOgaosion in #2190
  • [trainer] fix: Add init.py to verl.trainer.config by @ultmaster in #2214
  • [model] fix: make vlm patch forward compatible by @hiyouga in #2215
  • [recipe] fix: parameter order in RayPRIMETrainer super().init() call by @xxnpark in #2172
  • [misc] feat: support ValidationGenerationsLogger in vemlp_wandb by @chenhaiq in #2191

New Contributors

Thank you all for joining this project!

@vickytsang @davidjsonn @rocke2020 @vwxyzjn @Yangruipis @SeungyounShin @donpromax @leopardracer @ZhiyuLi-Nvidia @LiyuanLucasLiu @Jianbing-D @wyz649296016 @htc070011 @syo093c @FightingZhen @zhihe-wang @KaiChen1998 @wizeng23 @RuixiangMa @davidmlw @feifeibear @hscspring @MaoChouHJM @AlecHenx @wang-zerui @kinza99 @nanjiangwill @zyfzjsc988 @Cccei000 @RanchiZhao @xuuHuang @ultmaster @xxnpark @jvmncs @xingyunjohn1

Full Changelog: v0.4.0...v0.4.1