forked from volcengine/verl
Dev rlenv 5 11 #1
Merged
- Roll back the vLLM version (vLLM > 0.7.0 is for testing only)
- Add `pyext` as an extra requirement
volcengine#274) Related issue: volcengine#273
- Add `remove_previous_ckpt_in_save` and `del_local_ckpt_after_load` configuration options in `ppo_trainer.yaml`
- Update `RayPPOTrainer` to support optional checkpoint deletion during loading
- Modify `ActorRolloutRefWorker` and `CriticWorker` to pass the checkpoint removal flag
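A minimal sketch of how the save-side flag might behave; the helper below is purely illustrative and is not verl's actual checkpoint manager:

```python
import os
import shutil

def save_and_maybe_prune(new_ckpt_dir: str, prev_ckpt_dir: str | None,
                         remove_previous_ckpt_in_save: bool) -> None:
    # ... write the new checkpoint into new_ckpt_dir here ...
    if remove_previous_ckpt_in_save and prev_ckpt_dir and os.path.isdir(prev_ckpt_dir):
        # Drop the older local checkpoint to bound disk usage.
        shutil.rmtree(prev_ckpt_dir, ignore_errors=True)
```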
…gine#282) Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
The split placement example is outdated; I tried it and encountered some errors. To address this, this PR makes the following changes:
1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to `examples/split_placement/config/ppo_trainer_split.yaml`
2. Copied the `RayPPOTrainer.fit` method into the `fit` function in `examples/split_placement/split_monkey_patch.py` and modified it to get the futures of `critic_output` and `actor_output`
…ne#266)

### **Enhancement: Support for `extra_info` in Reward Calculation**

#### **Summary**
This update enhances the reward computation process by introducing an additional `extra_info` parameter. This allows users to pass in more contextual information when calculating rewards, improving flexibility for different datasets.

#### **Changes Made**
- **Updated `_default_compute_score`** to accept an `extra_info` argument:
```python
def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
```
- **Modified the reward manager (`naive.py`)** to pass `extra_info` from `data_item.non_tensor_batch` to `compute_score`:
```python
extra_info = data_item.non_tensor_batch['extra_info']
score = self.compute_score(
    data_source=data_source,
    solution_str=sequences_str,
    ground_truth=ground_truth,
    extra_info=extra_info,
)
```

#### **Why This Change?**
- Some datasets require additional context beyond `data_source`, `solution_str`, and `ground_truth` for accurate reward computation.
- The new `extra_info` field allows users to pass custom metadata, ideally in dictionary form, as specified in the [official documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
- This change maintains compatibility with existing dataset processing scripts, as they already include the `extra_info` field.

#### **Impact**
- **Improved flexibility**: Users can now pass additional contextual information, making reward computation more adaptable to different datasets.
- **Backward compatibility**: Since all example datasets already include `extra_info`, this update should integrate seamlessly.

Let me know if any modifications are needed!
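For illustration, a hedged sketch of a user-defined score function that consumes `extra_info`; the metadata key (`tolerance`) is hypothetical and only demonstrates the intended flexibility:

```python
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    # `extra_info` can carry per-sample metadata from the dataset, e.g. a
    # numeric tolerance; the key used here is made up for the example.
    tolerance = (extra_info or {}).get("tolerance", 0.0)
    try:
        return float(abs(float(solution_str) - float(ground_truth)) <= tolerance)
    except (TypeError, ValueError):
        return float(solution_str.strip() == str(ground_truth).strip())
```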
…olcengine#284)
- Fixed FSDP1 model offload.
- With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, GPU memory utilization can increase to 0.9.
- With actor, critic, and reference offload all enabled, there is only one model copy in GPU memory at a time, so we can further increase `micro_batch_size_per_gpu` or `max_token_per_gpu`.

Specifically:
- During rollout, only the rollout model and KVCache are in GPU memory.
- While the critic computes values, only the critic model stays in GPU memory; its optimizer and other model states are in CPU main memory.
- During the actor update, the actor model and optimizer are stored on GPU, while the reference model, critic model, and critic optimizer are offloaded to CPU.
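The general pattern is to keep only the module needed for the current stage resident on the GPU. A rough sketch of that pattern (not verl's exact offload utilities, which operate on FSDP flat parameters):

```python
import torch

def offload_params_to_cpu(model: torch.nn.Module) -> None:
    # Park the weights in CPU memory between stages so another model
    # (or the rollout KV cache) can use the freed GPU memory.
    for param in model.parameters():
        param.data = param.data.to("cpu", non_blocking=True)
    torch.cuda.empty_cache()

def load_params_to_gpu(model: torch.nn.Module, device: str = "cuda") -> None:
    # Bring the weights back just before this model is used.
    for param in model.parameters():
        param.data = param.data.to(device, non_blocking=True)
```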
1. Fix incorrect notes description.
2. Fix incorrect code path.

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Avoid CPU-to-device loading or offloading when the optimizer is not initialized, to prevent incorrect creation of `optimizer.state`.
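A minimal sketch of the guard, assuming a standard PyTorch optimizer: if `optimizer.state` is still empty (no step has run yet), the transfer is skipped so no placeholder state entries get created.

```python
import torch

def offload_optimizer_state(optimizer: torch.optim.Optimizer, device: str = "cpu") -> None:
    if not optimizer.state:
        # The optimizer has not taken a step yet; moving "state" now would
        # create incorrect/empty entries, so do nothing.
        return
    for param, state in optimizer.state.items():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device, non_blocking=True)
```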
We need to specify the minimum permission in the workflow.
A working Slurm example adapted from https://docs.ray.io/en/latest/ray-core/starting-ray.html
Support Qwen2 Megatron backend

The code is primarily adapted from the llama folder, with modifications to use QKV bias and remove the rope_scaling of RoPE in `verl/models/qwen2/megatron/layers/parallel_attention.py`.
- Training Qwen2-7B-Instruct with PPO, the GSM8k score can reach 0.87 at step 75.
- The saver is not supported yet.
…olcengine#318) This PR adds Ray Serve to the requirements to enable support for multi-node training. It addresses the issue described here: volcengine#87 (comment) Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
Implement RLOO algorithm according to https://arxiv.org/abs/2402.14740
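A minimal sketch of the RLOO leave-one-out baseline, assuming `rewards` holds one scalar per sampled response, grouped as `(num_prompts, k)`; this is illustrative and not verl's exact implementation:

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, k) scalar rewards for k responses per prompt.
    k = rewards.shape[-1]
    # Each response is baselined by the mean reward of the other k-1 responses.
    loo_baseline = (rewards.sum(dim=-1, keepdim=True) - rewards) / (k - 1)
    return rewards - loo_baseline
```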
Tracking backend: support vemlp wandb

Co-authored-by: liudayuan.carrot <liudayuan.carrot@bytedance.com>
Validation datasets are sent to the inference engines as a whole batch, and the engines schedule memory themselves.
- [x] Remove `val_batch_size` from examples
- [x] Set default values of `val_batch_size` in configs to `null` and add DEPRECATED comments
- [x] Add deprecation warnings about `val_batch_size` in `_validate_config` (see the sketch below)
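A hedged sketch of the deprecation check; the config access pattern below is illustrative, not verl's exact `_validate_config`:

```python
import warnings

def warn_if_val_batch_size_set(config) -> None:
    # DEPRECATED: validation data is now sent to the inference engines as a
    # whole batch, so a user-specified val_batch_size is simply ignored.
    if getattr(config.data, "val_batch_size", None) is not None:
        warnings.warn(
            "data.val_batch_size is deprecated and has no effect; "
            "validation is dispatched as a single batch.",
            DeprecationWarning,
        )
```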
….48 (volcengine#357)

Close volcengine#312. Add support for Ulysses sequence parallelism for transformers >= 4.48.

I've tested transformers 4.45.0, 4.46.0, 4.47.0, 4.48.0, and 4.49.0, using sp=2 with the following script in my local env:
```bash
#!/bin/bash
set -ex

VERSIONS=("4.45.0" "4.46.0" "4.47.0" "4.48.0" "4.49.0")

for version in "${VERSIONS[@]}"; do
    echo "Testing with Transformers version ${version}"
    echo "----------------------------------------"

    pip install "transformers==${version}"

    PYTHONPATH=./ torchrun --nproc_per_node=2 tests/model/test_transformers_ulysses.py

    echo "----------------------------------------"
    echo "Completed testing for version ${version}"
    echo ""
done
```
…engine#944)

### Changes
Add gradient checkpointing (aka `activation recomputation`) config and support from Megatron core (https://github.com/NVIDIA/Megatron-LM/blob/b7ec711cf66cf500b98d8783f2c7f3c3a7d5ba31/megatron/core/transformer/transformer_config.py#L208-L233) to make activation checkpointing more efficient for LLMs with 20B+ parameters.
```yaml
gradient_checkpointing_kwargs:
  activations_checkpoint_method: null
  activations_checkpoint_granularity: null
  activations_checkpoint_num_layers: null
```

### Test
Tested loading Qwen 7B/32B with 16k input prompts; the OOM issues are bypassed after adding gradient checkpointing.

### Next Step
Add a `ppo_trainer for megatron` doc to explain the config details in https://verl.readthedocs.io/en/latest/examples/config.html
…volcengine#925)

As mentioned in volcengine#903, the model_merger script has problems when dealing with a saved FSDP checkpoint trained with `trainer.n_gpus_per_node=1`: the loaded `weight` is of type `Tensor` instead of `DTensor`. This PR supports this situation.
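The fix amounts to gathering only when the checkpointed weight is actually sharded; a hedged sketch (the import path assumes torch >= 2.4, older versions expose `torch.distributed._tensor`):

```python
import torch
from torch.distributed.tensor import DTensor  # torch >= 2.4

def to_full_tensor(weight: torch.Tensor) -> torch.Tensor:
    # With trainer.n_gpus_per_node=1 the saved weight is a plain Tensor,
    # so only DTensor shards need to be gathered into a full tensor.
    if isinstance(weight, DTensor):
        return weight.full_tensor()
    return weight
```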
… time configurable & providing better error info (volcengine#947)

## Summary
As mentioned in volcengine#491, the `register_center` named actor could still be `None` after the 2-minute waiting time and crash the job for some verl users. This might be due to (1) uncleaned Ray resources from previous runs, or (2) a waiting time of 120 s that is too short when the named-actor launching task is delayed in the cluster. This PR makes the `register_center` named actor waiting time configurable and longer by default. It also provides better error info to help users debug the issue themselves.

## Related issues
volcengine#491

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
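A simplified sketch of the retry-with-configurable-timeout idea; the actor name handling and default values here are placeholders, not verl's exact code:

```python
import time
import ray

def get_register_center(name: str, timeout_s: float = 300.0, poll_s: float = 1.0):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            return ray.get_actor(name)
        except ValueError:
            time.sleep(poll_s)  # actor not registered yet, keep polling
    raise TimeoutError(
        f"Named actor '{name}' was not found within {timeout_s:.0f}s. "
        "Check for leftover Ray resources from a previous run, or raise the timeout."
    )
```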
…engine#958)

Users of the FSDP backend may not have Megatron installed; directly running this script will lead to an import error.
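One common way to avoid the hard dependency is an import guard, sketched below (the flag name is illustrative):

```python
try:
    import megatron.core  # noqa: F401
    HAS_MEGATRON = True
except ImportError:
    # FSDP-only installations do not ship Megatron; fall back gracefully
    # instead of crashing at import time.
    HAS_MEGATRON = False
```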
…ne#959) Support setting `warmup_style='cosine'`.
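A minimal sketch of what `warmup_style='cosine'` typically means, linear warmup followed by cosine decay; this is not verl's exact scheduler code:

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def build_scheduler(optimizer, warmup_style: str, num_warmup_steps: int, num_training_steps: int):
    def lr_lambda(step: int) -> float:
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        if warmup_style == "cosine":
            progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
            return 0.5 * (1.0 + math.cos(math.pi * progress))
        return 1.0  # "constant": hold the peak learning rate after warmup
    return LambdaLR(optimizer, lr_lambda)
```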
…gine#976) Co-authored-by: HL <linhaibin.eric@gmail.com>
Currently, `pg_clipfrac_lower` is always 0 by mistake.
Currently the model merger does not support HSDP (the `ddp` mesh dim is not considered). This PR fixes this.
See vllm-project/vllm@8b66470. In summary, when using the external launcher in vLLM, a seed must now be set.

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
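In practice this means passing an explicit seed when constructing the engine with the external launcher; a hedged example (the model name is a placeholder):

```python
from vllm import LLM

# With distributed_executor_backend="external_launcher", vLLM now requires
# a deterministic seed; any fixed integer works.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",  # placeholder model
    distributed_executor_backend="external_launcher",
    seed=0,
)
```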
This PR suggests a fix for a bug that occurs when the `_switch_chat_template()` method is called. According to https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L222, `data.non_tensor_batch['raw_prompt'][i]` is already a list if `data.return_raw_chat=True`, so calling `.tolist()` again results in an error. Now we check whether it is already a list before calling this method.
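A minimal sketch of the guard, written as a standalone helper (the function name is made up for illustration):

```python
def as_chat_list(raw_prompt):
    # With data.return_raw_chat=True the stored value is already a Python
    # list; only numpy arrays still need the .tolist() conversion.
    return raw_prompt if isinstance(raw_prompt, list) else raw_prompt.tolist()
```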
histmeisah pushed a commit that referenced this pull request on May 26, 2025
* [release] update verl doc url and update version and setup
* fix init with only fsdp and update setup for pypi
* update version
histmeisah pushed a commit that referenced this pull request on Jul 25, 2025
…engine#2365)

### What does this PR do?
Fix a regression from volcengine#1911: that PR did not change the sglang async branch. CI did not catch this error because it only runs 1 step, while this error happens in the second step, so I updated the test cases to run 2 steps.

To reproduce the bug, run:
```
TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
```

It fails with:
```
(WorkerDict pid=1257286) Total steps: 2, num_warmup_steps: 0
(WorkerDict pid=1257286) Actor use_remove_padding=True
(WorkerDict pid=1257286) Actor use_fused_kernels=False
(AsyncSglangServer pid=1260392) FastAPI listen on 192.168.111.48:40451
(WorkerDict pid=1257286) terminate called after throwing an instance of 'c10::Error'
(WorkerDict pid=1257286)   what():  CUDA error: an illegal memory access was encountered
(WorkerDict pid=1257286) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=1257286) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=1257286) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=1257286)
(WorkerDict pid=1257286) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=1257286) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbf6036c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbf60315a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbf6080d918 in
```

### Test
```
(TaskRunner pid=1647269) step:2 - global_seqlen/min:13075 - global_seqlen/max:14837 - global_seqlen/minmax_diff:1762 - global_seqlen/balanced_min:14231 - global_seqlen/balanced_max:14232 - global_seqlen/mean:14231.5 - actor/entropy:2.0606913566589355 - critic/vf_loss:8.7157882153
```