Conversation

Contributor

@mingruimingrui commented Apr 23, 2025

What does this PR do?

This PR implements fused losses for alignment (#710).
It reduces the memory required for loss calculation to a small constant amount.

ChangeLog:

  • Added the option `use_fused_kernels`
  • Added a monkey patch so that `model.forward` returns `last_hidden_state` instead of computing logits
  • Added `FusedLinearForPPO` to `verl/utils/experimental/torch_functional.py`

Usage

Simply add the following option

actor_rollout_ref.model.use_fused_kernels=True

Before submitting

  • Did you read the Contribute Guide and finish the code format check?
  • Did you make sure to update the documentation in the docs with your changes, especially for breaking configs etc.?
  • Did you write any test cases if necessary? Please add CI tests for your new feature.

Additional Info:

  • The current implementation uses chunking to reduce memory consumption to a constant value.
    • It works by splitting the loss calculation into chunks of 512 tokens, computing the log_probs / entropy values / gradients for each chunk, and accumulating them.
    • However, the current implementation can be slow, because it processes each chunk sequentially in a Python for loop.
    • In the future we should consider converting the fused functions to Triton or some other JIT solution.
  • Compared to a FusedPPOLossFunction, optimizing the hidden_states -> entropy & log_probs step is much better for algorithm developers: the memory-heavy part is optimized away for them, and they are free to combine the values in their own custom loss functions.
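The chunking idea above can be sketched as follows. This is a minimal illustration, not the actual `FusedLinearForPPO` implementation; the function name, signature, and chunk handling here are hypothetical:

```python
# Hypothetical sketch of the chunked log_prob / entropy computation.
# Instead of materializing the full [num_tokens, vocab_size] logits tensor,
# only one chunk of logits is alive at a time.
import torch
import torch.nn.functional as F

def chunked_logprobs_entropy(hidden_states, lm_weight, input_ids, chunk_size=512):
    """hidden_states: [num_tokens, hidden_dim]
    lm_weight:     [vocab_size, hidden_dim] (the lm_head weight)
    input_ids:     [num_tokens] target token ids
    Returns per-token log_probs and entropies, each of shape [num_tokens].
    """
    log_probs_out, entropy_out = [], []
    for start in range(0, hidden_states.size(0), chunk_size):
        h = hidden_states[start:start + chunk_size]   # [c, hidden_dim]
        logits = h @ lm_weight.t()                    # [c, vocab]; only this chunk is alive
        logp = F.log_softmax(logits.float(), dim=-1)
        ids = input_ids[start:start + chunk_size]
        # log-prob of the observed token for each position in the chunk
        log_probs_out.append(logp.gather(-1, ids.unsqueeze(-1)).squeeze(-1))
        # entropy H = -sum_v p(v) log p(v)
        entropy_out.append(-(logp.exp() * logp).sum(-1))
    return torch.cat(log_probs_out), torch.cat(entropy_out)
```

The sequential Python loop is exactly the slowness mentioned above; a Triton kernel would fuse the matmul, softmax, gather, and entropy reduction per chunk.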

@hiyouga self-requested a review April 23, 2025 07:21
@ETOgaosion
Collaborator

@mingruimingrui Could you also add the e2e integration test? Simply enable `actor_rollout_ref.model.use_fused_kernels=True` in `tests/e2e/ppo_trainer/run_function_reward.sh`.

And we should set it to `True` by default (`use_fused_kernels: True`) in `verl/trainer/config/ppo_trainer.yaml` to reduce memory.

@ETOgaosion merged commit eb077f6 into volcengine:main May 16, 2025
37 of 38 checks passed
@mingruimingrui deleted the feat/memory-optimized-loss branch May 18, 2025 09:43
@eric-haibin-lin mentioned this pull request May 19, 2025
feifeibear added a commit to feifeibear/verl that referenced this pull request May 20, 2025
@plutoZZZZ mentioned this pull request May 20, 2025
vermouth1992 pushed a commit that referenced this pull request May 20, 2025
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Currently, the `e2e_prime` test encounters the error `AttributeError:
'NoneType' object has no attribute 'squeeze'`, which is caused by #1212.

In PR #1568, the parameter `use_fused_kernels` in `ppo_trainer.yaml`
was set to `false`, but the corresponding parameter in
`prime_trainer.yaml` was not updated. This is preventing the CI from
passing. Until the root cause of the `use_fused_kernels` issue is fully
resolved, we should temporarily set `use_fused_kernels` to `false` in
`prime_trainer.yaml`.
### High-Level Design

Not needed

### Specific Changes

- Set `use_fused_kernels` to `False` by default in `prime_trainer.yaml`
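
For reference, the change amounts to a config fragment along these lines (a sketch; the exact surrounding keys in `prime_trainer.yaml` are assumed to mirror `ppo_trainer.yaml`):

```yaml
# prime_trainer.yaml (sketch — surrounding structure assumed)
actor_rollout_ref:
  model:
    use_fused_kernels: False
```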

### API

Not needed

### Usage Example

Not needed

### Test

Not needed

### Additional Info.

Not needed

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.