@ccclyu ccclyu commented Apr 6, 2025

Changes

Add a gradient checkpointing (aka `activation recomputation`) config, with support from Megatron-Core (https://github.com/NVIDIA/Megatron-LM/blob/b7ec711cf66cf500b98d8783f2c7f3c3a7d5ba31/megatron/core/transformer/transformer_config.py#L208-L233), to make activation checkpointing more efficient for LLMs with 20B+ parameters.

```
gradient_checkpointing_kwargs:
  activations_checkpoint_method: null
  activations_checkpoint_granularity: null
  activations_checkpoint_num_layers: null
```
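
For context, these three kwargs correspond to the activation-recomputation fields in the linked `transformer_config.py` (`recompute_granularity`, `recompute_method`, `recompute_num_layers`). Below is a minimal sketch, not the code in this PR, of how the kwargs could be forwarded onto a Megatron-Core `TransformerConfig`; the helper name is hypothetical.

```
# Minimal sketch (not verl's actual implementation): forward the
# gradient_checkpointing_kwargs onto Megatron-Core's recompute_* fields.
from megatron.core.transformer.transformer_config import TransformerConfig


def apply_gradient_checkpointing_kwargs(
    tf_config: TransformerConfig, kwargs: dict
) -> TransformerConfig:
    # "full" recomputes whole transformer layers; "selective" recomputes only
    # the memory-heavy attention activations.
    tf_config.recompute_granularity = kwargs.get("activations_checkpoint_granularity")
    # With "full" granularity, "uniform" checkpoints evenly divided chunks of
    # layers, while "block" checkpoints the first N layers per pipeline stage.
    tf_config.recompute_method = kwargs.get("activations_checkpoint_method")
    # Number of layers (per chunk or per stage) to recompute.
    tf_config.recompute_num_layers = kwargs.get("activations_checkpoint_num_layers")
    return tf_config
```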

Test

Tested by loading Qwen 7B/32B with 16k-token input prompts; the OOM issues are avoided after adding gradient checkpointing.

Next Step

Add a `ppo_trainer for megatron` doc to explain the config details in https://verl.readthedocs.io/en/latest/examples/config.html.

@eric-haibin-lin eric-haibin-lin merged commit d13434f into volcengine:main Apr 7, 2025
21 checks passed
yuchenwang3 pushed a commit to yuchenwang3/verl that referenced this pull request Apr 25, 2025
histmeisah pushed a commit to SJTU-IAAR/verl that referenced this pull request Apr 27, 2025