feat: add gradient checkpointing in megatron backend #944
Changes
Add a gradient checkpointing (aka activation recomputation) config and support from Megatron-Core (https://github.com/NVIDIA/Megatron-LM/blob/b7ec711cf66cf500b98d8783f2c7f3c3a7d5ba31/megatron/core/transformer/transformer_config.py#L208-L233) to make activation checkpointing more efficient for LLMs with 20B+ parameters. A sketch of the relevant Megatron-Core fields is shown below.
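For reference, a minimal sketch of the Megatron-Core knobs this config maps onto. The recompute field names come from the linked section of transformer_config.py; the model dimensions below are placeholder values for illustration only, not verl defaults:

```python
# Minimal sketch (not the exact verl wiring) of Megatron-Core's activation
# recomputation options on TransformerConfig.
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    # Placeholder model dimensions, for illustration only.
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    # Gradient checkpointing / activation recomputation knobs
    # (megatron/core/transformer/transformer_config.py, L208-L233):
    recompute_granularity="full",  # "selective" recomputes only the attention core
    recompute_method="uniform",    # or "block" to recompute only the first N layers
    recompute_num_layers=1,        # layers per recomputed chunk when "uniform"
)
```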
Test
Tested by loading Qwen 7B/32B with 16k input prompts; the previous OOM issues no longer occur after adding gradient checkpointing.
Next Step
Add a ppo_trainer-for-Megatron doc to explain the config details in https://verl.readthedocs.io/en/latest/examples/config.html.