Describe the bug
When attempting to run the second stage of GRPO on DeepScaler with `max_total_sequence_length` set to 16384, I encounter out-of-memory (OOM) errors on GPU VRAM. The only way I have found to run the second stage successfully is to drastically reduce the GRPO batch size to 2 and adjust other settings accordingly, which increases training time significantly (an estimated 20x or more).
Steps/Code to reproduce bug
- Use the standard DeepScaler GRPO configuration, including the fix from my previous PR #442 ("add missing entry dynamic_batching and setting it to False") and the follow-up fix in #455 (thanks to @terrykong).
- Attempt to run the second stage of GRPO training with `policy.max_total_sequence_length` set to 16384 (see the sketch of the relevant overrides after this list).
- No other changes.
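For clarity, here is a minimal sketch of the overrides relative to the standard DeepScaler GRPO stage-2 config. The exact key layout (in particular how `dynamic_batching` is nested) is assumed here and may not match the actual config file:

```yaml
# Sketch only: base config is the standard DeepScaler GRPO stage-2 recipe.
# Key nesting is assumed; only these two values differ from that recipe.
policy:
  max_total_sequence_length: 16384   # triggers the OOM described above
  dynamic_batching: false            # per the fix in #442 / #455
```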
Expected behavior
The second stage runs and reproduces training similar to what is described in the GRPO on DeepScaler guide.
Environment overview and details
- Environment location: Determined AI environment
- OS: Ubuntu 22.04.4 LTS (`Linux f300d81ccc7f 4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64 GNU/Linux`)
- NVIDIA driver: 555.42.06 (nvidia-smi reports CUDA Version 12.5)
- CUDA toolkit: nvcc release 12.4, V12.4.131 (Build cuda_12.4.r12.4/compiler.34097967_0)
- Python: 3.12.10
- Method of install: same as in Prerequisites
Additional context
8x NVIDIA A100-SXM4-80GB (the same hardware as described in the GRPO on DeepScaler guide).