Describe the bug
When attempting to run the second stage of GRPO on DeepScaler with `max_total_sequence_length` set to 16384, I encounter out-of-memory (OOM) errors on GPU VRAM. The only way I have found to run the second stage successfully is to drastically reduce the GRPO batch size to 2 and adjust other settings accordingly, which increases training time significantly (an estimated 20x or more).
Steps/Code to reproduce bug
- Use the standard DeepScaler GRPO configuration, including the fix from my previous PR #442 ("add missing entry dynamic_batching and setting it to False") and the follow-up fix in #455 (thanks to @terrykong).
- Attempt to run the second stage of GRPO training with `policy.max_total_sequence_length` set to 16384 (see the sketch of the relevant overrides after this list).
- No other changes.
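For clarity, here is a minimal sketch of the overrides relative to the standard DeepScaler GRPO stage-2 config. The exact key layout (in particular how `dynamic_batching` is nested) is assumed here and may not match the actual config file:

```yaml
# Sketch only: base config is the standard DeepScaler GRPO stage-2 recipe.
# Key nesting is assumed; only these two values differ from that recipe.
policy:
  max_total_sequence_length: 16384   # triggers the OOM described above
  dynamic_batching: false            # per the fix in #442 / #455
```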
Expected behavior
The second stage runs and reproduces training similar to what is described in the GRPO on DeepScaler guide.
Environment overview and details
- Environment location: Determined AI environment
- OS: Ubuntu 22.04.4 LTS (`Linux f300d81ccc7f 4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64 GNU/Linux`)
- NVIDIA driver: 555.42.06 (nvidia-smi reports CUDA Version 12.5)
- CUDA toolkit: nvcc release 12.4, V12.4.131 (Build cuda_12.4.r12.4/compiler.34097967_0)
- Python: 3.12.10
- Method of install: same as in Prerequisites
Additional context
8x NVIDIA A100-SXM4-80GB (the same hardware as described in the GRPO on DeepScaler guide).