Describe the bug
I was running the deepscaler tutorial https://github.com/NVIDIA/NeMo-RL/blob/main/docs/guides/grpo-deepscaler.md on a single A6000 with 48GB.
After some time I got this:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 560.00 MiB. GPU 0 has a total capacity of 47.41 GiB of which 370.38 MiB is free. Process 21440 has 366.00 MiB memory in use. Including non-PyTorch memory, this process has 46.64 GiB memory in use. Of the allocated memory 45.80 GiB is allocated by PyTorch, and 215.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Steps/Code to reproduce bug
uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-8K.yaml
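The OOM message itself suggests trying expandable segments to reduce allocator fragmentation. A possible workaround to try before the repro command, sketched below; whether this actually avoids the OOM on a 48GB A6000 is untested, and the command is the same one from the repro step above:

```shell
# Suggested by the PyTorch OOM message: expandable segments can reduce
# fragmentation of the CUDA caching allocator. Not guaranteed to fix
# a genuine capacity shortfall -- it only helps when "reserved but
# unallocated" memory is large.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Same repro command as above, now with the allocator setting applied.
uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-8K.yaml
```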
Expected behavior
The run should not OOM.
If this configuration does not fit on an A6000, please document the minimum system requirements under which this tutorial is expected to work.
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider - AWS, Azure, GCP, Collab)]
- Method of install: [pip install or from source]. Please specify exact commands you used to install.
- If method of install is [Docker], provide the docker pull and docker run commands used
Environment details
- OS version: Ubuntu 22.04
- Method of install: running via uv
Additional context
GPU model: NVIDIA RTX A6000 with 48GB