
CUDA OOM when running deepscaler tutorial on 1 A6000 #493

@okuchaiev

Description


Describe the bug

I was running the deepscaler tutorial (https://github.com/NVIDIA/NeMo-RL/blob/main/docs/guides/grpo-deepscaler.md) on a single A6000 with 48GB.

After some time I got:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 560.00 MiB. GPU 0 has a total capacity of 47.41 GiB of which 370.38 MiB is free. Process 21440 has 366.00 MiB memory in use. Including non-PyTorch memory, this process has 46.64 GiB memory in use. Of the allocated memory 45.80 GiB is allocated by PyTorch, and 215.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Steps/Code to reproduce bug

uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-8K.yaml

Expected behavior

No OOM.

If this tutorial does not fit on an A6000, please document the minimal system requirements on which it is expected to work.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider - AWS, Azure, GCP, Collab)]
  • Method of install: [pip install or from source]. Please specify exact commands you used to install.
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

  • OS version: Ubuntu 22.04
    Running via uv

Additional context

GPU model: NVIDIA RTX A6000 with 48GB
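
As a possible mitigation to try before a fix lands (unverified on this setup, and it will not help if the workload genuinely needs more than 48GB), the OOM message itself suggests enabling expandable segments to reduce fragmentation. A minimal sketch of setting it before rerunning the repro command:

```shell
# Suggested by the PyTorch OOM message: allow the CUDA caching allocator
# to grow segments, which can reduce "reserved but unallocated" waste.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Then rerun the original repro command in the same shell, e.g.:
#   uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-8K.yaml
```

Note that this only changes allocator behavior; it does not lower peak memory demand, so a smaller batch/sequence configuration may still be required.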
