Closed
Labels
bug (Something isn't working)
Description
Error:
Same behavior as #564
raise RuntimeError(f"NCCL error: {error_str}")
RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
Repro:
commit 3f6d52f
uv run python examples/run_grpo_math.py \
policy.generation.colocated.enabled=false \
policy.generation.colocated.resources.gpus_per_node=2 \
policy.generation.vllm_cfg.tensor_parallel_size=2 \
checkpointing.enabled=false \
cluster.gpus_per_node=4
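
The error message itself suggests rerunning with `NCCL_DEBUG=INFO` to get details. Below is a minimal sketch of that rerun, assuming the same checkout (commit 3f6d52f) and the same overrides as the repro above. The `run_repro` wrapper, the `RUN_REPRO` guard, and the `nccl_debug.log` file name are illustrative choices here, not part of the repository.

```shell
# Ask NCCL to log at INFO level, as the RuntimeError message suggests.
export NCCL_DEBUG=INFO

# Hypothetical wrapper around the exact repro command from this issue.
run_repro() {
  uv run python examples/run_grpo_math.py \
    policy.generation.colocated.enabled=false \
    policy.generation.colocated.resources.gpus_per_node=2 \
    policy.generation.vllm_cfg.tensor_parallel_size=2 \
    checkpointing.enabled=false \
    cluster.gpus_per_node=4
}

# Launch only when explicitly requested, and keep a copy of the output
# (nccl_debug.log is an assumed file name) to attach to the issue.
if [ "${RUN_REPRO:-0}" = "1" ]; then
  run_repro 2>&1 | tee nccl_debug.log
fi
```

Running it with `RUN_REPRO=1` set should reproduce the failure with NCCL's own log lines included, which may help narrow down which system call is failing.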