GSPO parameters update from v2 #3798

BounharAbdelaziz · 2025-07-29T11:17:00Z

What does this PR do?

This PR updates the training_args of GSPO in docs/source/paper_index.md to align with the hyperparameters described in the GSPO v2 paper and documented in trl/trainer/grpo_config.py.

Before submitting

[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you read the contributor guideline, Pull Request section?
[ ] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes?
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec

qgallouedec · 2025-07-30T02:54:45Z

docs/source/paper_index.md

+    beta=0.0,  # GSPO set kl regularization to zero: https://github.com/volcengine/verl/pull/2775#issuecomment-3131807306 
+    epsilon=3e-4,  # GSPO paper (v2), section 5.1
+    epsilon_high=4e-4,  # GSPO paper (v2), section 5.1
+    gradient_accumulation_steps=4,  # partition rollout batch into 4 mini-batches. GSPO paper (v2), section 5.1


Suggested change

-    gradient_accumulation_steps=4, # partition rollout batch into 4 mini-batches. GSPO paper (v2), section 5.1
+    steps_per_generation=4, # partition rollout batch into 4 mini-batches. GSPO paper (v2), section 5.1

Can you also add a note somewhere that steps_per_generation should be 4x gradient_accumulation_steps?

You are right! Done, thanks.
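
The relation requested above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not code from the PR: the helper name `steps_per_generation_for` is hypothetical, and `gradient_accumulation_steps=1` is an assumed value chosen for the example. The other values are the ones proposed in the diff.

```python
def steps_per_generation_for(num_mini_batches: int, gradient_accumulation_steps: int) -> int:
    """Return the steps_per_generation that splits one rollout batch into
    `num_mini_batches` optimizer updates (hypothetical helper, for illustration).

    Each optimizer update consumes `gradient_accumulation_steps` micro-batches,
    so one generation must span num_mini_batches * gradient_accumulation_steps
    micro-batch steps.
    """
    if num_mini_batches < 1 or gradient_accumulation_steps < 1:
        raise ValueError("both arguments must be positive integers")
    return num_mini_batches * gradient_accumulation_steps


# Keyword arguments mirroring the values discussed in this thread.
# gradient_accumulation_steps=1 is an assumption made for this example.
gradient_accumulation_steps = 1
gspo_kwargs = {
    "beta": 0.0,            # KL regularization disabled
    "epsilon": 3e-4,        # lower clipping range, GSPO paper (v2), section 5.1
    "epsilon_high": 4e-4,   # upper clipping range, GSPO paper (v2), section 5.1
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "steps_per_generation": steps_per_generation_for(4, gradient_accumulation_steps),
}

print(gspo_kwargs["steps_per_generation"])
```

If gradient accumulation were raised to 2, the same helper would give 8, which keeps the rollout batch partitioned into 4 mini-batches.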

qgallouedec · 2025-07-30T02:55:18Z

Oops, I missed yours, closing #3802 thanks!

HuggingFaceDocBuilderDev · 2025-07-30T11:21:04Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

LeonEricsson

LGTM

docs/source/paper_index.md

GSPO parameters update from v2

1ff6d92

qgallouedec reviewed Jul 30, 2025

BounharAbdelaziz and others added 2 commits July 30, 2025 09:52

added steps_per_generation

ac692d8

Merge branch 'main' into gspo

559bef9

Merge branch 'main' into gspo

5fc455b

LeonEricsson approved these changes Jul 30, 2025

qgallouedec reviewed Jul 30, 2025

qgallouedec added 2 commits July 30, 2025 20:10

Merge branch 'main' into gspo

3b2ea37

Update docs/source/paper_index.md

c2d559f

qgallouedec merged commit 79c5797 into huggingface:main Jul 31, 2025
