
Conversation

SwordFaith (Collaborator)

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

Add one-line overview of what this PR aims to achieve or accomplish.

Resolved the tool formatting issue: previously, tool-call arguments were stored as strings, so repeated calls to json.dumps kept adding escape backslashes (\\).
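As a rough illustration of the escaping problem (a toy snippet, not verl code): once arguments are kept as a JSON string, every additional json.dumps pass escapes the quotes again and backslashes accumulate.

```python
import json

args = {"expression": "1 + 1"}

s1 = json.dumps(args)  # serialize the dict once
s2 = json.dumps(s1)    # serializing the *string* escapes its quotes
s3 = json.dumps(s2)    # each further pass adds another layer of backslashes

print(s1)  # {"expression": "1 + 1"}
print(s2)  # "{\"expression\": \"1 + 1\"}"
print(s3)  # "\"{\\\"expression\\\": \\\"1 + 1\\\"}\""
```

Keeping the arguments as a dict until the final serialization point avoids this.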

Fixed the response_position_ids mismatch between generate_sequences and generate_sequences_with_tools: the earlier implementation of generate_sequences_with_tools used zero padding at positions where the attention mask == 0, which produced NaN values during training.
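For intuition, here is a minimal sketch of the two behaviors (illustrative only, not the exact verl implementation): response positions should keep counting from the last prompt position even at padded slots, rather than being zeroed where the mask is 0.

```python
import torch

prompt_position_ids = torch.tensor([[0, 0, 0, 1, 2, 3]])   # left-padded prompt
response_attention_mask = torch.tensor([[1, 1, 1, 0, 0]])   # right-padded response

# Consistent with generate_sequences: response positions continue
# from the last prompt position, including at padded slots.
last = prompt_position_ids[:, -1:]
response_position_ids = last + torch.arange(
    1, response_attention_mask.size(1) + 1
).unsqueeze(0)
print(response_position_ids)  # tensor([[4, 5, 6, 7, 8]])

# The earlier generate_sequences_with_tools behavior (zeros where mask == 0)
# would instead yield something like [[4, 5, 6, 0, 0]], which mismatched
# downstream computations and surfaced as NaNs during training.
```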

Specific Changes

List the specific changes.

  • Introduced a new schema, OpenAIFunctionCallSchema, to store converted tool calls (see the sketch after this list).
  • Updated the AsyncSGLangRollout tool handling to skip non-dict arguments instead of accepting arbitrary strings in the arguments field.
  • Aligned response_position_ids in generate_sequences_with_tools with the behavior of generate_sequences.
  • Improved tool descriptions to prevent misleading parse errors, as returning 0.0 caused the model to incorrectly modify its answers.
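A rough sketch of the idea behind the new schema (field names are assumptions, not necessarily the exact verl definition): tool-call arguments are parsed once into a dict and carried in structured form, so json.dumps is only applied at the final serialization point.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class OpenAIFunctionCallSchema:
    """Structured tool call with already-parsed arguments (illustrative fields)."""
    name: str
    arguments: dict[str, Any]  # kept as a dict, not a pre-serialized JSON string
```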

API

Demonstrate how the API changes if any.

  • Revised the execute interface of tools to accept dict[str, Any] directly instead of a JSON string (see the usage sketch in the Usage Example section below).

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 
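A minimal usage sketch of the revised interface, assuming a toy tool class (EchoTool is hypothetical; the actual tools in verl define their own execute implementations). The key point is that parameters arrives as a dict[str, Any], not a JSON string.

```python
import asyncio
from typing import Any


class EchoTool:
    """Toy tool used only to illustrate the revised execute interface."""

    async def execute(self, instance_id: str, parameters: dict[str, Any], **kwargs):
        # Arguments arrive already parsed, so the tool no longer needs json.loads
        # and callers no longer re-serialize them with json.dumps.
        answer = parameters.get("answer", "")
        return f"[{instance_id}] got answer: {answer}", 0.0, {}


async def main():
    tool = EchoTool()
    text, reward, metrics = await tool.execute("rollout-0", {"answer": "42"})
    print(text, reward, metrics)


asyncio.run(main())
```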

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

Additional Info.

  • Issue Number: Fixes issue # or discussion # if any.
  • Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  • Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

@SwordFaith (Collaborator, Author)

new wandb log: https://wandb.ai/swordfaith/gsm8k_async_rl/runs/ta7jhvgq?nw=nwuserswordfaith

It seems the NaN issue is fixed.

@vermouth1992 (Collaborator)

Shall we merge this, @zhaochenyang20?

@zhaochenyang20 (Collaborator) left a comment


Great job

@zhaochenyang20 zhaochenyang20 merged commit bc9062d into volcengine:main May 11, 2025
28 checks passed
@supermancmk

Thank you very much, but this still doesn't seem to work; it doesn't solve the model training crash.
I pulled the latest verl code without any modifications and ran the official gsm8k-with-tool example, and training still crashes with NaN values.
Are there any solutions? Here is my wandb log.
@SwordFaith @vermouth1992 @zhaochenyang20


[Screenshots: wandb training curves showing the crash and NaN values]

@SwordFaith SwordFaith deleted the fix/sglang_async_multiturn_nan branch May 19, 2025 14:44