@Lins-01 Lins-01 commented May 25, 2025

[Feat]: Search Tool Invocation in Multi-Turn RL Training

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

  • As veRL users, we want the model to invoke designated tools during the Actor rollout phase and seamlessly integrate their outputs into the training pipeline.

  • We have added search-tool invocation capability to veRL-sglang MultiTurnRL, enabling the model to issue retrieval requests during Actor rollout and directly leverage the returned results for training.

  • This provides the community with a reimplementation similar to searchR1.

  • Training curves on Wandb: search_async_rl

  • A third party has successfully reproduced the training run.

  • Thanks to the SGLang team and the author of searchR1 for their efficient support!

Project Members:

  • Ling Chang (Author)
  • Bowen Jin (Advisor on Training)
  • Xiaocheng Wang (Advisor on Implementation)
  • Nan Jiang (Reproduction)
  • Chenyang Zhao (PM)
  • Xiang Long (Reviewer, PM)

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

How to Use

Refer to verl-multiturn-searchR1-like.md or verl-multiturn-searchR1-like_ZH.md in the Awesome-ML-SYS-Tutorial repository.

@CLAassistant

CLAassistant commented May 25, 2025

CLA assistant check
All committers have signed the CLA.

@Lins-01 Lins-01 changed the title feat: search tool for multi-turn [feat]: search tool for multi-turn May 26, 2025
@Lins-01 Lins-01 changed the title [feat]: search tool for multi-turn [sglang] feat: search tool for multi-turn May 26, 2025
@Lins-01 Lins-01 changed the title [sglang] feat: search tool for multi-turn [sglang] Feat: Search Tool Invocation in Multi-Turn RL Training May 26, 2025
@@ -0,0 +1,352 @@
import argparse
Collaborator

let's use snake_case (search_r1_like)

Collaborator

@eric-haibin-lin eric-haibin-lin left a comment

Thanks for the great work. What is the reproduced model's performance?

@zhaochenyang20
Collaborator

tool_reward_score: The step reward score of the tool.
tool_metrics: The metrics of the tool.
"""
timeout = parameters.get("timeout", self.default_timeout)
Collaborator

Why is timeout considered a tool parameter instead of being part of the tool configuration?

Contributor Author

@Lins-01 Lins-01 May 28, 2025

Thanks, I've made the change.
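
One plausible reading of this exchange is that the timeout should be fixed when the tool is constructed rather than taken from the model-generated call parameters. The sketch below illustrates that split under stated assumptions: `SearchToolSketch`, its `config` dictionary, and the default of 30 seconds are hypothetical stand-ins, not the actual verl classes or values.

```python
from typing import Any


class SearchToolSketch:
    """Hypothetical stand-in for a search tool; not the actual verl class."""

    def __init__(self, config: dict[str, Any]):
        # The timeout is fixed per tool instance through its configuration.
        self.timeout = config.get("timeout", 30)

    def execute(self, parameters: dict[str, Any]) -> dict[str, Any]:
        # Model-generated parameters carry only the query list; any "timeout"
        # key supplied in the call is deliberately ignored.
        query_list = parameters.get("query_list", [])
        return {"query_list": query_list, "timeout": self.timeout}


tool = SearchToolSketch({"timeout": 5})
print(tool.execute({"query_list": ["who wrote Hamlet"], "timeout": 999}))
```

Keeping the value in the configuration means a malformed or adversarial tool call cannot change how long the rollout waits on the retrieval service.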

@@ -0,0 +1,214 @@
# Copyright 2024 Bytedance Ltd. and/or its affiliates
Collaborator

This section of utils appears to be more aligned with a search tool's functionality rather than being tailored for reward scoring.

Contributor Author

Relocated to verl/tools/utils/search_r1_like_utils.py.

        }
        return instance_id

    def execute_search(self, instance_id: str, query_list: list, retrieval_service_url: str, topk: int, timeout: int):
Collaborator

Why not use perform_single_search_batch directly here?

Contributor Author

I wrapped it for readability and future extensibility, but happy to inline if you prefer.

Collaborator

@SwordFaith SwordFaith May 28, 2025

I was wondering how a user would extend this tool for their own search purposes. Should they subclass and override the execute_search method, simply configure a URL, or are there other possible approaches you can share? After discussion, we added documentation on how to change the URL and modify the logic.

Contributor Author

Added the retriever tool modification notes to the README linked above.
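
The subclass-and-override option raised in this thread could look roughly like the sketch below. Only the `execute_search` signature is taken from the diff; `BaseSearchTool`, `InMemorySearchTool`, the constructor arguments, the return shape, and the example URL are hypothetical and stand in for the real `SearchTool` interface, which is richer.

```python
class BaseSearchTool:
    """Hypothetical minimal base class; not the actual verl SearchTool."""

    def __init__(self, retrieval_service_url: str, topk: int = 3, timeout: int = 30):
        self.retrieval_service_url = retrieval_service_url
        self.topk = topk
        self.timeout = timeout

    def execute_search(self, instance_id: str, query_list: list,
                       retrieval_service_url: str, topk: int, timeout: int):
        raise NotImplementedError


class InMemorySearchTool(BaseSearchTool):
    """Overrides execute_search with custom logic (here: naive keyword match)."""

    def __init__(self, corpus: list, **kwargs):
        super().__init__(**kwargs)
        self.corpus = corpus

    def execute_search(self, instance_id: str, query_list: list,
                       retrieval_service_url: str, topk: int, timeout: int):
        results = {}
        for query in query_list:
            terms = query.lower().split()
            # Keep documents that contain at least one query term.
            hits = [doc for doc in self.corpus
                    if any(term in doc.lower() for term in terms)]
            results[query] = hits[:topk]
        return results


tool = InMemorySearchTool(
    corpus=["Paris is the capital of France.", "Berlin is in Germany."],
    retrieval_service_url="http://localhost:8000/retrieve",  # placeholder URL
)
print(tool.execute_search("id-1", ["capital France"], tool.retrieval_service_url, 1, 5))
```

Pointing the tool at a different retrieval service only needs a new URL in the configuration, while changing how results are fetched or post-processed is where an override like this would come in.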

Collaborator

Why is an empty file introduced here?

Contributor Author

I didn't modify this file myself; it was likely introduced during the git merge of main.
Looking at it now, the content seems to have been added by others and looks valid, so I kept it.

Collaborator

Would it be better to move the prompt.yaml file to data_preprocess/config/prompt.yaml? If we add prompt.yaml, can we include it directly in the data preprocessing for the Search-R1 data? Why is having a separate config file a better approach?

@@ -0,0 +1,22 @@
tools:
  - class_name: "verl.tools.search_tool.SearchTool"
    config: {
Collaborator

Directly using the YAML format would be more straightforward.

@SwordFaith
Collaborator

LGTM

@Lins-01
Contributor Author

Lins-01 commented May 29, 2025

> @Lins-01 Thanks for your contributions to veRL! I noticed some of the code appears to be referenced from the following projects:
>
> fufankeji/fufan-chat-api encoder.py
> RUC-NLPIR/FlashRAG utils.py
>
> Could you please check the licenses of the referenced code to avoid any potential legal issues?
>
> Some code may duplicate an existing PR:
>
> https://github.com/volcengine/verl/pull/1525/files#
>
> Also, a unit test for the search tool functionality would be welcome.

Appreciate the reminder and encouragement! License attributions have been added — will follow up with the unit test soon.

Collaborator

@feifeibear feifeibear left a comment

LGTM, but I strongly suggest the author add the necessary unit tests for search tool usage.

@zhaochenyang20
Collaborator

> LGTM, but I strongly suggest the author add the necessary unit tests for search tool usage.

Will add it with a mock search API.

@Lins-01
Contributor Author

Lins-01 commented May 30, 2025

@feifeibear

> LGTM, but I strongly suggest the author add the necessary unit tests for search tool usage.

We've added the unit tests for the search tool as recommended.

@@ -0,0 +1,522 @@
# Copyright 2025 Bytedance Ltd. and/or its affiliates
Collaborator

This file duplicates much of the code in test_sglang_async_rollout_sf_tools.py.

Most of the tests appear redundant, since they are essentially identical to the sandbox tests; running them again doesn't seem to add value.

Can you keep only one test (rename TestRolloutWithTools -> TestRolloutWithSearchTool) that runs multiple rounds of conversation calling the search API and checks that the output matches your expected results?

Contributor Author

@feifeibear Thanks again for your review and suggestions!

I've simplified the test file as suggested by removing redundant code and keeping only the TestRolloutWithSearchTool class. This test specifically verifies:

  • Whether the function call format for the search API input/output is correct;

  • Whether multi-turn conversations can successfully execute search tool calls.
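
As a rough illustration of the two checks listed above, a test against a mocked search API might be shaped like the following. This is a sketch, not the test merged in the PR: `run_search_call`, `TestSearchToolSketch`, and the `{"name": ..., "arguments": {"query_list": ...}}` payload shape are assumptions made for the example.

```python
import json
import unittest
from unittest.mock import MagicMock


def run_search_call(tool_call_json: str, search_api) -> str:
    """Hypothetical helper: parse a function-call payload and query the search API."""
    call = json.loads(tool_call_json)
    if call.get("name") != "search":
        raise ValueError("unexpected tool name")
    query_list = call["arguments"]["query_list"]
    return json.dumps({"result": search_api(query_list)})


class TestSearchToolSketch(unittest.TestCase):
    def test_multi_turn_calls(self):
        # The mock stands in for the retrieval service, so no network is needed.
        search_api = MagicMock(side_effect=lambda qs: [f"doc for {q}" for q in qs])
        turns = [
            json.dumps({"name": "search", "arguments": {"query_list": ["q1"]}}),
            json.dumps({"name": "search", "arguments": {"query_list": ["q2", "q3"]}}),
        ]
        outputs = [json.loads(run_search_call(t, search_api)) for t in turns]
        # Check the output format of each turn and that every turn hit the API once.
        self.assertEqual(outputs[0], {"result": ["doc for q1"]})
        self.assertEqual(outputs[1], {"result": ["doc for q2", "doc for q3"]})
        self.assertEqual(search_api.call_count, 2)

    def test_rejects_unknown_tool(self):
        bad_call = json.dumps({"name": "calculator", "arguments": {}})
        with self.assertRaises(ValueError):
            run_search_call(bad_call, MagicMock())
```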

In addition, I’ve updated and refined the documentation for the search tool. It now includes detailed instructions on using examples and configuring a local retriever.

I've completed the changes above, including updates to both tests and documentation.
Is it okay to proceed with the merge now?

            return error_result, 0.0, {"error": str(e)}

    async def calc_reward(self, instance_id: str, **kwargs) -> str:
        return self._instance_dict[instance_id]["reward"]

Why does calc_reward return the search result as a str? Should we calculate search F1 as the reward?

Contributor Author

Thanks for pointing it out!
This reward field is currently not used in our training logic — it's just a placeholder.
You can customize your own reward computation (e.g., based on search F1) in your tool implementation as needed.
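
The thread does not say how a search F1 would be defined. One common choice in open-domain QA is token-overlap F1 between a predicted string and a reference, sketched below as an assumption; `normalize` and `token_f1` are hypothetical helper names, and a retrieval-level F1 over documents would be computed differently.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the city of Paris", "Paris"))
```

A custom tool could return a float like this from its reward method instead of the placeholder string.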

@vermouth1992 vermouth1992 merged commit c5bc81b into volcengine:main May 31, 2025
30 of 36 checks passed
@zhaochenyang20
Collaborator

great!

@SeungyounShin
Contributor

Using the merged patch in this PR, I reran training on the original Search-R1 Wikipedia corpus (GRPO schedule, no additional data) and evaluated the resulting model.

| Dataset | Search-R1 paper (Qwen2.5-3B) | This run |
|---|---|---|
| NQ | 0.397 | 0.406 |
| TriviaQA | 0.565 | 0.582 |
| PopQA | 0.391 | 0.420 |
| HotpotQA | 0.331 | 0.338 |
| 2Wiki | 0.310 | 0.332 |
| Musique | 0.124 | 0.111 |
| Bamboogle | 0.232 | 0.296 |

💾 Weights & full inference script are available on the Hub:
https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn

Everything matches the expected behaviour—tool calls, multi-turn rollout and scores. Thanks again for the thorough work! @Lins-01
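
For a quick summary of the comparison above, the per-dataset scores can be averaged. The numbers below are copied from the table in this comment; the unweighted mean is only a convenience for reading the table and is not a metric reported in the thread.

```python
# (paper score, this run) per dataset, copied from the table above.
scores = {
    "NQ": (0.397, 0.406),
    "TriviaQA": (0.565, 0.582),
    "PopQA": (0.391, 0.420),
    "HotpotQA": (0.331, 0.338),
    "2Wiki": (0.310, 0.332),
    "Musique": (0.124, 0.111),
    "Bamboogle": (0.232, 0.296),
}

paper_avg = sum(paper for paper, _ in scores.values()) / len(scores)
run_avg = sum(run for _, run in scores.values()) / len(scores)
improved = [name for name, (paper, run) in scores.items() if run > paper]

print(f"paper avg: {paper_avg:.3f}, this run avg: {run_avg:.3f}")
print(f"datasets where this run is higher: {len(improved)} of {len(scores)}")
```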

@Lins-01
Contributor Author

Lins-01 commented Jun 4, 2025

Wow, thank you for the kind words! Really appreciate your recognition—it’s truly encouraging for our team.
If possible, could you share the training hyperparameters you used? I believe it would be helpful for the community (mine were slightly lower, haha). @SeungyounShin

@Lins-01 Lins-01 deleted the searchR1-like branch June 16, 2025 02:26
vermouth1992 pushed a commit that referenced this pull request Aug 15, 2025
### What does this PR do?

A small change from `json.dumps({"result": final_result})` to
`json.dumps({"result": final_result}, ensure_ascii=False)`, supporting
customized search engines that return docs containing non-ascii
characters (e.g., CJK characters).
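
The effect of the flag can be seen with the standard library alone. In this small sketch the passage text is a made-up example, not output from any real retriever.

```python
import json

final_result = "东京是日本的首都"  # hypothetical retrieved passage

escaped = json.dumps({"result": final_result})
readable = json.dumps({"result": final_result}, ensure_ascii=False)

print(escaped)   # non-ASCII characters are written as \uXXXX escapes
print(readable)  # characters are kept as-is

# Both strings decode to the same object; only the serialized text differs.
assert json.loads(escaped) == json.loads(readable)
```

Since the serialized string is what gets passed back as the tool response, keeping the characters unescaped presumably lets the model read the retrieved text directly rather than escape sequences.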

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
#1682
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Aug 15, 2025
…e#3044)
techkang pushed a commit to techkang/verl that referenced this pull request Aug 15, 2025
…e#3044)
ChangyiYang pushed a commit to SwordFaith/verl that referenced this pull request Aug 16, 2025
…e#3044)
PopSoda2002 pushed a commit to PopSoda2002/verl that referenced this pull request Aug 19, 2025
…e#3044)