[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training #1682

Lins-01 · 2025-05-25T10:33:24Z

[Feat]: Search Tool Invocation in Multi-Turn RL Training

Checklist Before

Search for similar PR(s).

What does this PR do?

As veRL users, we want the model to invoke designated tools during the Actor rollout phase and seamlessly integrate their outputs into the training pipeline.
We have added search-tool invocation capability to veRL-sglang MultiTurnRL, enabling the model to issue retrieval requests during Actor rollout and directly leverage the returned results for training.
providing the community with a reimplementation similar to searchR1.
Training curves on Wandb: search_async_rl
Third-party training reproducibility verification has been successfully completed.
Thanks to the SGlang team and the author of searchR1 for their efficient support!

Project Member:

Ling Chang (Author)
Bowen Jin (Advisor on Training)
Xiaocheng Wang (Advisor on Implementation)
Nan Jiang (Reproduce)
Chenyang Zhao (PM)
Xiang Long (Reviewer, PM)

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if necessary.

How to Use

Refer to verl-multiturn-searchR1-like.md or verl-multiturn-searchR1-like_ZH.md in the Awesome-ML-SYS-Tutorial repository.

CLAassistant · 2025-05-25T10:33:30Z

All committers have signed the CLA.

…-like

eric-haibin-lin · 2025-05-27T00:29:43Z

examples/sglang_multiturn/searchR1_like/local_dense_retriever/retrieval_server.py

@@ -0,0 +1,352 @@
+import argparse


let's use snake_case (search_r1_like)

eric-haibin-lin

thx for the great work. what is the reproduced model performance?

zhaochenyang20 · 2025-05-27T00:30:57Z

https://wandb.ai/lingchang-ustc/search_async_rl/workspace?nw=nwuserlingchang

@eric-haibin-lin Here is the training curve

SwordFaith · 2025-05-27T17:57:20Z

verl/tools/search_tool.py

+            tool_reward_score: The step reward score of the tool.
+            tool_metrics: The metrics of the tool.
+        """
+        timeout = parameters.get("timeout", self.default_timeout)


Why is timeout considered a tool parameter instead of being part of the tool configuration?

Thanks, I've made the change.

SwordFaith · 2025-05-27T17:59:23Z

verl/utils/reward_score/search_r1_like_utils.py

@@ -0,0 +1,214 @@
+# Copyright 2024 Bytedance Ltd. and/or its affiliates


This section of utils appears to be more aligned with a search tool's functionality rather than being tailored for reward scoring.

Relocated to verl/tools/utils/search_r1_like_utils.py.

SwordFaith · 2025-05-27T18:02:42Z

verl/tools/search_tool.py

+        }
+        return instance_id
+
+    def execute_search(self, instance_id: str, query_list: list, retrieval_service_url: str, topk: int, timeout: int):


Why not use perform_single_search_batch directly here?

I wrapped it for readability and future extensibility, but happy to inline if you prefer.

I was wondering if a user wants to extend this tool for their own search purposes. Should they subclass and override the execute_search method, simply configure a URL, or are there other possible approaches you can share? After discussion, we added doc to how to extend url & modify logic.

added the retriever tool modification notes in the README link above.

SwordFaith · 2025-05-27T18:05:46Z

examples/sft/gsm8k/run_qwen_05_peft.sh

Why is an empty file introduced here?

I didn’t modify this file myself — it was likely introduced during the git merge main.
Looking at it now, the content seems to be added by others and looks valid, so I kept it.

SwordFaith · 2025-05-27T18:07:54Z

examples/data_preprocess/prompt.yaml

Would it be better to move the prompt.yaml file to data_preprocess/config/prompt.yaml? If we add prompt.yaml, can we include it directly in the data preprocess for searching r1 data? Why is having a separate config file a better approach?

SwordFaith · 2025-05-28T03:31:41Z

examples/sglang_multiturn/config/tool_config/search_tool_config.yaml

@@ -0,0 +1,22 @@
+tools:
+  - class_name: "verl.tools.search_tool.SearchTool"
+    config: {


Directly using the YAML format would be more straightforward.

SwordFaith · 2025-05-28T03:48:42Z

verl/tools/search_tool.py

+        }
+        return instance_id
+
+    def execute_search(self, instance_id: str, query_list: list, retrieval_service_url: str, topk: int, timeout: int):


I was wondering if a user wants to extend this tool for their own search purposes. Should they subclass and override the execute_search method, simply configure a URL, or are there other possible approaches you can share? After discussion, we added doc to how to extend url & modify logic.

SwordFaith · 2025-05-28T12:33:43Z

LGTM

Lins-01 · 2025-05-29T10:19:25Z

@Lins-01 Thanks for your contributions for veRL! I noticed some of the code appears to be referenced from the following projects:

fufankeji/fufan-chat-api encoder.py RUC-NLPIR/FlashRAG utils.py

Could you please to check the licenses of the referenced code to avoid any potential legal issues?

Some code may have been duplicated with an existing PR.

https://github.com/volcengine/verl/pull/1525/files#

Also, a unit test for the search tooling functionality is welcome.

Appreciate the reminder and encouragement! License attributions have been added — will follow up with the unit test soon.

feifeibear

LGTM. But I strongly suggest the author add necessary unit tests for the search tool using.

zhaochenyang20 · 2025-05-29T17:16:58Z

LGTM. But I strongly suggest the author add necessary unit tests for the search tool using.

will add it with mock search api.

…ethod, verifying multi-turn rollout and success status in metrics.

Lins-01 · 2025-05-30T07:28:32Z

@feifeibear

LGTM. But I strongly suggest the author add necessary unit tests for the search tool using.

We've added the unit tests for the search tool as recommended.

feifeibear · 2025-05-30T08:59:45Z

tests/workers/rollout/test_sglang_async_rollout_search_tools.py

@@ -0,0 +1,522 @@
+# Copyright 2025 Bytedance Ltd. and/or its affiliates


The file has many duplicated code with the test_sglang_async_rollout_sf_tools.py.

Most of the testing appears redundant, as it's essentially identical to the sandbox testing. Running these tests again doesn't seem to provide additional value.

Can you keep only one test （rename TestRolloutWithTools -> TestRolloutWithSearchTool) , where it runs multiple rounds of conversation calling the search API, and the output matches your expected results?

@feifeibear Thanks again for your review and suggestions!

I've simplified the test file as suggested by removing redundant code and keeping only the TestRolloutWithSearchTool class. This test specifically verifies:

Whether the function call format for the search API input/output is correct;

Whether multi-turn conversations can successfully execute search tool calls.

In addition, I’ve updated and refined the documentation for the search tool. It now includes detailed instructions on using examples and configuring a local retriever.

I've completed the changes above, including updates to both tests and documentation.
Is it okay to proceed with the merge now?

keranlee · 2025-05-30T10:50:54Z

verl/tools/search_tool.py

+            return error_result, 0.0, {"error": str(e)}
+
+    async def calc_reward(self, instance_id: str, **kwargs) -> str:
+        return self._instance_dict[instance_id]["reward"]


why this reward return search result with str type? should we calculate search f1 as reward?

Thanks for pointing it out!
This reward field is currently not used in our training logic — it's just a placeholder.
You can customize your own reward computation (e.g., based on search F1) in your tool implementation as needed.

zhaochenyang20 · 2025-05-31T05:57:26Z

great!

SeungyounShin · 2025-06-03T05:53:37Z

Using the merged patch in this PR, I reran training on the original Search-R1 Wikipedia corpus (GRPO schedule, no additional data) and evaluated the resulting model.

Dataset	Search-R1 paper (Qwen2.5-3B)	This run
NQ	0.397	0.406
TriviaQA	0.565	0.582
PopQA	0.391	0.420
HotpotQA	0.331	0.338
2Wiki	0.310	0.332
Musique	0.124	0.111
Bamboogle	0.232	0.296

💾 Weights & full inference script are available on the Hub:
https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn

Everything matches the expected behaviour—tool calls, multi-turn rollout and scores. Thanks again for the thorough work! @Lins-01

…engine#1682)

Lins-01 · 2025-06-04T15:29:43Z

Using the merged patch in this PR, I reran training on the original Search-R1 Wikipedia corpus (GRPO schedule, no additional data) and evaluated the resulting model.

Dataset Search-R1 paper (Qwen2.5-3B) This run
NQ 0.397 0.406
TriviaQA 0.565 0.582
PopQA 0.391 0.420
HotpotQA 0.331 0.338
2Wiki 0.310 0.332
Musique 0.124 0.111
Bamboogle 0.232 0.296
💾 Weights & full inference script are available on the Hub: https://huggingface.co/Seungyoun/qwen2.5-3b-it_searchR1-like-multiturn

Everything matches the expected behaviour—tool calls, multi-turn rollout and scores. Thanks again for the thorough work! @Lins-01

Wow, thank you for the kind words! Really appreciate your recognition—it’s truly encouraging for our team.
If possible, could you share the training hyperparameters you used? I believe it would be helpful for the community (mine were slightly lower—haha).@SeungyounShin

…engine#1682)

### What does this PR do? A small change from `json.dumps({"result": final_result})` to `json.dumps({"result": final_result}, ensure_ascii=False)`, supporting customized search engines that return docs containing non-ascii characters (e.g., CJK characters). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: #1682 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A ### API and Usage Example N/A ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

…e#3044) ### What does this PR do? A small change from `json.dumps({"result": final_result})` to `json.dumps({"result": final_result}, ensure_ascii=False)`, supporting customized search engines that return docs containing non-ascii characters (e.g., CJK characters). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: volcengine#1682 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A ### API and Usage Example N/A ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Lins-01 added 4 commits May 25, 2025 06:29

feat: searchR1-like for multi-turn

7798a85

add retriever download

078dac8

self-reproduction

005d1f4

modify config

e4948db

Lins-01 changed the title ~~feat: search tool for multi-turn~~ [feat]: search tool for multi-turn May 26, 2025

Lins-01 changed the title ~~[feat]: search tool for multi-turn~~ [sglang] feat: search tool for multi-turn May 26, 2025

Lins-01 changed the title ~~[sglang] feat: search tool for multi-turn~~ [sglang] Feat: Search Tool Invocation in Multi-Turn RL Training May 26, 2025

Lins-01 and others added 4 commits May 26, 2025 12:13

Merge branch 'volcengine:main' into searchR1-like

10cf504

add verl doc of Search Tool Integration

8fb2248

Merge branch 'volcengine:main' into searchR1-like

82c3877

Merge branch 'searchR1-like' of github.com:Lins-01/verl into searchR1…

3b58287

…-like

ccclyu added the status: need review label May 26, 2025

eric-haibin-lin reviewed May 27, 2025

View reviewed changes

Lins-01 and others added 2 commits May 27, 2025 11:05

Merge branch 'volcengine:main' into searchR1-like

df7e9f2

use snake_case

32a02aa

SwordFaith reviewed May 27, 2025

View reviewed changes

Lins-01 and others added 3 commits May 28, 2025 10:18

Merge branch 'volcengine:main' into searchR1-like

482c34a

Restructure search tool utils and config layout

199f5ec

del searchR1 data prompt yaml conifg

d4db488

SwordFaith reviewed May 28, 2025

View reviewed changes

SwordFaith mentioned this pull request May 28, 2025

Multi-turn rollout & agentic RL Status & Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#131

Open

27 tasks

Lins-01 and others added 4 commits May 28, 2025 10:46

modify yaml and add retriever instruction

e65ffaf

Merge branch 'volcengine:main' into searchR1-like

9bc1a11

modify defualttimeout to timeout

802a30e

remove sh

295192b

Merge branch 'volcengine:main' into searchR1-like

de45edc

Add license notice for Search-R1 adapted code

a0ba37c

feifeibear approved these changes May 29, 2025

View reviewed changes

Lins-01 and others added 2 commits May 30, 2025 13:44

Merge branch 'volcengine:main' into searchR1-like

2ebcc8c

Add unit test for batch execution of SearchTool with mocked execute m…

bb08d52

…ethod, verifying multi-turn rollout and success status in metrics.

feifeibear reviewed May 30, 2025

View reviewed changes

keranlee reviewed May 30, 2025

View reviewed changes

Lins-01 and others added 2 commits May 31, 2025 01:15

Merge branch 'volcengine:main' into searchR1-like

0c05baf

refine search unit test

72cf23a

vermouth1992 approved these changes May 31, 2025

View reviewed changes

vermouth1992 merged commit c5bc81b into volcengine:main May 31, 2025
30 of 36 checks passed

yzlnew pushed a commit to yzlnew/verl that referenced this pull request Jun 4, 2025

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (volc…

bc8dc45

…engine#1682)

feifeibear mentioned this pull request Jun 6, 2025

[roadmap] Rollout Module Development Progress & Roadmap #1881

Closed

12 tasks

chenhaiq mentioned this pull request Jun 6, 2025

[roadmap] Rollout Module Development Progress & Roadmap #1882

Open

13 tasks

Zzhiter pushed a commit to Zzhiter/verl that referenced this pull request Jun 9, 2025

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (volc…

0efac7b

…engine#1682)

wwwjn pushed a commit to wwwjn/verl that referenced this pull request Jun 10, 2025

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (volc…

606f138

…engine#1682)

Lins-01 deleted the searchR1-like branch June 16, 2025 02:26

ummavi mentioned this pull request Jun 18, 2025

[Bug] SGLang async sampling parameter corruption #2087

Open

Necolizer mentioned this pull request Aug 14, 2025

[tool] fix: support non-ascii characters in search results #3044

Merged

7 tasks

		@@ -0,0 +1,214 @@
		# Copyright 2024 Bytedance Ltd. and/or its affiliates

		@@ -0,0 +1,522 @@
		# Copyright 2025 Bytedance Ltd. and/or its affiliates

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training #1682

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training #1682

Uh oh!

Conversation

Lins-01 commented May 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

[Feat]: Search Tool Invocation in Multi-Turn RL Training

Checklist Before

What does this PR do?

Checklist Before Submitting

How to Use

Uh oh!

CLAassistant commented May 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-haibin-lin left a comment

Choose a reason for hiding this comment

Uh oh!

zhaochenyang20 commented May 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Lins-01 May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SwordFaith May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SwordFaith May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SwordFaith commented May 28, 2025

Uh oh!

Lins-01 commented May 29, 2025

Uh oh!

feifeibear left a comment

Choose a reason for hiding this comment

Uh oh!

zhaochenyang20 commented May 29, 2025

Uh oh!

Lins-01 commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zhaochenyang20 commented May 31, 2025

Uh oh!

SeungyounShin commented Jun 3, 2025

Lins-01 commented May 25, 2025 •

edited

Loading

CLAassistant commented May 25, 2025 •

edited

Loading

Lins-01 May 28, 2025 •

edited

Loading

SwordFaith May 28, 2025 •

edited

Loading

SwordFaith May 28, 2025 •

edited

Loading

Lins-01 commented May 30, 2025 •

edited

Loading