# [tool] feat: Add Search Tool implemented with MCP #1948
## Conversation
```yaml
config:
  rate_limit: 120
  timeout: 120
  use_mcp: true
```
I wonder if it would be better to introduce a tool type attribute in the tool configuration. This attribute could specify categories like "native" for tools implemented in Verl, "mcp" for MCP tools, "A2A" for A2A tools, or "langchain/langgraph" for related tools. Each type could then have its own distinct creation and registration logic.
Fixed. We support `native` and `mcp` tools for now.
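For illustration, a typed tool entry might look like the sketch below; the `type` key and the class path are assumptions here, not necessarily the exact names the PR settled on:

```yaml
tools:
  - class_name: verl.tools.mcp_search_tool.MCPSearchTool  # assumed path
    type: mcp   # assumed discriminator: "native" for built-in tools, "mcp" for MCP tools
    config:
      rate_limit: 120
      timeout: 120
```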
```yaml
mcp:
  mcp_servers_config_path: ./mcp_server.json
  # optional
  tool_selected_list:
```
The tool selection list appears somewhat unclear here. Why would we need an MCP tool selection list for just a single MCP tool? Perhaps it would make more sense to have it as a parallel configuration alongside tools? Alternatively, we could use the two mentioned above as a shared `mcp_client_manager` configuration within `tools_config`.
In fact, we register all tools from the MCP server when `tool_selected_list` is not set. There are two reasons for `tool_selected_list`:

- An MCP server can expose multiple tools, but a user may not want all of them, so the `tool_selected_list` attribute selects a subset of tools for registration.
- We implement a general MCP tool, but we also implement the search tool, which is a single tool.

I'll refactor the config setting!
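For illustration, a sketch of the two usages (the tool name is hypothetical):

```yaml
mcp:
  mcp_servers_config_path: ./mcp_server.json
  # Omit tool_selected_list to register every tool the server exposes;
  # set it to register only a subset:
  tool_selected_list:
    - tavily_search
```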
```yaml
timeout: 120
use_mcp: true
mcp:
  mcp_servers_config_path: ./mcp_server.json
```
Would it be better to include the config_path directly within the YAML file?
For now, most MCP server configurations are distributed as JSON files, including official MCP, FastMCP, Cursor, etc. This setting may ease users' migration across different platforms.
> For now, most MCP server configurations are distributed as JSON files, including official MCP, FastMCP, Cursor, etc. This setting may ease users' migration across different platforms.

Agreed. If the JSON configuration already exists, it will be intuitive.
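For reference, a minimal `mcp_server.json` in the shared `mcpServers` layout (all values here are placeholders):

```json
{
  "mcpServers": {
    "tavily": {
      "url": "https://example.com/mcp/sse",
      "transport": "sse",
      "auth_token": "<your-token>"
    }
  }
}
```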
```python
logger = logging.getLogger(__name__)


class TokenBucket:
```
Can we extract the token bucket as a common utility in the current PR?
fixed
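A minimal sketch of what the extracted utility could look like; only the `TokenBucket` name comes from the diff, while the module location and interface are assumptions:

```python
import threading
import time


class TokenBucket:
    """Thread-safe token bucket for client-side rate limiting."""

    def __init__(self, rate_limit: float):
        # rate_limit: maximum number of requests per second.
        self.rate = rate_limit
        self.capacity = rate_limit
        self.tokens = rate_limit
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self) -> bool:
        """Take one token; returns False when the bucket is empty."""
        with self._lock:
            now = time.monotonic()
            # Refill in proportion to the time elapsed since the last call.
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```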
`verl/tools/mcp_base_tool.py` (Outdated)
```python
        }
        return instance_id

    async def _async_execute(self, instance_id, parameters):
```
The naming could be improved, and a return type hint should be included.
fixed
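For illustration, a renamed and type-hinted signature along these lines would address the comment; `_call_tool` matches the name that appears later in this thread, while the return hint is an assumption:

```python
from typing import Any

async def _call_tool(self, instance_id: str, parameters: dict[str, Any]) -> str:
    """Execute the MCP tool call for this instance and return the response text."""
    ...
```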
`verl/tools/mcp_base_tool.py` (Outdated)
```python
        if instance_id in self._instance_dict:
            del self._instance_dict[instance_id]

    def post_process(self, content: list):
```
As part of the `execute` semantics, it integrates the tool's return list into the tool's response, which undoubtedly impacts metrics and rewards.
fixed
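The fix visible later in this thread moves this logic into a `_parse_tool_result` hook that subclasses override (see the `MCPDummyTool` example further down). A minimal sketch, assuming MCP text content parts with `.type` and `.text` attributes:

```python
from typing import Tuple


class MCPBaseTool:
    def _parse_tool_result(self, content: list) -> Tuple[str, dict]:
        """Turn the raw MCP content list into (response_text, metadata).

        Subclasses override this to decide what ends up in the tool
        response, keeping result parsing out of `execute` itself.
        """
        text = " ".join(p.text for p in content if getattr(p, "type", "") == "text")
        return text, {}
```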
```python
    ],
}

expect_turn_1_msg = {
```
Would it be possible to include a test case for two-turn calls? Thank you!
Of course, I've added that!
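A sketch of what a second-turn expectation could look like, following the `expect_turn_1_msg` pattern from the diff; the field values are placeholders, not the actual test data:

```python
# Hypothetical second-turn expectation: after the first tool response,
# the assistant issues another call and receives another tool message.
expect_turn_2_msg = {
    "role": "tool",
    "content": "mock search result for the second turn",
}
```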
This is a good example of the "one tool per server" scenario. For more complex MCP workloads, let's track them via an issue and discuss further. A training script with wandb logging would also help more users try it.
```python
        metrics = {
            "query_count": metadata.get("query_count", 0),
            "status": metadata.get("status", "unknown"),
            "total_results": metadata.get("total_results", 0),
            "api_request_error": metadata.get("api_request_error"),
        }

        return result_text, 0.0, metrics
```
As this is the base class for MCP Tools, shouldn't we return the raw metadata directly instead of extracting specific fields to construct metrics? The current approach not only causes metadata loss across different tools, but also arbitrarily restricts metadata keys, which seems unreasonable, right?
Metrics are not used for training for now; they only contain some request information.
> Metrics are not used for training for now; they only contain some request information.

Additional telemetry features and a new tool logger will be introduced in the upcoming retool PR. These enhancements aim to improve the tracking of MCP tool behavior.
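A sketch of the reviewer's suggestion (not the merged code): forward the raw metadata instead of a fixed key set, so each MCP tool keeps its tool-specific fields:

```python
# Pass the raw metadata through as metrics; different tools can then
# report whatever request information their server returns.
metrics = dict(metadata)
return result_text, 0.0, metrics
```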
`verl/tools/mcp_base_tool.py` (Outdated)
logger.info(f"Initialized MCPSearchTool with config: {config}") | ||
|
References to 'search' should be removed from the base class, particularly in lines 39, 73, 96, and 97.
Nice catch!
Force-pushed from d7b4ebf to 15ca75a (compare)
### What does this PR do?

1. MCP client manager which manages the connection with the MCP server, covering session multiplexing and rate limiting.
2. Search Tool with the MCP client and the [Tavily](https://app.tavily.com/home) MCP server, which delivers the same capability as the Search R1 Tool.
3. A general MCP tool base for handling the execution logic.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

1. Register a [Tavily](https://app.tavily.com/home) account
2. Edit the `mcp_server.json` file by replacing `url` and `auth_token`. You can also use your own MCP server according to the instructions provided by [FastMCP](https://gofastmcp.com/clients/transports#configuration-based-transports) (supporting SSEServer, stdioServer and streamHTTP)
3. Configure the `mcp_tool_config.yaml` file:
   - `mcp_server_config_path` should point to the JSON file from step 2
   - `tool_selected_list` specifies the tools you need to register from the MCP server
4. *(Optional)* Implement a concrete instance based on `MCPBaseTool` to parse the results returned by the server

Details are listed in the [tutorial](https://github.com/AlecHenx/ml-recipe/blob/main/Tutorial%20for%20MCP%20Tool%20in%20veRL.md)

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes part of issue volcengine#1837
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.
Hi @AlecHenx, would you mind adding the tutorial https://github.com/AlecHenx/ml-recipe/blob/main/Tutorial%20for%20MCP%20Tool%20in%20verl.md as a subsection in https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html (docs/sglang_multiturn/multiturn.rst)?
Hi @AlecHenx, thank you very much for this amazing PR! 🙌 However, I encountered an issue when trying to use the MCP search tool during PPO training. Initially, everything worked smoothly: the model was able to call the tool correctly, and the tool results were returned properly after the user message. But after some time, the MCP search tool started failing with the following error, although training continued normally (just without receiving the search results):
Eventually, the entire pipeline gets stuck and times out, although I don't think this is directly caused by the error above. You can find the full logs and environment details at the links below. We ran the training script as follows:

```bash
set -x
ulimit -n 65535

PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
echo $CONFIG_PATH

TRAIN_DATA=~/data/searchR1_processed_nq/test.parquet
VAL_DATA=~/data/searchR1_processed_nq/test.parquet
TOOL_CONFIG="$CONFIG_PATH/tool_config/mcp_tool_config.yaml"
model_path=Qwen/Qwen2.5-1.5B-Instruct

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='search_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=64 \
    data.val_batch_size=256 \
    data.max_prompt_length=4096 \
    data.max_response_length=3000 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=$model_path \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.285 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.max_model_len=15000 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.rollout.multi_turn.enable=True \
    actor_rollout_ref.rollout.multi_turn.max_assistant_turns=2 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$TOOL_CONFIG" \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.val_before_train=False \
    trainer.logger=['console','wandb'] \
    trainer.project_name='search_r1_like_async_rl' \
    trainer.experiment_name='qwen2.5-1.5b-instruct_function_rm-search-async-sgl-multi-w-searchtool' \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1 \
    trainer.save_freq=1500 \
    trainer.test_freq=1500 \
    data.train_files="$TRAIN_DATA" \
    data.val_files="$VAL_DATA" \
    trainer.total_epochs=1 $@
```
This is because a large number of tool calls exceeded Tavily's rate limit, which caused an exception to be thrown before `call_tool_result` received the search results. I will make the following two optimizations:
For now, you can try decreasing the `rate_limit` value in the tool config.
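For example, in the tool config shown earlier (30 is an arbitrary lower value; pick one under your Tavily quota):

```yaml
config:
  rate_limit: 30   # lowered from 120
  timeout: 120
  use_mcp: true
```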
Hi @AlecHenx, thanks for your response! Yes, I believe Tavily's rate limit is indeed the direct cause of the error. However, I still encountered the hang issue during the later stage of training: the entire pipeline keeps waiting for the tool result until it times out. During this period, the GPU remains fully allocated but with low power usage, indicating that it's not actively generating or updating. I suspect this is an issue between the MCP tool and the rollout logic, since the hang occurs with both the Tavily search tool and my dummy implementation. The failure appears to occur during a `_call_tool` invocation; here is the traceback from the hanging case.
From my debugging, I've confirmed that the hang happens specifically at this line:

```python
await self._call_tool(instance_id, parameters)
```

Temporarily replacing it with:

```python
await asyncio.wait_for(self._call_tool(instance_id, parameters), self.timeout)
```

does prevent the hang, but it harms performance: many valid tool calls are skipped due to timeout. Notably, this issue does not occur when using non-MCP functional tools.
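One middle ground between the two snippets above would be a bounded retry around the timed call; a sketch assuming `self.timeout` and `logger` from the surrounding code:

```python
import asyncio

result = None
for attempt in range(3):  # retry instead of dropping the tool call outright
    try:
        result = await asyncio.wait_for(self._call_tool(instance_id, parameters), self.timeout)
        break
    except asyncio.TimeoutError:
        logger.warning("MCP tool call timed out (attempt %d/3)", attempt + 1)
```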
I found some possibly related issues:
Here are my full logs and dummy tool setup:

**MCP Server**

```python
from fastmcp import FastMCP
import asyncio

mcp = FastMCP("tavily_search_dummy_tool")


@mcp.tool()
async def tavily_search(query: str):
    metadata = {
        "query": query,
        "results": "No results found"
    }
    return metadata


if __name__ == "__main__":
    asyncio.run(mcp.run(transport='sse', port=50052))
```

**MCP Client**

```python
import json
import logging
import os
from typing import Tuple

from verl.tools.mcp_base_tool import MCPBaseTool

from .schemas import OpenAIFunctionToolSchema

logger = logging.getLogger(__name__)
logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))


class MCPDummyTool(MCPBaseTool):
    def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
        super().__init__(config, tool_schema)

    def _parse_tool_result(self, content: list) -> Tuple[str, dict]:
        data = json.loads(content[0].text)
        result = data["results"]
        metadata = data
        metadata["api_request_error"] = ""
        return result, metadata
```

**MCP Server Config**

```json
{
    "mcpServers": {
        "dummy_server": {
            "url": "http://127.0.0.1:50052/sse/",
            "transport": "sse",
            "auth_token": "dummy"
        }
    }
}
```

```bash
# Terminal 1: start the dummy server
python verl/tools/mcp_dummy_search_tool_server.py

# Terminal 2: launch PPO training
```

Thank you again for your great support!
Thank you for providing the details of the hang problem; I'll have a look at it.
### Checklist Before Starting

- Modules: `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `tests`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`
- Types: `feat`, `fix`, `doc`, `refactor`, `chore`
- Multiple modules are separated by `,` or space, like `[megatron, fsdp] feat: xxx`