
Disable parallel tool calls final answer #1539

Merged: 11 commits into main from disable-parallel-tool-calls-final-answer, Aug 5, 2025

Conversation

@aymeric-roucher (Collaborator) commented Jul 9, 2025

Until now in ToolCallingAgent, it was possible to call other tools in parallel with the final_answer tool. Some weaker LLMs therefore tend to emit, in the same action, a web search tool call followed by a final_answer call, with the misled expectation that these calls will run sequentially and that the final answer will be informed by the previous searches. Since the calls actually run in parallel, the tool calls other than final_answer have no impact on the agent's return value and are effectively useless, and the LLM ends up force-filling the final_answer() args with a hallucination.

Example: LLM returns this action, where the final answer is hallucinated instead of using the web search output:

# Let's get results
<tool_call>{"name": "get_weather", "arguments": {"city": "Prague"}}</tool_call>
<tool_call>{"name": "final_answer", "arguments": {"answer": "The weather is sunny in Prague"}}</tool_call>

To avoid this failure case, this PR forbids calling other tools in parallel with the final_answer tool.
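
Conceptually, the guard looks something like the following sketch (hypothetical names and function; the actual check inside ToolCallingAgent may be implemented differently):

# Hypothetical sketch of the guard this PR introduces, not the exact smolagents code.
# tool_calls is assumed to be the list of ChatMessageToolCall objects parsed from the model's message.
def assert_final_answer_is_alone(tool_calls):
    names = [call.function.name for call in tool_calls]
    if "final_answer" in names and len(names) > 1:
        raise ValueError(
            "Calling final_answer in parallel with other tools is not allowed: "
            "the other tool calls could not inform the final answer."
        )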

@aymeric-roucher aymeric-roucher force-pushed the disable-parallel-tool-calls-final-answer branch from 09e6cf1 to 019012f Compare July 9, 2025 16:45
@aymeric-roucher (Collaborator, Author):

Failing test seems unrelated?

@@ -1621,18 +1689,12 @@ def forward(self, answer1: str, answer2: str) -> str:
name="final_answer", arguments={"answer1": "1", "answer2": "2"}
),
),
ChatMessageToolCall(

@aymeric-roucher (Collaborator, Author) commented on this diff:

Removing this call since it's not allowed anymore in parallel with final_answer

@albertvillanova (Member) left a comment:

Just a naive observation: I think this issue might actually be more general than just the final_answer tool. The underlying problem is that models sometimes issue multiple tool calls simultaneously, even when the intended semantics are sequential: i.e., one tool's output should influence another's input.

Restricting parallel tool calls when one logically depends on the other (e.g., web_search → final_answer) makes a lot of sense, but in principle, that shouldn't be specific to final_answer. This PR fixes one manifestation, but the broader class of failures remains.

My questions:

  • Should we forbid all tool calls with potential dependencies from being parallelized? That would require some way to infer or enforce dependencies between tool calls, but it's non-trivial to detect this. How to know when the tool calls can be processed in parallel or not?
  • Should we instead allow the model to emit multiple tool calls in one go (whether they are inter-dependent or independent), but execute them sequentially rather than in parallel? This would be a more robust approach, though it comes at the cost of increased latency for calls that could have been parallelized (see the sketch after this list).
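
A minimal sketch of that second option, assuming a hypothetical run_tool_calls_sequentially helper and the ChatMessageToolCall shape used above (not the actual smolagents implementation):

# Hypothetical sketch: execute all tool calls from a single model message in order,
# instead of in parallel; `tools` is assumed to be a dict mapping tool names to callables.
def run_tool_calls_sequentially(tool_calls, tools):
    observations = []
    for call in tool_calls:
        tool = tools[call.function.name]          # look up the tool by name
        output = tool(**call.function.arguments)  # run it with the model-provided arguments
        observations.append((call.function.name, output))
    return observations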

@albertvillanova (Member):

By the way, why do we use the nullable keyword?

As far as I know, it is not part of the JSON Schema spec, which expresses optionality with required instead: an array listing the required properties.

Are we sure LLMs reliably understand this custom nullable keyword? I wonder whether relying on non-standard schema elements might introduce inconsistencies in how different models interpret tool signatures.

  • In my PR about tool refactoring, I am using the standard JSON Schema instead.
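
For illustration, the two styles look roughly like this, sketched as Python dicts for a hypothetical get_weather tool (not the exact schemas smolagents emits):

# Non-standard style: a custom "nullable" flag on individual properties (illustrative only).
schema_with_nullable = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string", "nullable": True},  # "nullable" is not a JSON Schema keyword
    },
}

# Standard JSON Schema: optionality is expressed via the object-level "required" array.
schema_with_required = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city"],  # "country" is optional because it is not listed here
}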

@aymeric-roucher (Collaborator, Author) commented Jul 10, 2025

I think in general it's hard to enforce sequential logic in parallel tool calls. If you have a solution I'd be interested, but I haven't seen one elsewhere.

Making a special case for the final_answer tool is justified because it is a specific tool: it's the one that terminates the run. (In other frameworks, this is handled by simply not calling any tool at all, so it's not really a tool like the others.)

So I still think we should add this logic to enforce that the final answer comes only after other tool calls, and maybe open a follow-up PR later to add sequential control, though again I think that's hard to enforce.

@albertvillanova (Member) left a comment:

I re-ran the CI, and the CI error did not disappear. I think it is related to this PR.

aymeric-roucher and others added 3 commits July 10, 2025 16:43
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

@aymeric-roucher (Collaborator, Author):

@albertvillanova I've fixed it!

@albertvillanova (Member) left a comment:

Thanks!

@albertvillanova merged commit 2aac521 into main on Aug 5, 2025
4 checks passed