Disable parallel tool calls final answer #1539
Conversation
Force-pushed "… in parallel with final answer" from 09e6cf1 to 019012f
Failing test seems unrelated?
```diff
@@ -1621,18 +1689,12 @@ def forward(self, answer1: str, answer2: str) -> str:
             name="final_answer", arguments={"answer1": "1", "answer2": "2"}
         ),
     ),
-    ChatMessageToolCall(
```
Removing this call since it's not allowed anymore in parallel with final_answer
Just a naive observation: I think this issue might actually be more general than just the `final_answer` tool. The underlying problem is that models sometimes issue multiple tool calls simultaneously, even when the intended semantics are sequential: i.e., one tool's output should influence another's input.
Restricting parallel tool calls when one logically depends on the other (e.g., web_search → final_answer) makes a lot of sense, but in principle, that shouldn't be specific to final_answer. This PR fixes one manifestation, but the broader class of failures remains.
My questions:
- Should we forbid all tool calls with potential dependencies from being parallelized? That would require some way to infer or enforce dependencies between tool calls, but detecting this is non-trivial. How would we know whether a given set of tool calls can be processed in parallel or not?
- Should we instead allow the model to emit multiple tool calls in one go (whether they are inter-dependent or independent), but execute them sequentially rather than in parallel? This would be a more robust approach, though it does come at the cost of increased latency of calls that could have been parallelized.
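The second option above can be sketched in a few lines. This is a hypothetical illustration, not the smolagents implementation: the `ToolCall` class and `execute_sequentially` helper are made up for this example. The point is that calls emitted together in one model turn are executed in emission order, so a later call could in principle observe an earlier one's output.

```python
# Hypothetical sketch: run all tool calls from one model turn sequentially,
# in the order the model emitted them, instead of in parallel.
# `ToolCall` and `execute_sequentially` are illustrative names, not real API.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    arguments: dict


def execute_sequentially(tool_calls, tools):
    """Run each call in order; later calls run after earlier outputs exist."""
    observations = []
    for call in tool_calls:
        result = tools[call.name](**call.arguments)
        observations.append((call.name, result))
    return observations


# Usage: two calls the model emitted "in parallel" still run one after another.
tools = {
    "web_search": lambda query: f"results for {query!r}",
    "final_answer": lambda answer: answer,
}
calls = [
    ToolCall("web_search", {"query": "capital of France"}),
    ToolCall("final_answer", {"answer": "Paris"}),
]
print(execute_sequentially(calls, tools))
```

Note this alone does not fix the hallucination problem described in this PR: the model has already filled in the `final_answer` arguments before the search ran, so sequential execution only helps if the agent loop also feeds earlier outputs back before the later call is finalized.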
By the way, why do we use […]? As far as I know, it is not part of the JSON Schema spec. They use […] instead. Are we sure LLMs reliably understand this custom […]?
I think in general it's hard to enforce sequential logic in parallel tool calls. If you have a solution I'd be interested, but I haven't seen one elsewhere. Making a special case for the `final_answer` tool is justified because it is a special tool: it's the one that terminates the run. (In other frameworks, this is managed by just not calling a tool at all, so it's not really a tool like the others.) So I still think we should add this logic to enforce that the final answer comes only after other tool calls, and maybe open a follow-up PR later to add sequential control. But again, I think that is hard to enforce.
I re-ran the CI, and the error did not disappear. I think it is related to this PR.
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
@albertvillanova I've fixed it!
Thanks!
Until now in ToolCallingAgent, it was possible to call other tools in parallel with the `final_answer` tool, so some weaker LLMs tend to issue, in the same turn, a web search tool call and then a `final_answer` call, with the misled expectation that these calls would run sequentially and that the final answer would be informed by the previous searches. Instead, running these calls in parallel means that tool calls other than `final_answer` have no impact on the agent's return value and are effectively useless, and the LLM ends up force-filling the `final_answer()` args with a hallucination.

Example: the LLM returns this action, where the final answer is hallucinated instead of using the web search output:

To avoid this failure case, this PR forbids calling other tools in parallel with the `final_answer` tool.
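The check the PR describes could look roughly like the following. This is a minimal sketch, assuming a list of (name, arguments) pairs per model turn; the function and exception names are illustrative, not the actual smolagents code.

```python
# Hypothetical sketch of the guard this PR adds: reject a model turn that
# mixes `final_answer` with other tool calls. Names are illustrative.


class AgentToolCallError(Exception):
    """Raised when a model turn contains an invalid combination of tool calls."""


def check_final_answer_is_alone(tool_calls):
    """tool_calls: list of (name, arguments) pairs from one model turn."""
    names = [name for name, _ in tool_calls]
    if "final_answer" in names and len(names) > 1:
        raise AgentToolCallError(
            "final_answer must not be called in parallel with other tools; "
            f"got: {names}"
        )


# Usage: a lone search call passes; mixing in final_answer is rejected.
check_final_answer_is_alone([("web_search", {"query": "x"})])  # ok
try:
    check_final_answer_is_alone(
        [("web_search", {"query": "x"}), ("final_answer", {"answer": "y"})]
    )
except AgentToolCallError as e:
    print("rejected:", e)
```

Surfacing the error back to the model (rather than silently dropping the extra calls) gives it a chance to retry with the search first and the final answer in a later turn.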