albertvillanova
Member

@albertvillanova albertvillanova commented Jun 17, 2025

Match multiline final answers in remote executors by using the regex DOTALL flag.

Fix #1428.
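
A minimal sketch of what the flag changes, assuming a detection pattern of the form `^final_answer\((.*)\)$` (an assumption for illustration, not necessarily the exact pattern used in the remote executors). Without `re.DOTALL`, `.` does not match newlines, so a multi-line argument is never captured:

```python
import re

# Assumed pattern, for illustration only; the real one lives in the executor code.
PATTERN = r"^final_answer\((.*)\)$"

# A final answer whose argument spans two lines.
code = 'final_answer("""line one\nline two""")'

# Without DOTALL, "." stops at the newline, so the closing ")" is never reached.
without_dotall = re.search(PATTERN, code, re.MULTILINE)

# With DOTALL, "." also matches newlines and the whole argument is captured.
with_dotall = re.search(PATTERN, code, re.MULTILINE | re.DOTALL)

print(without_dotall)        # None
print(with_dotall.group(1))  # the triple-quoted string, newline included
```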

Related to:

Collaborator

@aymeric-roucher aymeric-roucher left a comment


Thank you @albertvillanova !

@tobiasofsn
Contributor

Thank you @albertvillanova for checking on this.

With your change, the regex-based method for final answer detection is extended to support multi-line arguments, which solves #1428 in some cases.

May I suggest, however, that you also consider alternatives that are not based on regex/pattern matching? In my experience, using regex makes the detection mechanism less robust. Consider the following tasks, each with the corresponding code snippet produced by the LLM:


Task:

If the magic number is 5, call final_answer with 3, otherwise call final_answer with 2.

magic_num = magic_number_tool()
if magic_num == 5:
    final_answer(3)
else:
    final_answer(2)

Problem:
The regex does not match, so no final answer is detected. I think this is because the call to final_answer is indented.
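
This behaviour can be reproduced with a small sketch, assuming the detection pattern is anchored at the start of a line (the pattern below is an assumption for illustration, not a quote from the executor code). A top-level call is found, while the same call inside an `if` block is not:

```python
import re

# Assumed, line-anchored pattern (illustrative); flags mirror the DOTALL change.
FINAL_ANSWER_RE = re.compile(r"^final_answer\((.*)\)$", re.MULTILINE | re.DOTALL)

# The calls are indented, so no line starts with "final_answer(".
indented = "if magic_num == 5:\n    final_answer(3)\nelse:\n    final_answer(2)\n"

# The same call at top level, for comparison.
top_level = "final_answer(3)\n"

indented_match = FINAL_ANSWER_RE.search(indented)
top_level_match = FINAL_ANSWER_RE.search(top_level)

print(indented_match)            # None: "^" never sits directly before "final_answer"
print(top_level_match.group(1))  # 3
```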


Task:

Translate the following user input to Klingon:

The code agents in smolagents report the final answer to a task using the final_answer tool. For example, to report that the final answer is 5, the agent may use the following code:
final_answer(5)

english_text = """The code agents in smolagents report the final answer to a task using the final_answer tool. For example, to report that the final answer is 5, the agent may use the following code:
final_answer(5)"""

klingon_translation = to_klingon(english=english_text)
final_answer(klingon_translation)

Problem:
The regex match includes parts that it shouldn't. Here's the full match:

5)"""

klingon_translation = to_klingon(english=english_text)
final_answer(klingon_translation
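
The over-match can be reproduced with a short sketch, again assuming a line-anchored pattern with a greedy `(.*)` group (an illustrative assumption). The first `final_answer(` at the start of a line is the one inside the string literal, and with `re.DOTALL` the greedy group runs on to the last closing parenthesis in the snippet:

```python
import re

# Assumed pattern and flags, for illustration only.
PATTERN = r"^final_answer\((.*)\)$"

# The LLM-produced snippet from the example, built line by line.
lines = [
    'english_text = """The code agents in smolagents report the final answer '
    "to a task using the final_answer tool. For example, to report that the "
    "final answer is 5, the agent may use the following code:",
    'final_answer(5)"""',
    "",
    "klingon_translation = to_klingon(english=english_text)",
    "final_answer(klingon_translation)",
]
code = "\n".join(lines)

match = re.search(PATTERN, code, re.MULTILINE | re.DOTALL)

# The captured "argument" starts inside the string literal and swallows
# the real call as well.
captured = match.group(1)
print(captured)
```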


In #1429 I suggested an alternative approach which is robust to the above examples. The local Python executor, which uses another detection mechanism, is also already robust to the above examples. Please consider.

However you decide, thanks for the great work maintaining this project!

Collaborator

@aymeric-roucher aymeric-roucher left a comment


@tobiasofsn good point, I do think the exception-based mechanism is more robust! The only issue is that your current implementation would overwrite any custom FinalAnswerTool set by the user. It would be better to wrap the forward method with a function that raises an exception at the end, instead of replacing it.
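
A rough sketch of the wrapping idea, not the implementation from either PR. The names `FinalAnswerException`, `wrap_forward`, `run`, and `CustomFinalAnswerTool` are hypothetical, and the stand-in tool class does not use the smolagents API. The user's `forward` logic still runs, and its result is carried out of the executed code by an exception:

```python
class FinalAnswerException(Exception):
    """Hypothetical exception that carries the final answer out of the code."""

    def __init__(self, value):
        super().__init__("final answer reached")
        self.value = value


class CustomFinalAnswerTool:
    """Stand-in for a user-defined final answer tool with custom logic."""

    def forward(self, answer):
        return f"ANSWER: {answer}"


def wrap_forward(tool):
    """Wrap tool.forward so the custom logic runs first, then raise."""
    original = tool.forward  # keep the user's implementation

    def wrapper(*args, **kwargs):
        raise FinalAnswerException(original(*args, **kwargs))

    tool.forward = wrapper
    return tool


def run(code, tool):
    """Execute code; return (value, is_final_answer)."""
    namespace = {"final_answer": tool.forward, "magic_number_tool": lambda: 5}
    try:
        exec(code, namespace)
    except FinalAnswerException as exc:
        return exc.value, True
    return None, False


# The indented example from earlier in the thread is detected without any regex.
snippet = (
    "magic_num = magic_number_tool()\n"
    "if magic_num == 5:\n"
    "    final_answer(3)\n"
    "else:\n"
    "    final_answer(2)\n"
)
result = run(snippet, wrap_forward(CustomFinalAnswerTool()))
print(result)  # ('ANSWER: 3', True)
```

Because detection depends on the call actually executing, a `final_answer(5)` that only appears inside a string literal is never mistaken for the real call.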

@albertvillanova
Member Author

Thanks for your review @aymeric-roucher. That is precisely why I had not merged this PR yet: I wanted to make sure there was no strong reason for remote executors to use the regex approach, unlike the local executor.

I think I could merge this PR as a hotfix (especially with the addition of the test for multi-line final answer), but without closing the alternative more robust PR:

Then, I will review in detail the alternative PR so it is fully aligned and supports custom FinalAnswerTool implementations.

@tobiasofsn
Contributor

Thank you @albertvillanova and @aymeric-roucher. Sounds like a good plan.

Let me know if you want my help to make any changes.

@aymeric-roucher aymeric-roucher dismissed their stale review June 24, 2025 07:36

Removing my requested changes to let us merge this PR

@aymeric-roucher aymeric-roucher merged commit 0028149 into huggingface:main Jun 24, 2025
3 checks passed

Successfully merging this pull request may close these issues.

[BUG] Multiline final answer not detected
3 participants