Skip to content

Conversation

aymeric-roucher
Copy link
Collaborator

@aymeric-roucher aymeric-roucher commented Jun 17, 2025

This refactor fixes #1440 and refactors the logic to address underlying issues such as "agent cannot return None in final_answer"

@aymeric-roucher aymeric-roucher changed the title Refacto StepOutputs for clearer final answer checks Fix and refactor final answer checks Jun 17, 2025
final_answer = self._handle_max_steps_reached(task, images)
yield action_step
yield FinalAnswerStep(handle_agent_output_types(final_answer))

def _execute_step(self, memory_step: ActionStep) -> Generator[ChatMessageStreamDelta | FinalOutput]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think porting this logic into a separate function obfuscates the logic more than it clarifies.

@@ -236,6 +238,7 @@ def __init__(
tools: list[Tool],
model: Model,
prompt_templates: PromptTemplates | None = None,
instructions: str | None = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from #1442, will not appear after merging pr 1442.

@aymeric-roucher aymeric-roucher force-pushed the fix-final-answer-checks branch from a2e464e to fc7cc89 Compare June 17, 2025 18:17
@aymeric-roucher aymeric-roucher force-pushed the fix-final-answer-checks branch from b5d3d18 to 54dbde6 Compare June 17, 2025 20:17
@albertvillanova albertvillanova requested a review from Copilot June 18, 2025 19:04
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the final answer handling in agents and updates related tests to use PIL.Image instances and the renamed step_number field.

  • Refactored ActionStep.dict() to serialize images, add is_final_answer, and replace "step" with "step_number".
  • Replaced the old FinalOutput type with ActionOutput and updated streaming logic in agents.py.
  • Updated tests to use Image.new(...), renamed keys in dict assertions, and added scenarios for final answer checks.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/test_memory.py Switched from string paths to PIL.Image objects, renamed "step" to "step_number", added checks for observations_images.
tests/test_agents.py Added a test case for final_answer_checks, updated max_steps usage, and renamed planning step variables.
src/smolagents/memory.py Updated ActionStep.dict() to output step_number, serialize observations_images, and include is_final_answer.
src/smolagents/agents.py Removed FinalOutput, introduced ActionOutput, and refactored the streaming loop to validate and flag final answers.
Comments suppressed due to low confidence (2)

tests/test_memory.py:101

  • Consider adding an assertion to verify that action_step_dict["observations_images"] contains the expected byte data (e.g., matching Image.new(...).tobytes()).
    assert "observations_images" in action_step_dict

tests/test_memory.py:203

  • Add assertions in test_task_step_to_messages to confirm that the generated messages include an image entry (e.g., check for a content dict with "type": "image").
    task_step = TaskStep(task="This is a task.", task_images=[Image.new("RGB", (100, 100))])

@@ -110,8 +110,9 @@ def populate_template(template: str, variables: dict[str, Any]) -> str:


@dataclass
class FinalOutput:
output: Any | None
class ActionOutput:
Copy link
Preview

Copilot AI Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ActionOutput class is missing the @dataclass decorator, so instances won’t accept constructor arguments. Add @dataclass above this class definition.

Copilot uses AI. Check for mistakes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it isn't, go home copilot you're drunk

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing these issues.

Not sure about the need to replace FinalOutput with ActionOutput and the need to add the is_final_answer...

Comment on lines 622 to 631
agent = CodeAgent(
model=FakeCodeModel(),
tools=[],
final_answer_checks=[lambda x, y: x == 7.2904],
verbosity_level=1000,
)
output = agent.run("Dummy task.")
assert output == 7.2904 # Check that output is correct
assert len([step for step in agent.memory.steps if isinstance(step, ActionStep)]) == 2
assert "Error raised in check" not in str(agent.write_memory_to_messages())
Copy link
Member

@albertvillanova albertvillanova Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test passes before the fixes introduced in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, just fixed it to not be passing before!

@aymeric-roucher aymeric-roucher force-pushed the fix-final-answer-checks branch from e0b2e39 to ca5a539 Compare June 19, 2025 12:07
Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the ActionOutput class will be:

  • Either ActionOutput(output: Any, is_final_answer=True) for final answer
  • Or ActionOutput(output=None, is_final_answer=False) for non final answer

IMHO it would be simpler to return:

  • Either FinalOutput(output: Any) for final answer
  • Or None for non final answer

I find the ActionOutput is non-optimal because it adds complexity without adding relevant information, but feel free to ignore it!

@aymeric-roucher aymeric-roucher merged commit 76ecb9b into main Jun 19, 2025
4 checks passed
@aymeric-roucher
Copy link
Collaborator Author

@Albert pasting my slack message here for visibility:

Thinking in terms of clients that will need to consume streaming events, I think it's simpler for frontends to consume a general object like ActionOutput and use their attributes rather than consuming FinalOutput | None : but I might still rework this, i'm still making lots of changes! The idea is to serve the same kind of streaming agents as other packages like copilotkit or openai agents) do.

@albertvillanova
Copy link
Member

@aymeric-roucher I think when streaming, users are already consuming Generator[ChatMessageStreamDelta | FinalOutput].

The alternative approach just adds a None: Generator[ChatMessageStreamDelta | FinalOutput | None].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] None check is not there for FinalOutput.output [BUG] Agent cannot run when final_answer_checks is provided
3 participants