Description
Required prerequisites
- I have searched the Issue Tracker and Discussions to confirm that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Motivation
Our primary method for handling image-related tasks is to have an agent execute a tool call. This call delegates the image analysis to a secondary agent or a vision-capable Large Language Model (LLM), which then returns the relevant information.
Optional Solutions for Future Exploration:
- Embedded Image Processing: An alternative approach would be to send the user's message directly to ChatAgent.step. If an image is detected, a built-in function would be triggered to handle the image input, streamlining the process.
- Sequence Validation: To ensure the conversational message history remains valid, we could programmatically add a placeholder (or "dummy") message to the tool's output.
- Agent Cloning: Clone the registered agent, including the original agent's memory, inside ChatAgent.step, then let the cloned agent step on the image and return its response.
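
To make the three options concrete, here is a minimal, self-contained sketch of how they might fit together. It does not use the real ChatAgent API: `Message`, `SketchAgent`, `clone`, and `_describe_image` are hypothetical stand-ins invented for illustration, and the vision call is a placeholder string. The placeholder "tool" message reflects one possible reading of the sequence-validation idea.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (not a real library type)."""
    role: str                    # "user", "assistant", or "tool"
    content: str
    image: bytes | None = None   # raw image bytes, if the message carries one


@dataclass
class SketchAgent:
    """Hypothetical agent that only mimics the shape of a step() method."""
    name: str
    memory: list[Message] = field(default_factory=list)

    def clone(self, with_memory: bool = True) -> SketchAgent:
        # Deep-copy the history so the clone cannot mutate the original's memory.
        memory = copy.deepcopy(self.memory) if with_memory else []
        return SketchAgent(name=f"{self.name}-clone", memory=memory)

    def _describe_image(self, msg: Message) -> str:
        # Placeholder for a call to a vision-capable model (assumption).
        size = len(msg.image or b"")
        return f"[description of {size} bytes image for: {msg.content}]"

    def step(self, msg: Message) -> Message:
        self.memory.append(msg)
        if msg.image is not None:
            # Embedded image processing: detect the image inside step() and
            # hand it to a clone that shares a copy of this agent's memory.
            worker = self.clone(with_memory=True)
            description = worker._describe_image(msg)
            # Sequence validation: record a placeholder tool message so the
            # history still reads user -> tool -> assistant.
            self.memory.append(Message(role="tool", content=description))
            reply = Message(role="assistant", content=description)
        else:
            reply = Message(role="assistant", content=f"echo: {msg.content}")
        self.memory.append(reply)
        return reply


agent = SketchAgent(name="main")
agent.step(Message(role="user", content="hello"))
reply = agent.step(Message(role="user", content="what is this?", image=b"\x89PNG"))
print(reply.content)
print([m.role for m in agent.memory])
```

In this sketch the clone works on a copy of the history, so whatever it does while handling the image never leaks into the original agent's memory. Only the final description is written back.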
Solution
No response
Alternatives
No response
Additional context
No response