Support for Pictures in AI Chat #15407

@eneufeld

Use Case: Sending an Image to the Chat Assistant

Goal

A user sends an image to the LLM via the chat interface. The system validates model compatibility and informs the user if vision is unsupported. If supported, the image is retained in the session for context.


Steps

1. User Opens Chat Interface

  • The user opens the chat panel within the tool or IDE.

2. User Adds an Image

The user includes an image in the chat message using one of the following methods (listed by priority):

2.1 Paste from Clipboard

  • User pastes an image (e.g., screenshot) directly into the chat input using Ctrl+V / Cmd+V.

2.2 Drag and Drop

  • User drags an image file (e.g., .png, .jpg) into the chat window.

2.3 Upload via Button

  • User clicks a dedicated upload button (e.g., paperclip icon), selects an image from the file system, and attaches it.

2.4 Select from Workspace

  • User browses project/workspace files via a file variable and selects an image to attach.
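
All four entry points can feed one shared attachment model. The following is a minimal sketch of that idea, assuming a hypothetical `PendingImage` type and an assumed allow-list of formats (the issue names only .png and .jpg as examples). None of these names are an existing API.

```typescript
// Hypothetical names for illustration; not an existing chat framework API.
type ImageSource = 'clipboard' | 'drag-and-drop' | 'upload' | 'workspace';

interface PendingImage {
    source: ImageSource;
    name: string;
    mimeType: string;
}

// Assumed allow-list, used only when the source reports no MIME type.
const EXTENSION_TO_MIME = new Map<string, string>([
    ['png', 'image/png'],
    ['jpg', 'image/jpeg'],
    ['jpeg', 'image/jpeg']
]);

function toPendingImage(source: ImageSource, name: string, mimeType?: string): PendingImage | undefined {
    let resolved: string | undefined;
    if (mimeType !== undefined) {
        // A reported MIME type wins; anything that is not an image is rejected.
        resolved = mimeType.startsWith('image/') ? mimeType : undefined;
    } else {
        // Workspace files may only have a name, so fall back to the extension.
        const dot = name.lastIndexOf('.');
        const extension = dot >= 0 ? name.slice(dot + 1).toLowerCase() : '';
        resolved = EXTENSION_TO_MIME.get(extension);
    }
    return resolved === undefined ? undefined : { source, name, mimeType: resolved };
}
```

A paste handler would call `toPendingImage('clipboard', 'screenshot.png', 'image/png')`, while the workspace picker could pass only a file name and rely on the extension lookup.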

3. Image Appears in Chat Draft

  • The image is visually rendered as a thumbnail in the chat input area, along with any accompanying text the user types.

4. User Sends Message

  • User presses Enter or clicks “Send” to submit the message and attached image.

5. Framework Checks Model Capabilities

  • The chat framework checks if the currently selected LLM supports image inputs.
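
One way to express this check is a capability flag on the model description. The sketch below assumes a hypothetical `ModelDescriptor` with an optional `supportsImageInput` field. How capabilities are actually declared or discovered is left open by this issue.

```typescript
// Hypothetical model metadata; the field name is an assumption, not an existing API.
interface ModelDescriptor {
    id: string;
    supportsImageInput?: boolean;
}

function supportsImages(model: ModelDescriptor | undefined): boolean {
    // A missing model or an undeclared capability is treated as "not supported".
    return model?.supportsImageInput === true;
}
```

Treating an undeclared capability as unsupported is the conservative choice: the user sees the warning rather than a provider-side error.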

6. Model Does Not Support Images

If the model is not vision-capable:

  • The message is not sent.
  • A warning is shown:

    "The selected model does not support image input. Please remove the image or select a vision-capable model."

  • The message and image remain in the input for user correction.
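
The blocking behavior can be sketched as a guard in front of the actual send call. `Draft`, `SendResult`, and `trySend` are hypothetical names chosen for illustration. The warning text is the one quoted above.

```typescript
// Hypothetical types and names for illustration; not an existing chat framework API.
interface Draft {
    text: string;
    imageNames: string[];
}

type SendResult =
    | { sent: true }
    | { sent: false; warning: string; draft: Draft };

const UNSUPPORTED_IMAGE_WARNING =
    'The selected model does not support image input. ' +
    'Please remove the image or select a vision-capable model.';

function trySend(draft: Draft, modelSupportsImages: boolean, send: (draft: Draft) => void): SendResult {
    if (draft.imageNames.length > 0 && !modelSupportsImages) {
        // Nothing is sent; the untouched draft is handed back so the input keeps text and image.
        return { sent: false, warning: UNSUPPORTED_IMAGE_WARNING, draft };
    }
    send(draft);
    return { sent: true };
}
```

Because the guard returns the original draft instead of clearing it, the UI can show the warning and leave the input as the user wrote it.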

7. Model Supports Images

If the model does support vision:

  • The image is sent as part of the message using the LLM’s required format (e.g., base64 or multipart).
  • The image is retained in the current conversation session and can be referred to later by the LLM.
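
As an illustration of the base64 option, the sketch below encodes raw image bytes into a `data:` URL. Whether a given provider expects a data URL, a bare base64 string, or a multipart upload is provider-specific and not settled here. The helper names are hypothetical, and the encoder is written by hand only to keep the example free of runtime dependencies.

```typescript
const BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Standard base64 with '=' padding (RFC 4648), processing three bytes at a time.
function toBase64(bytes: Uint8Array): string {
    let out = '';
    for (let i = 0; i < bytes.length; i += 3) {
        const b0 = bytes[i] ?? 0;
        const b1 = bytes[i + 1] ?? 0;
        const b2 = bytes[i + 2] ?? 0;
        out += BASE64_ALPHABET.charAt(b0 >> 2);
        out += BASE64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
        out += i + 1 < bytes.length ? BASE64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : '=';
        out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(b2 & 0x3f) : '=';
    }
    return out;
}

// Hypothetical helper: wraps the encoded bytes in a data URL for the given MIME type.
function toImageDataUrl(mimeType: string, bytes: Uint8Array): string {
    return `data:${mimeType};base64,${toBase64(bytes)}`;
}
```

In a real implementation the platform's own base64 facilities would normally replace the hand-written encoder.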

8. Conversation Continues

  • The assistant processes the image and responds.
  • The image remains visible in the conversation history.
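
Retention in the session could look like the following sketch, in which every follow-up request is built from the full message history, images included. `ChatMessage` and `ChatSession` are hypothetical names, and images are represented as plain strings (for example encoded payloads) purely for illustration.

```typescript
// Hypothetical session model; not an existing chat framework API.
interface ChatMessage {
    role: 'user' | 'assistant';
    text: string;
    images: string[];
}

class ChatSession {
    private readonly messages: ChatMessage[] = [];

    add(message: ChatMessage): void {
        this.messages.push(message);
    }

    // Returns a copy of the whole history, so earlier images stay part of the context.
    buildRequestContext(): ChatMessage[] {
        return this.messages.map(m => ({ ...m, images: [...m.images] }));
    }
}
```

The same history can drive the conversation view, which keeps the image visible after it has been sent.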
