
Complete "computer use" support #216

@ErikBjare


Anthropic just announced their computer use feature (see #50 (comment) and anthropic.com/news/3-5-models-and-computer-use), so we should finish ours, since we already have screenshot support.

We can already take screenshots; we just need to enable acting on them by clicking or typing input.
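A minimal sketch of the "acting" half, assuming an X11 desktop with the `xdotool` CLI installed. The action classes and function names are hypothetical, not existing gptme APIs. Each action is translated into an `xdotool` argv, and `dry_run` makes it possible to inspect the command without touching the display:

```python
import subprocess
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    x: int
    y: int
    button: int = 1  # X11 convention: 1 = left, 2 = middle, 3 = right


@dataclass
class TypeText:
    text: str


@dataclass
class Key:
    keysym: str  # e.g. "Return" or "ctrl+c"


Action = Union[Click, TypeText, Key]


def to_xdotool(action: Action) -> list[str]:
    """Translate a hypothetical action object into an xdotool argv (X11 only)."""
    if isinstance(action, Click):
        # xdotool allows chaining: move the pointer, then click
        return ["xdotool", "mousemove", str(action.x), str(action.y),
                "click", str(action.button)]
    if isinstance(action, TypeText):
        return ["xdotool", "type", action.text]
    if isinstance(action, Key):
        return ["xdotool", "key", action.keysym]
    raise TypeError(f"unknown action: {action!r}")


def perform(action: Action, dry_run: bool = False) -> list[str]:
    """Run the action via xdotool unless dry_run is set; return the argv either way."""
    argv = to_xdotool(action)
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

For example, `perform(Click(640, 360), dry_run=True)` returns the command that would click the centre of a 1280x720 screen. Wayland sessions and macOS would need a different backend.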

To avoid burning tons of tokens, we should probably put it in a loop that doesn't stack up lots of view/interact steps in the context, maybe using a looping subagent or some kind of context-efficient tool-use loop that runs until the goal is achieved (we might need something like this generally for automation goals). We should study how they do it.
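One possible shape for such a loop, as a sketch under assumptions: `screenshot`, `decide` (the model call) and `act` are hypothetical callbacks, not existing gptme functions. Only the latest screenshot is passed to the model each step, and earlier steps survive only as short text summaries capped at `keep_last`, so context size stays roughly constant instead of growing with every view/interact step:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    action: str  # short description of what was done
    result: str  # one-line outcome, instead of a full screenshot


def run_until_done(
    goal: str,
    screenshot: Callable[[], bytes],
    decide: Callable[[str, list[Step], bytes], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 20,
    keep_last: int = 5,
) -> list[Step]:
    """Observe-decide-act loop that sends only the latest screenshot to the model.

    `decide` returns the next action, or None once it judges the goal achieved.
    `max_steps` bounds cost if the goal is never reached.
    """
    history: list[Step] = []
    for _ in range(max_steps):
        image = screenshot()                               # fresh view every step
        action = decide(goal, history[-keep_last:], image)  # recent text + one image
        if action is None:
            break
        result = act(action)
        history.append(Step(action, result))
    return history
```

Whether the summaries should be written by the model or by the tool is an open design choice; this sketch lets `act` produce them.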

They run it in a Docker container and stream the display to a web app over VNC. We should make it possible to do it this way with gptme too, but I think gptme should be able to control the local system first and a Docker system second.
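One way to support "local first, Docker second" without duplicating the action code is to wrap the same command differently per target. This is a hypothetical sketch: the `Target` class, the container name and the `:1` display number are assumptions, and the only real interface used is `docker exec -e VAR=value <container> <cmd...>`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Where desktop actions run: the local display, or inside a Docker container."""
    container: Optional[str] = None  # None means act on the local system
    display: str = ":1"              # assumed X display inside the container

    def wrap(self, argv: list[str]) -> list[str]:
        """Return argv unchanged locally, or prefixed to run inside the container."""
        if self.container is None:
            return argv
        # docker exec runs the command in an already-running container;
        # -e sets DISPLAY so X11 tools find the container's virtual display
        return ["docker", "exec", "-e", f"DISPLAY={self.display}",
                self.container, *argv]
```

For example, `Target(container="gptme-desktop").wrap(["xdotool", "key", "Return"])` yields a `docker exec` command, while `Target()` leaves the command as-is for the local system. Streaming that container's display to a browser over VNC would be a separate piece.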

Labels: enhancement (New feature or request)