Description
Since Anthropic just announced their computer use feature (see #50 (comment) and anthropic.com/news/3-5-models-and-computer-use), we should finish ours, as we already have screenshots working.
We can already take screenshots; we just need to enable acting on them by clicking or typing input.
To avoid burning tons of tokens, we should probably run it in a loop that doesn't stack up lots of view/interact steps in the context, maybe using a looping subagent or some kind of context-efficient tool-use loop that runs until the goal is achieved (we might need something like this for automation goals in general). We should study how Anthropic does it.
Anthropic runs it in a Docker container and streams it to a web app over VNC. We should make it possible to do it this way with gptme, but I think gptme should be able to control the local system first, and a Docker system second.
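A minimal sketch of what such a context-efficient loop could look like, assuming the model only ever sees the current screenshot plus a short, bounded list of text summaries of past actions. All names here (`Action`, `run_gui_loop`, `observe`, `decide`, `act`) are hypothetical and not part of gptme or Anthropic's API; the callbacks stand in for the real screenshot tool, the LLM call, and the click/type backend (local or Docker).

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Action:
    """Hypothetical action type: 'click', 'type', or 'done'."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""

    def summary(self) -> str:
        # Short text form kept in history instead of a full screenshot.
        if self.kind == "click":
            return f"click({self.x}, {self.y})"
        if self.kind == "type":
            return f"type({self.text!r})"
        return self.kind


def run_gui_loop(
    goal: str,
    observe: Callable[[], str],                      # returns current screenshot (e.g. a path)
    decide: Callable[[str, str, List[str]], Action], # stand-in for the model call
    act: Callable[[Action], None],                   # performs the click/typing
    max_steps: int = 20,
    keep_last: int = 3,
) -> Tuple[bool, int]:
    """Run view/interact steps until the goal is reached or max_steps is hit.

    Only the latest screenshot and the last `keep_last` action summaries
    are passed to `decide`, so context does not grow with the step count.
    """
    history: Deque[str] = deque(maxlen=keep_last)
    for step in range(1, max_steps + 1):
        screenshot = observe()
        action = decide(goal, screenshot, list(history))
        if action.kind == "done":
            return True, step
        act(action)
        history.append(action.summary())  # old screenshots are dropped
    return False, max_steps
```

Because `observe` and `act` are plain callbacks, the same loop could drive the local system first and a Docker/VNC backend later by swapping those two functions.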
Milestones
- Can it Tweet?
- Can it play Factorio? (prob joke tweet: https://x.com/aphysicist/status/1848802806729228782)
- Can it play Doom? (latency will matter if we don't want to suck)
- https://manifold.markets/singer/will-ai-automate-guis-by-end-of-202