Computer Use Tool

The computer_use tool enables GUI automation directly from the AI agent. It supports 10 actions for interacting with the desktop environment.

Note: This tool auto-installs its dependencies (xdotool, scrot) on first use. macOS uses native tools.

Actions

| Action | Parameters | Description |
| --- | --- | --- |
| screenshot | display (optional) | Capture the screen |
| click | x, y, button, clicks | Click at coordinates |
| type | text | Type text at cursor |
| key | keys | Press key combination (e.g. ctrl+c) |
| mouse_move | x, y | Move cursor to coordinates |
| scroll | x, y, direction, amount | Scroll at position |
| drag | startX, startY, endX, endY | Click-drag between points |
| window_list | (none) | List visible windows |
| window_focus | window | Focus window by name or ID |
| screen_size | (none) | Get screen dimensions |
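As a rough illustration of how the table above could be used on the caller's side, the sketch below checks a request for parameter names the table does not list. The `ACTION_PARAMS` mapping and the `unknown_params` helper are hypothetical names introduced here, not part of the tool. Which parameters are required is also not stated in the table (only `display` is marked optional), so the sketch flags unknown names only.

```python
# Parameter names per action, copied from the table above.
# Required vs. optional is NOT specified by the table (except "display",
# which is marked optional), so this sketch only detects unknown names.
ACTION_PARAMS = {
    "screenshot": {"display"},
    "click": {"x", "y", "button", "clicks"},
    "type": {"text"},
    "key": {"keys"},
    "mouse_move": {"x", "y"},
    "scroll": {"x", "y", "direction", "amount"},
    "drag": {"startX", "startY", "endX", "endY"},
    "window_list": set(),
    "window_focus": {"window"},
    "screen_size": set(),
}


def unknown_params(request: dict) -> set:
    """Return parameter names in `request` that the table does not list."""
    action = request.get("action")
    if action not in ACTION_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    return set(request) - {"action"} - ACTION_PARAMS[action]


# A well-formed click request has no unknown parameters.
ok = unknown_params({"action": "click", "x": 500, "y": 300})
# A misspelled parameter name is reported.
typo = unknown_params({"action": "type", "txt": "Hello World"})
```

A check like this catches typos such as `txt` before a request is sent. It does not validate value types or ranges.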

Examples

Each request is a JSON object. The // lines are annotations and are not part of the payload:

    // Get screen size
    { "action": "screen_size" }

    // Take a screenshot
    { "action": "screenshot" }

    // Click at coordinates
    { "action": "click", "x": 500, "y": 300 }

    // Type text
    { "action": "type", "text": "Hello World" }

    // Press key combo
    { "action": "key", "keys": "ctrl+s" }

    // List windows
    { "action": "window_list" }
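The examples above do not cover mouse_move, scroll, drag, or window_focus. The sketch below builds requests for those actions from the parameter names in the Actions table and serializes them to JSON. All values are illustrative: in particular, "down" as a direction value and "Terminal" as a window name are assumptions, since the accepted values are not documented here.

```python
import json

# Parameter names come from the Actions table; the values are made up.
# ASSUMPTION: "down" is an accepted direction and "Terminal" is a window
# that exists on the target desktop.
requests = [
    {"action": "mouse_move", "x": 200, "y": 150},
    {"action": "scroll", "x": 500, "y": 300, "direction": "down", "amount": 3},
    {"action": "drag", "startX": 100, "startY": 100, "endX": 400, "endY": 250},
    {"action": "window_focus", "window": "Terminal"},
]

# Serialize each request to the JSON form used in the examples above.
payloads = [json.dumps(r) for r in requests]
for payload in payloads:
    print(payload)
```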

Platform Support

| Platform | Tools Used |
| --- | --- |
| Linux (X11) | xdotool, scrot / imagemagick / gnome-screenshot |
| Linux (Wayland) | xdotool (limited), grim |
| macOS | screencapture, osascript (native) |
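To show how the table above maps a platform to its tools, here is a minimal lookup sketch. The function name `expected_tools` is hypothetical, and how the tool itself detects the platform is not documented here. The sketch assumes the caller passes a `sys.platform`-style string and the value of the `XDG_SESSION_TYPE` environment variable.

```python
def expected_tools(platform: str, session_type: str = "") -> list:
    """Map a platform to the tools listed in the Platform Support table.

    `platform` follows sys.platform ("linux", "darwin").
    `session_type` follows XDG_SESSION_TYPE ("x11", "wayland").
    ASSUMPTION: this mirrors the table only; it is not the tool's own logic.
    """
    if platform == "darwin":
        return ["screencapture", "osascript"]
    if platform.startswith("linux"):
        if session_type == "wayland":
            return ["xdotool", "grim"]
        # scrot is the first of the X11 screenshot tools the table lists.
        return ["xdotool", "scrot"]
    raise ValueError(f"platform not covered by the table: {platform}")


wayland_tools = expected_tools("linux", "wayland")
mac_tools = expected_tools("darwin")
```

In practice a caller might pass `sys.platform` and `os.environ.get("XDG_SESSION_TYPE", "")`. Platforms that are not in the table raise `ValueError` rather than guessing.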

Permissions

Every action requires the computer_use permission. The agent must request permission separately for each action type (e.g. screenshot, click), so a grant for one action type does not cover another.
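The per-action-type rule can be pictured with a small gate that records granted action types and checks each request against them. `PermissionGate` is a hypothetical helper written for illustration. The tool's actual permission mechanism is not described here and may differ.

```python
class PermissionGate:
    """Tracks granted action types (hypothetical helper, not the tool's API)."""

    def __init__(self):
        self._granted = set()

    def grant(self, action_type: str) -> None:
        """Record that the user approved this action type."""
        self._granted.add(action_type)

    def check(self, request: dict) -> bool:
        """Return True only if the request's action type was granted."""
        return request.get("action") in self._granted


gate = PermissionGate()
gate.grant("screenshot")

# Granting "screenshot" does not cover "click".
allowed = gate.check({"action": "screenshot"})
blocked = gate.check({"action": "click", "x": 500, "y": 300})
```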
