Computer Use Tool
The computer_use tool enables GUI automation directly from the AI agent. It supports 10 actions for interacting with the desktop environment.
Note
On Linux, this tool auto-installs its dependencies (xdotool, scrot) on first use. On macOS, it uses native tools instead.
Actions
| Action | Parameters | Description |
|---|---|---|
| screenshot | display (optional) | Capture the screen |
| click | x, y, button, clicks | Click at coordinates |
| type | text | Type text at cursor |
| key | keys | Press key combination (e.g. ctrl+c) |
| mouse_move | x, y | Move cursor to coordinates |
| scroll | x, y, direction, amount | Scroll at position |
| drag | startX, startY, endX, endY | Click-drag between points |
| window_list | (none) | List visible windows |
| window_focus | window | Focus window by name or ID |
| screen_size | (none) | Get screen dimensions |
Examples
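As a minimal sketch, the snippet below assembles requests for a few of the actions in the table. The flat `{"action": ..., ...}` request shape, the `build_action` helper, and the value `"left"` for `button` are assumptions made for illustration, not the tool's documented format; only the action and parameter names come from the Actions table. Because the table marks only `display` as optional, the helper rejects unknown names and does not try to enforce required ones.

```python
# Parameter names per action, copied from the Actions table.
ACTION_PARAMS = {
    "screenshot": {"display"},
    "click": {"x", "y", "button", "clicks"},
    "type": {"text"},
    "key": {"keys"},
    "mouse_move": {"x", "y"},
    "scroll": {"x", "y", "direction", "amount"},
    "drag": {"startX", "startY", "endX", "endY"},
    "window_list": set(),
    "window_focus": {"window"},
    "screen_size": set(),
}


def build_action(action, **params):
    """Return a request dict (hypothetical shape) after checking names against the table."""
    if action not in ACTION_PARAMS:
        raise ValueError(f"unknown action: {action}")
    unknown = set(params) - ACTION_PARAMS[action]
    if unknown:
        raise ValueError(f"unknown parameters for {action}: {sorted(unknown)}")
    return {"action": action, **params}


# Click a point, type into the focused field, then copy with ctrl+c.
# The "left" button value is an assumed spelling, not taken from the table.
steps = [
    build_action("click", x=640, y=360, button="left", clicks=1),
    build_action("type", text="hello"),
    build_action("key", keys="ctrl+c"),
]
```

Each entry in `steps` is a plain dictionary, so it can be serialized or logged before being handed to whatever interface actually dispatches the action.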
Platform Support
| Platform | Tools Used |
|---|---|
| Linux (X11) | xdotool, scrot / imagemagick / gnome-screenshot |
| Linux (Wayland) | xdotool (limited), grim |
| macOS | screencapture, osascript (native) |
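The table above can be read as a lookup from platform to backend. The sketch below shows one way such a lookup might work; how the tool itself detects the platform is not stated here, so the use of `XDG_SESSION_TYPE`, the fallback order (the order listed in the table), and the names `detect_platform` and `pick_screenshot_tool` are all assumptions. Tool names are the labels from the table, not necessarily executable names.

```python
import os
import platform

# Screenshot backends per platform, in the order the table lists them.
SCREENSHOT_TOOLS = {
    "linux-x11": ["scrot", "imagemagick", "gnome-screenshot"],
    "linux-wayland": ["grim"],
    "macos": ["screencapture"],
}

# Input (mouse/keyboard/window) backends per platform.
INPUT_TOOLS = {
    "linux-x11": "xdotool",
    "linux-wayland": "xdotool",  # limited support, per the table
    "macos": "osascript",
}


def detect_platform(system=None, session_type=None):
    """Map the OS and session type to a table row (assumed detection logic)."""
    system = system or platform.system()
    if system == "Darwin":
        return "macos"
    if system == "Linux":
        if session_type is None:
            session_type = os.environ.get("XDG_SESSION_TYPE", "")
        return "linux-wayland" if session_type.lower() == "wayland" else "linux-x11"
    raise RuntimeError(f"unsupported platform: {system}")


def pick_screenshot_tool(platform_key, installed):
    """Return the first listed screenshot tool that is installed, else None."""
    for tool in SCREENSHOT_TOOLS[platform_key]:
        if tool in installed:
            return tool
    return None
```

Passing `system` and `session_type` explicitly keeps the functions easy to exercise without depending on the machine they run on.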
Permissions
Every action requires the computer_use permission. The agent must request this permission separately for each action type (e.g. screenshot, click).
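The per-action-type rule can be pictured as a small gate that is checked before an action runs. This is a sketch only: `ComputerUsePermissions`, `PermissionDenied`, and the `grant`/`require` methods are hypothetical names, and the real permission flow may differ.

```python
class PermissionDenied(Exception):
    """Raised when an action type has not been granted (hypothetical)."""


class ComputerUsePermissions:
    """Tracks computer_use grants per action type (illustrative, not the real API)."""

    def __init__(self):
        self._granted = set()

    def grant(self, action_type):
        # Record that this one action type was approved.
        self._granted.add(action_type)

    def require(self, action_type):
        # A grant for one action type does not cover any other.
        if action_type not in self._granted:
            raise PermissionDenied(f"computer_use permission not granted for: {action_type}")


perms = ComputerUsePermissions()
perms.grant("screenshot")
perms.require("screenshot")  # passes: screenshot was granted

try:
    perms.require("click")  # click was never granted
    click_allowed = True
except PermissionDenied:
    click_allowed = False
```

Here granting `screenshot` leaves `click` blocked, which mirrors the requirement that each action type be requested on its own.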