openclaw/docs/guides/computer-server-setup.md
f-trycua ae736be73f feat(tools): add computer tool for GUI automation via cua-computer-server
Add a new `computer` tool that enables agents to control desktop GUIs:
- Take screenshots
- Click, double-click, right-click at coordinates
- Type text, press keys, hotkey combinations
- Scroll in any direction
- Move cursor, drag operations
- Get screen size and cursor position

Integrates with cua-computer-server which can run in:
- Sandbox mode (cua-xfce, cua-ubuntu Docker images)
- Node mode (Linux with supervisor setup)

Includes setup documentation and example screenshot.
2026-01-25 12:31:43 -08:00

4.2 KiB

Computer-Server Setup for GUI Automation

This guide explains how to set up cua-computer-server for desktop GUI automation with Clawdbot agents.

Overview

The computer tool enables agents to:

  • Take screenshots of the desktop
  • Click at screen coordinates
  • Type text and press keys
  • Scroll and drag
  • Control native applications

Setup Options

Use Cua's pre-built desktop sandbox images. Computer-server is already installed and running.

cua-xfce desktop sandbox

# In your clawdbot config
agents:
  defaults:
    sandbox:
      docker:
        image: ghcr.io/trycua/cua-xfce:latest

Available sandbox images:

Image Description
ghcr.io/trycua/cua-xfce:latest Linux + XFCE desktop
ghcr.io/trycua/cua-ubuntu:latest Linux + Kasm desktop

For more options including Windows and Android, see Cua Desktop Sandboxes.

Option 2: Node Mode (Linux)

For Linux nodes, you can set up computer-server to run as a daemon using supervisor.

1. Install computer-server

pip install cua-computer-server

2. Create startup script

Create /usr/local/bin/start-computer-server.sh:

#!/bin/bash
set -e

# Wait for X server to be ready
echo "Waiting for X server to start..."
while ! xdpyinfo -display :0 >/dev/null 2>&1; do
    sleep 1
done
echo "X server is ready"

# Start computer-server
export DISPLAY=:0
python3 -m computer_server --port 8000

Make it executable:

chmod +x /usr/local/bin/start-computer-server.sh

3. Configure supervisor

Create /etc/supervisor/conf.d/computer-server.conf:

[program:computer-server]
command=/usr/local/bin/start-computer-server.sh
user=<your-user>
autorestart=true
stdout_logfile=/var/log/computer-server.log
stderr_logfile=/var/log/computer-server.error.log

4. Start the service

sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start computer-server

Option 3: Node Mode (Windows/macOS)

TBD - Setup instructions for Windows and macOS nodes will be added in a future update.

Usage

Once computer-server is running, agents can use the computer tool:

Use the computer tool to take a screenshot and then click on the "File" menu.

The agent will:

  1. Call computer with action: "screenshot" to see the screen
  2. Identify the coordinates of "File" menu
  3. Call computer with action: "click" and the coordinates

Configuration

You can configure the computer-server URL in your Clawdbot config:

tools:
  computer:
    serverUrl: "http://localhost:8000"

Or per-agent:

agents:
  my-agent:
    tools:
      computer:
        serverUrl: "http://192.168.1.100:8000"

MCP Alternative

Computer-server also exposes an MCP (Model Context Protocol) interface at /mcp. This could be used as an alternative integration method if Clawdbot adds MCP client support in the future.

To use computer-server as a standalone MCP server:

python -m computer_server --mcp

Troubleshooting

Connection refused

Ensure computer-server is running and accessible:

curl http://localhost:8000/status

Should return: {"status": "ok", "os_type": "...", "features": [...]}

No display

If computer-server fails with display errors, ensure:

  • X server is running (Linux)
  • The DISPLAY environment variable is set correctly
  • The user has permission to access the display

Sandbox not connecting

Ensure the sandbox container has port 8000 exposed and the network is accessible from Clawdbot.

Security Considerations

  • Sandbox mode: Isolated container - safe for untrusted workloads
  • Node mode: Controls the actual device screen - use only with trusted agents
  • Gateway mode: Not supported - would give agents control of your actual desktop