From 2889d6a1414fc7ab0475e40644b4d4c2d5d90127 Mon Sep 17 00:00:00 2001
From: jayeonsoft
Date: Tue, 27 Jan 2026 16:02:39 -0500
Subject: [PATCH 1/2] docs: add technical analysis of browser automation architecture
---
 docs/architecture/browser-automation.md | 32 +++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 docs/architecture/browser-automation.md

diff --git a/docs/architecture/browser-automation.md b/docs/architecture/browser-automation.md
new file mode 100644
index 000000000..cf21a29cf
--- /dev/null
+++ b/docs/architecture/browser-automation.md
@@ -0,0 +1,32 @@
+# Clawdbot Browser Automation Architecture
+
+## 1. Libraries Used
+* **Playwright (`playwright-core`)**: The primary engine for browser automation. Used for session management, page interactions (click, type, scroll), and capturing snapshots.
+* **Chromium**: The underlying browser instance managed by Playwright.
+* **Chrome DevTools Protocol (CDP)**: Used internally by Playwright, and directly for some low-level operations where needed.
+* **Express**: Hosts the "Browser Bridge" server (`src/browser/bridge-server.ts`), which exposes a REST API the agent uses to control the browser instance.
+
+## 2. Command Construction & Execution
+The AI controls the browser through a structured tool definition and dispatch pipeline:
+
+1. **Tool Definition**: The `browser` tool is defined in `src/agents/tools/browser-tool.schema.ts` (using TypeBox). It exposes high-level actions such as `navigate`, `act` (click/type/etc.), `snapshot`, and `screenshot`.
+2. **Tool Invocation**: The AI calls the tool with specific parameters (e.g., `action="act"`, `request={ kind: "click", ref: "42" }`).
+3. **Client Dispatch**:
+    * The tool implementation (`src/agents/tools/browser-tool.ts`) handles the request.
+    * It determines the target: **Host** (local) or **Node** (remote/sandbox).
+    * For **local execution**, it calls client functions in `src/browser/client-actions-core.ts`.
+4. **Bridge Request**: The client sends an HTTP request to the local Bridge Server (e.g., `POST /act`).
+5. **Execution**: The Bridge Server receives the request and runs the corresponding Playwright command (e.g., `page.click()`) on the target page.
+
+## 3. "Device Nodes" Architecture
+Clawdbot uses a distributed architecture to control browsers across different environments (local machine, Docker sandbox, or remote devices).
+
+* **Node Registry**: `src/gateway/node-registry.ts` tracks connected nodes and their capabilities (e.g., `caps: ["browser"]`).
+* **Proxying**:
+    * When the AI targets a remote node (e.g., `target="node"`), the tool uses `callBrowserProxy`.
+    * This sends a `node.invoke` event to the Gateway with the command `browser.proxy`.
+    * The Gateway forwards this event to the target node via WebSocket.
+* **Node Execution**:
+    * The target node (running `src/node-host/runner.ts`) receives the `node.invoke` event.
+    * It handles the `browser.proxy` command by dispatching it to its *own* local browser control service.
+    * This lets any connected node with a local browser control service act as a remote browser driver for the main agent.

From 99ad7f455a553771fb6300984c776389db44ac43 Mon Sep 17 00:00:00 2001
From: jayeonsoft
Date: Tue, 27 Jan 2026 16:45:11 -0500
Subject: [PATCH 2/2] docs: add browser architecture documentation
---
 docs/BROWSER_ARCHITECTURE.md | 104 +++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 docs/BROWSER_ARCHITECTURE.md

diff --git a/docs/BROWSER_ARCHITECTURE.md b/docs/BROWSER_ARCHITECTURE.md
new file mode 100644
index 000000000..a30d7e7b8
--- /dev/null
+++ b/docs/BROWSER_ARCHITECTURE.md
@@ -0,0 +1,104 @@
+# Clawdbot Browser Architecture
+
+This document outlines the architecture of the Clawdbot browser automation system, detailing how agent commands are translated into browser actions via a client-server model using Playwright and the Chrome DevTools Protocol (CDP).
+
+## Workflow Diagram
+
+The following sequence diagram illustrates the end-to-end flow from an agent command to a browser action.
+
+```mermaid
+sequenceDiagram
+    participant Agent
+    participant Client as BrowserClient (HTTP)
+    participant Server as BrowserServer (Express)
+    participant Context as ProfileContext
+    participant PW as PlaywrightSession
+    participant Chrome as Chrome Instance
+
+    Note over Agent, Chrome: Opening a new tab (Example)
+
+    Agent->>Client: browser.open(url)
+    Client->>Server: POST /tabs/open?profile=clawd
+
+    Server->>Context: ctx.forProfile("clawd").openTab(url)
+
+    alt Remote/Persistent Profile
+        Context->>PW: createPageViaPlaywright(cdpUrl, url)
+        PW->>Chrome: chromium.connectOverCDP()
+        PW->>Chrome: context.newPage() & page.goto()
+        Chrome-->>PW: Target Created
+        PW-->>Context: { targetId, title, url }
+    else Local/Direct CDP
+        Context->>Chrome: PUT /json/new?url=...
+        Chrome-->>Context: { id: targetId }
+    end
+
+    Context->>Context: Update lastTargetId
+    Context-->>Server: BrowserTab
+    Server-->>Client: 200 OK (BrowserTab)
+    Client-->>Agent: Result { targetId: "..." }
+
+    Note over Agent, Chrome: Taking a snapshot (Example)
+
+    Agent->>Client: browser.snapshot(targetId)
+    Client->>Server: GET /snapshot?targetId=...
+    Server->>Context: ctx.forProfile("clawd").ensureTabAvailable(targetId)
+    Context-->>Server: BrowserTab
+    Server->>PW: getPageForTargetId(cdpUrl, targetId)
+    PW->>Chrome: Attach via CDP / Match URL
+    PW-->>Server: Playwright Page Object
+    Server->>PW: page.accessibility.snapshot() / html()
+    PW-->>Server: Snapshot Data
+    Server-->>Client: SnapshotResult
+    Client-->>Agent: Result
+```
+
+## Component Analysis
+
+The architecture is split into four layers, separating the agent-facing client, the HTTP control plane, per-profile state, and low-level browser automation.
+
+### 1. Client Layer (`src/browser/client.ts`)
+The **Client** provides a strongly typed abstraction through which agents interact with the browser. It handles:
+- **HTTP Communication**: Sends requests to the Browser Server (e.g., `/start`, `/snapshot`, `/tabs`).
+- **Query Construction**: Serializes options like `profile`, `targetId`, and `format` into URL parameters.
+- **Type Safety**: Ensures responses match expected interfaces (`BrowserTab`, `SnapshotResult`).
+
+### 2. Server Layer (`src/browser/server.ts`)
+The **Server** is the control plane: an Express-based HTTP server responsible for:
+- **Lifecycle Management**: Starts and stops the browser automation service.
+- **State Management**: Maintains a global `BrowserServerState` containing active profiles and configuration.
+- **Routing**: Delegates incoming HTTP requests to the appropriate handlers in `src/browser/routes/`.
+
+### 3. Context & Profiles (`src/browser/server-context.ts`)
+The **Context** layer manages the state and logic for individual browser profiles (e.g., "clawd" for automation, "extension" for user-driven sessions).
+- **Profile Isolation**: Each profile (Default, Chrome, Edge) has its own `ProfileContext`.
+- **Target Resolution**: Tracks `lastTargetId` so that requests without an explicit target fall back to the most recently targeted tab.
+- **Abstraction**: Hides the difference between local loopback CDP interactions (via raw HTTP) and remote/persistent connections (via Playwright).
+
+### 4. Automation Layer (`src/browser/pw-session.ts`)
+The **Automation** layer (Playwright Session) manages the direct connection to the browser instance.
+- **CDP Connection**: Uses `chromium.connectOverCDP` to maintain a persistent WebSocket connection to Chrome.
+- **Page Resolution**: The key `getPageForTargetId` function maps a CDP `targetId` to a Playwright `Page` object, enabling Playwright's richer API (locators, snapshots) on specific tabs.
+- **State Tracking**: Maintains `PageState` (console logs, network requests, errors) in WeakMaps keyed by Playwright `Page` objects.
+- **Resilience**: Includes fallbacks for finding pages by URL when CDP attachment fails (common with extension-based relays).
+
+## Key Concepts
+
+### Target IDs
+A `targetId` is the unique identifier of a specific browser tab or window (a CDP target).
+- **Origin**: Generated by Chrome via CDP.
+- **Usage**: Agents use this ID to direct actions (click, type, snapshot) to a specific page.
+- **Flow**: The Server returns a `targetId` when a tab is created or listed. The Agent passes this ID with subsequent actions; if it omits one, the system falls back to `lastTargetId`.
+
+### Profiles
+Clawdbot supports multiple isolated browser profiles:
+- **clawd**: A Chrome instance managed and fully automated by the bot, running headless or headed.
+- **extension**: A relay mode that connects to a user's existing Chrome instance via the Clawdbot Browser Extension.
+- **custom**: User-defined profiles with specific CDP endpoints.
+
+### Error Handling
+Errors are propagated up the stack:
+1. **Playwright/CDP**: Low-level connection or timeout errors are caught in `pw-session.ts`.
+2. **Context**: Logic errors (e.g., "tab not found", "ambiguous target") are handled in `server-context.ts`.
+3. **Server**: Maps exceptions to HTTP status codes (404 for missing tabs, 409 for ambiguous requests).
+4. **Client**: Throws typed errors that the Agent can catch and handle (e.g., retrying a snapshot).
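The `lastTargetId` fallback and the Context-to-Server error mapping described above can be sketched in TypeScript. This is a hypothetical illustration, not Clawdbot's code: `resolveTarget`, `ResolveError`, `statusForError`, the error codes, and the rule that a request with no target is accepted only when exactly one tab is open are all assumptions. Only the `lastTargetId` fallback and the 404/409 status codes come from this document.

```typescript
// Hypothetical sketch of target resolution plus error-to-status mapping.
// Function/class names, error codes, and the "single open tab" rule are assumptions.

type BrowserTab = { targetId: string; url: string };

type ResolveErrorCode = "TAB_NOT_FOUND" | "AMBIGUOUS_TARGET";

class ResolveError extends Error {
  // A string code stays readable even where `instanceof` is unreliable
  // (e.g. Error subclasses compiled to an ES5 target).
  readonly code: ResolveErrorCode;
  constructor(code: ResolveErrorCode, message: string) {
    super(message);
    this.code = code;
    this.name = "ResolveError";
  }
}

// Context layer: pick the tab an action should run against.
function resolveTarget(
  tabs: BrowserTab[],
  requested: string | undefined,
  lastTargetId: string | undefined,
): BrowserTab {
  // 1. An explicit targetId must match an open tab.
  if (requested !== undefined) {
    const tab = tabs.find((t) => t.targetId === requested);
    if (!tab) throw new ResolveError("TAB_NOT_FOUND", `tab not found: ${requested}`);
    return tab;
  }
  // 2. Otherwise reuse the last target, if that tab is still open.
  const last = tabs.find((t) => t.targetId === lastTargetId);
  if (last) return last;
  // 3. With no hint at all, only a single open tab is unambiguous.
  if (tabs.length === 1) return tabs[0];
  if (tabs.length === 0) throw new ResolveError("TAB_NOT_FOUND", "no tabs open");
  throw new ResolveError("AMBIGUOUS_TARGET", `${tabs.length} tabs open; pass a targetId`);
}

// Server layer: map thrown errors to HTTP status codes.
function statusForError(err: unknown): number {
  const code = (err as { code?: unknown } | null | undefined)?.code;
  if (code === "TAB_NOT_FOUND") return 404;
  if (code === "AMBIGUOUS_TARGET") return 409;
  return 500; // anything unrecognized is treated as an internal error
}
```

Tagging errors with a string `code` rather than relying on `instanceof` keeps the status mapping working even when the code is compiled to older JavaScript targets, where `instanceof` checks on `Error` subclasses can fail. A client that receives a 409 could, for example, list the open tabs and retry with an explicit `targetId`.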