agent-j 2026-01-30 13:06:43 +08:00 committed by GitHub
commit 4d31387bef
2 changed files with 136 additions and 0 deletions

# Clawdbot Browser Architecture
This document outlines the architecture of the Clawdbot browser automation system, detailing how agent commands are translated into browser actions through a client-server model built on Playwright and the Chrome DevTools Protocol (CDP).
## Workflow Diagram
The following sequence diagram illustrates the end-to-end flow from an agent command to a browser action.
```mermaid
sequenceDiagram
participant Agent
participant Client as BrowserClient (HTTP)
participant Server as BrowserServer (Express)
participant Context as ProfileContext
participant PW as PlaywrightSession
participant Chrome as Chrome Instance
Note over Agent, Chrome: Opening a new tab (Example)
Agent->>Client: browser.open(url)
Client->>Server: POST /tabs/open?profile=clawd
Server->>Context: ctx.forProfile("clawd").openTab(url)
alt Remote/Persistent Profile
Context->>PW: createPageViaPlaywright(cdpUrl, url)
PW->>Chrome: chromium.connectOverCDP()
PW->>Chrome: context.newPage() & page.goto()
Chrome-->>PW: Target Created
PW-->>Context: { targetId, title, url }
else Local/Direct CDP
Context->>Chrome: PUT /json/new?url=...
Chrome-->>Context: { id: targetId }
end
Context->>Context: Update lastTargetId
Context-->>Server: BrowserTab
Server-->>Client: 200 OK (BrowserTab)
Client-->>Agent: Result { targetId: "..." }
Note over Agent, Chrome: Taking a snapshot (Example)
Agent->>Client: browser.snapshot(targetId)
Client->>Server: GET /snapshot?targetId=...
Server->>Context: ctx.forProfile("clawd").ensureTabAvailable(targetId)
Context-->>Server: BrowserTab
Server->>PW: getPageForTargetId(cdpUrl, targetId)
PW->>Chrome: Attach via CDP / Match URL
PW-->>Server: Playwright Page Object
Server->>PW: page.accessibility.snapshot() / html()
PW-->>Server: Snapshot Data
Server-->>Client: SnapshotResult
Client-->>Agent: Result
```
## Component Analysis
The architecture is split into four distinct layers, separating the agent-facing interface, the HTTP control plane, per-profile state, and the low-level automation from one another.
### 1. Client Layer (`src/browser/client.ts`)
The **Client** provides a strongly typed abstraction for agents to interact with the browser. It handles:
- **HTTP Communication**: Sends requests to the Browser Server (e.g., `/start`, `/snapshot`, `/tabs`).
- **Query Construction**: Serializes options like `profile`, `targetId`, and `format` into URL parameters.
- **Type Safety**: Ensures responses match expected interfaces (`BrowserTab`, `SnapshotResult`).
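
As a rough illustration of query construction and response checking, the sketch below serializes only the options that are set into URL parameters and validates that a JSON body has the `BrowserTab` shape. The option names follow the list above and the `{ targetId, title, url }` fields follow the diagram; the function names and the base URL are placeholders, not the actual `client.ts` code.

```typescript
// Sketch only: option names mirror the parameters listed above (profile, targetId, format);
// the real client in src/browser/client.ts may differ.
interface RequestOptions {
  profile?: string;
  targetId?: string;
  format?: string;
}

// Response shape assumed from the { targetId, title, url } payload in the diagram.
interface BrowserTab {
  targetId: string;
  title: string;
  url: string;
}

// Serialize only the options that were actually set into URL query parameters.
function buildRequestUrl(baseUrl: string, path: string, opts: RequestOptions = {}): string {
  const url = new URL(path, baseUrl);
  const entries: Array<[string, string | undefined]> = [
    ["profile", opts.profile],
    ["targetId", opts.targetId],
    ["format", opts.format],
  ];
  for (const [key, value] of entries) {
    if (value !== undefined) url.searchParams.set(key, value);
  }
  return url.toString();
}

// Check an untyped JSON body before handing it to the agent as a BrowserTab.
function parseBrowserTab(body: unknown): BrowserTab {
  const b = body as Partial<BrowserTab> | null;
  if (
    b === null ||
    typeof b !== "object" ||
    typeof b.targetId !== "string" ||
    typeof b.title !== "string" ||
    typeof b.url !== "string"
  ) {
    throw new TypeError("response does not match BrowserTab");
  }
  return { targetId: b.targetId, title: b.title, url: b.url };
}
```

For example, `buildRequestUrl("http://127.0.0.1:9999", "/snapshot", { profile: "clawd", targetId: "T1" })` yields `http://127.0.0.1:9999/snapshot?profile=clawd&targetId=T1`, while unset options never appear in the query string.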
### 2. Server Layer (`src/browser/server.ts`)
The **Server** is the control plane: an Express-based HTTP server responsible for:
- **Lifecycle Management**: Starts and stops the browser automation service.
- **State Management**: Maintains a global `BrowserServerState` containing active profiles and configuration.
- **Routing**: Delegates incoming HTTP requests to the appropriate handlers in `src/browser/routes/`.
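
To make the control-plane role concrete, here is a dependency-free sketch of a server state object and a route table keyed by method and path, standing in for Express so it runs without packages. `BrowserServerState` and the `/start` and `/tabs` paths come from this document; the state fields, the `/stop` route, `dispatch`, and defaulting to the `clawd` profile are hypothetical.

```typescript
// Hypothetical per-profile configuration held by the server.
interface ProfileConfig {
  name: string;
  cdpUrl: string;
}

// Only the name BrowserServerState comes from the docs; its fields are assumptions.
interface BrowserServerState {
  running: boolean;
  profiles: Map<string, ProfileConfig>;
}

type Handler = (state: BrowserServerState, profile: ProfileConfig, query: URLSearchParams) => unknown;

// Route table keyed by "METHOD /path", standing in for the Express handlers in src/browser/routes/.
const routes: Record<string, Handler> = {
  "POST /start": (state) => {
    state.running = true;
    return { ok: true };
  },
  "POST /stop": (state) => {
    state.running = false;
    return { ok: true };
  },
  // A real handler would query the profile's CDP endpoint; this one only reports what it would use.
  "GET /tabs": (_state, profile) => ({ profile: profile.name, cdpUrl: profile.cdpUrl }),
};

// Resolve the route and the target profile, then delegate to the handler.
function dispatch(state: BrowserServerState, method: string, rawUrl: string): unknown {
  const url = new URL(rawUrl, "http://localhost");
  const handler = routes[`${method} ${url.pathname}`];
  if (!handler) throw new Error(`no route for ${method} ${url.pathname}`);
  const name = url.searchParams.get("profile") ?? "clawd"; // default profile is an assumption
  const profile = state.profiles.get(name);
  if (!profile) throw new Error(`unknown profile: ${name}`);
  return handler(state, profile, url.searchParams);
}
```

Keeping the profile lookup in `dispatch` means individual handlers never see an unknown profile, which mirrors the separation between routing and per-profile logic described above.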
### 3. Context & Profiles (`src/browser/server-context.ts`)
The **Context** layer manages the state and logic for individual browser profiles (e.g., "clawd" for automation, "extension" for user-driven sessions).
- **Profile Isolation**: Each profile (Default, Chrome, Edge) has its own `ProfileContext`.
- **Target Resolution**: Manages `lastTargetId` to provide continuity when an agent doesn't specify a target.
- **Abstraction**: Hides the difference between local loopback CDP interactions (via raw HTTP) and remote/persistent connections (via Playwright).
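
The local-versus-remote split can be sketched as a single `openTab` that either calls Chrome's HTTP endpoint `PUT /json/new?<url>` directly or delegates to a Playwright-based opener. The `ProfileDescriptor` fields, the loopback check, and the injected `PlaywrightOpener` (standing in for `createPageViaPlaywright`) are assumptions made so the sketch needs no packages; `fetch` requires Node 18 or later.

```typescript
// Tab shape assumed from the { targetId, title, url } payload in the diagram.
interface BrowserTab {
  targetId: string;
  title: string;
  url: string;
}

// Hypothetical profile descriptor; the field names are assumptions.
interface ProfileDescriptor {
  cdpUrl: string; // e.g. "http://127.0.0.1:9222"
  persistent: boolean; // remote/persistent profiles are driven through Playwright
}

// Stand-in for createPageViaPlaywright, injected so the sketch has no package dependencies.
type PlaywrightOpener = (cdpUrl: string, url: string) => Promise<BrowserTab>;

// True when the CDP endpoint is on this machine (IPv6 hostnames keep their brackets).
function isLoopback(cdpUrl: string): boolean {
  const host = new URL(cdpUrl).hostname;
  return host === "127.0.0.1" || host === "localhost" || host === "[::1]";
}

// Chrome's HTTP endpoint takes the page URL as the entire query string: /json/new?<url>.
function jsonNewEndpoint(cdpUrl: string, url: string): string {
  return `${cdpUrl.replace(/\/$/, "")}/json/new?${encodeURIComponent(url)}`;
}

async function openTab(
  profile: ProfileDescriptor,
  url: string,
  viaPlaywright: PlaywrightOpener,
): Promise<BrowserTab> {
  // Remote or persistent profile: let Playwright create and navigate the page.
  if (profile.persistent || !isLoopback(profile.cdpUrl)) {
    return viaPlaywright(profile.cdpUrl, url);
  }
  // Local loopback: raw HTTP call; recent Chrome versions accept only PUT here.
  const res = await fetch(jsonNewEndpoint(profile.cdpUrl, url), { method: "PUT" });
  if (!res.ok) throw new Error(`/json/new failed with HTTP ${res.status}`);
  const target = (await res.json()) as { id: string; title?: string; url?: string };
  return { targetId: target.id, title: target.title ?? "", url: target.url ?? url };
}
```

Both branches return the same `BrowserTab` shape, which is what lets the layers above stay unaware of which path was taken.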
### 4. Automation Layer (`src/browser/pw-session.ts`)
The **Automation** layer (Playwright Session) manages the direct connection to the browser instance.
- **CDP Connection**: Uses `chromium.connectOverCDP` to maintain a persistent WebSocket connection to Chrome.
- **Page Resolution**: The critical `getPageForTargetId` function maps a CDP `targetId` to a Playwright `Page` object, enabling the use of Playwright's rich API (locators, snapshots) on specific tabs.
- **State Tracking**: Maintains `PageState` (console logs, network requests, errors) in WeakMaps keyed by Playwright `Page` objects, so the state is released when a page is garbage-collected.
- **Resilience**: Includes fallbacks for finding pages by URL when CDP attachment fails (common with extension-based relays).
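
The page-resolution and state-tracking ideas can be sketched without Playwright by reducing a page to anything with a `url()` method. How a page's target id is obtained is injected; with Playwright, one common approach is opening a CDP session on the page and sending `Target.getTargetInfo`. `PageLike`, the `PageState` fields, `findPage`, and the rule of rejecting ambiguous URL matches are hypothetical, not the contents of `pw-session.ts`.

```typescript
// Minimal stand-in for a Playwright Page: only url() is needed here.
interface PageLike {
  url(): string;
}

// Hypothetical per-page diagnostics; a WeakMap lets entries be garbage-collected with their page.
interface PageState {
  consoleMessages: string[];
  errors: string[];
  requests: string[];
}

const pageStates = new WeakMap<PageLike, PageState>();

// Return the state for a page, creating it on first access.
function stateFor(page: PageLike): PageState {
  let state = pageStates.get(page);
  if (state === undefined) {
    state = { consoleMessages: [], errors: [], requests: [] };
    pageStates.set(page, state);
  }
  return state;
}

// Injected lookup of a page's CDP target id (may reject, e.g. behind an extension relay).
type TargetIdOf = (page: PageLike) => Promise<string>;

// Fallback: accept a URL match only when exactly one page has that URL.
function uniqueByUrl(pages: PageLike[], url: string): PageLike | undefined {
  const matches = pages.filter((p) => p.url() === url);
  return matches.length === 1 ? matches[0] : undefined;
}

async function findPage(
  pages: PageLike[],
  targetId: string,
  targetIdOf: TargetIdOf,
  expectedUrl?: string,
): Promise<PageLike | undefined> {
  for (const page of pages) {
    try {
      if ((await targetIdOf(page)) === targetId) return page;
    } catch {
      // Attachment failed for this page; keep scanning, then fall back to URL matching.
    }
  }
  return expectedUrl === undefined ? undefined : uniqueByUrl(pages, expectedUrl);
}
```

Refusing a URL match when several tabs share the same URL trades some recall for never acting on the wrong tab.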
## Key Concepts
### Target IDs
A `targetId` is the unique identifier for a specific browser tab or window (CDP Target).
- **Origin**: Generated by Chrome/CDP.
- **Usage**: Agents use this ID to direct actions (click, type, snapshot) to a specific page.
- **Flow**: The Server returns a `targetId` upon tab creation or listing. The Agent must provide this ID for subsequent actions, or the system falls back to the `lastTargetId`.
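
The fallback rule can be sketched as a small per-profile tracker: an explicit `targetId` wins, otherwise the last one used is reused, and in this sketch every successful resolution updates the remembered id. `TargetTracker` and `TabRef` are hypothetical names; the real logic in `server-context.ts` may treat edge cases differently.

```typescript
// Hypothetical tab record; real tabs carry more fields.
interface TabRef {
  targetId: string;
  url: string;
}

class TargetTracker {
  private lastTargetId: string | undefined;

  // Record a target, e.g. right after a tab is opened.
  remember(targetId: string): void {
    this.lastTargetId = targetId;
  }

  // Resolve an explicit targetId, or fall back to the last one used.
  resolve(tabs: TabRef[], requested?: string): TabRef {
    const id = requested ?? this.lastTargetId;
    if (id === undefined) throw new Error("no targetId given and no previous target");
    const tab = tabs.find((t) => t.targetId === id);
    if (tab === undefined) throw new Error(`tab not found: ${id}`);
    this.lastTargetId = tab.targetId;
    return tab;
  }
}
```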
### Profiles
Clawdbot supports multiple isolated browser profiles:
- **clawd**: A fully automated, headless (or headed) Chrome instance managed by the bot.
- **extension**: A relay mode that connects to a user's existing Chrome instance via the Clawdbot Browser Extension.
- **custom**: User-defined profiles with specific CDP endpoints.
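
One way to model these three profile kinds is a discriminated union, which lets the compiler check that every kind is handled. The field names (`headless`, `relayUrl`, `cdpUrl`) and the `connectionPlan` helper are illustrative assumptions, not Clawdbot's configuration format.

```typescript
// Illustrative profile configuration; field names are assumptions.
type ProfileConfig =
  | { kind: "clawd"; headless: boolean }
  | { kind: "extension"; relayUrl: string }
  | { kind: "custom"; name: string; cdpUrl: string };

// Describe how each kind of profile would be reached.
function connectionPlan(profile: ProfileConfig): string {
  switch (profile.kind) {
    case "clawd":
      return `launch managed Chrome (${profile.headless ? "headless" : "headed"}) and attach over CDP`;
    case "extension":
      return `relay through the browser extension at ${profile.relayUrl}`;
    case "custom":
      return `connect to ${profile.name} at ${profile.cdpUrl}`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case fails to type-check.
      const unhandled: never = profile;
      throw new Error(`unhandled profile: ${JSON.stringify(unhandled)}`);
    }
  }
}
```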
### Error Handling
Errors are propagated up the stack:
1. **Playwright/CDP**: Low-level connection or timeout errors are caught in `pw-session.ts`.
2. **Context**: Logic errors (e.g., "tab not found", "ambiguous target") are handled in `server-context.ts`.
3. **Server**: Maps exceptions to HTTP status codes (404 for missing tabs, 409 for ambiguous requests).
4. **Client**: Throws typed errors that the Agent can catch and handle (e.g., retrying a snapshot).
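
The middle two steps can be sketched as typed errors raised by the Context layer and a server-side mapping to the status codes above, plus a client-side error that carries the status back to the agent. The class names, `statusFor`, and the 500 fallback for unexpected errors are assumptions.

```typescript
// Hypothetical base class for the Context layer's logic errors.
class BrowserError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keep instanceof working even when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class TabNotFoundError extends BrowserError {}
class AmbiguousTargetError extends BrowserError {}

// Server: map exceptions to HTTP status codes (500 for anything unexpected is an assumption).
function statusFor(err: unknown): number {
  if (err instanceof TabNotFoundError) return 404;
  if (err instanceof AmbiguousTargetError) return 409;
  return 500;
}

// Client: turn a non-2xx response back into a typed error the agent can inspect.
class BrowserClientError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.status = status;
    this.name = "BrowserClientError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

An agent can then branch on `err.status`, for example re-listing tabs after a 404 before retrying a snapshot.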

# Clawdbot Browser Automation Architecture
## 1. Libraries Used
* **Playwright (`playwright-core`)**: The primary engine for browser automation. Used for session management, page interactions (click, type, scroll), and capturing snapshots.
* **Chromium**: The underlying browser instance managed by Playwright.
* **Chrome DevTools Protocol (CDP)**: Used internally by Playwright and for specific low-level control where needed.
* **Express**: Hosts the "Browser Bridge" server (`src/browser/bridge-server.ts`), exposing a REST API for the agent to control the browser instance.
## 2. Command Construction & Execution
The AI controls the browser through a structured tool definition and dispatch pipeline:
1. **Tool Definition**: The `browser` tool is defined in `src/agents/tools/browser-tool.schema.ts` (using TypeBox). It exposes high-level actions like `navigate`, `act` (click/type/etc.), `snapshot`, and `screenshot`.
2. **Tool Invocation**: The AI calls the tool with specific parameters (e.g., `action="act"`, `request={ kind: "click", ref: "42" }`).
3. **Client Dispatch**:
   * The tool implementation (`src/agents/tools/browser-tool.ts`) handles the request.
   * It determines the target: **Host** (local) or **Node** (remote/sandbox).
   * For **local execution**, it calls client functions in `src/browser/client-actions-core.ts`.
4. **Bridge Request**: The client sends an HTTP POST request to the local Bridge Server (e.g., `POST /act`).
5. **Execution**: The Bridge Server receives the request and executes the corresponding Playwright command (e.g., `page.click()`) on the active browser page.
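
The translation from a tool call to a bridge request might look like the following sketch, with plain TypeScript types standing in for the TypeBox schema. The action names, the `click` request with a `ref`, and the `/act` and `/snapshot` paths come from this document; the `text` field on `type`, the `/navigate` and `/screenshot` routes, and `toBridgeRequest` itself are hypothetical.

```typescript
// Stand-ins for the TypeBox schema; field names beyond action/request/kind/ref are assumptions.
type ActRequest =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string };

type BrowserToolParams =
  | { action: "navigate"; url: string; targetId?: string }
  | { action: "act"; request: ActRequest; targetId?: string }
  | { action: "snapshot"; targetId?: string }
  | { action: "screenshot"; targetId?: string };

interface BridgeRequest {
  method: "GET" | "POST";
  path: string;
  body?: unknown;
}

// Map one tool call onto one HTTP request for the local Bridge Server.
function toBridgeRequest(params: BrowserToolParams): BridgeRequest {
  switch (params.action) {
    case "navigate":
      return { method: "POST", path: "/navigate", body: { url: params.url, targetId: params.targetId } };
    case "act":
      return { method: "POST", path: "/act", body: { ...params.request, targetId: params.targetId } };
    case "snapshot": {
      const query = params.targetId ? `?targetId=${encodeURIComponent(params.targetId)}` : "";
      return { method: "GET", path: `/snapshot${query}` };
    }
    case "screenshot":
      return { method: "POST", path: "/screenshot", body: { targetId: params.targetId } };
    default: {
      const unhandled: never = params;
      throw new Error(`unknown action: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

With the example call from step 2 (`action="act"`, `request={ kind: "click", ref: "42" }`), the sketch produces `POST /act` with the JSON body `{"kind":"click","ref":"42"}`.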
## 3. "Device Nodes" Architecture
Clawdbot uses a distributed architecture to control browsers across different environments (local machine, Docker sandbox, or remote devices).
* **Node Registry**: `src/gateway/node-registry.ts` tracks connected nodes and their capabilities (e.g., `caps: ["browser"]`).
* **Proxying**:
  * When the AI targets a remote node (e.g., `target="node"`), the tool uses `callBrowserProxy`.
  * This sends a `node.invoke` event to the Gateway with the command `browser.proxy`.
  * The Gateway forwards this event to the target node via WebSocket.
* **Node Execution**:
  * The target node (running `src/node-host/runner.ts`) receives the `node.invoke` event.
  * It handles the `browser.proxy` command by dispatching it to its *own* local browser control service.
  * This effectively allows any running Clawdbot instance to act as a remote browser driver for the main agent.
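
The proxy path can be sketched as a registry filtered by capability plus two functions: one wraps a browser request in a `node.invoke` event for `browser.proxy`, and one runs on the node to hand that request to its own browser service. The event and command names and the `caps: ["browser"]` convention come from the text; the `NodeRegistry` methods, the event payload layout, and the injected `LocalBrowserService` are assumptions, and the WebSocket transport is omitted.

```typescript
// Hypothetical node record, mirroring the caps: ["browser"] convention above.
interface NodeInfo {
  nodeId: string;
  caps: string[];
}

// Hypothetical payload layout for a node.invoke event.
interface InvokeEvent {
  type: "node.invoke";
  nodeId: string;
  command: string;
  params: { method: string; path: string; body?: unknown };
}

class NodeRegistry {
  private nodes = new Map<string, NodeInfo>();

  register(node: NodeInfo): void {
    this.nodes.set(node.nodeId, node);
  }

  unregister(nodeId: string): void {
    this.nodes.delete(nodeId);
  }

  withCapability(cap: string): NodeInfo[] {
    return Array.from(this.nodes.values()).filter((n) => n.caps.includes(cap));
  }
}

// Gateway side: wrap a browser request for a node that advertises the "browser" capability.
function buildBrowserProxyInvoke(
  registry: NodeRegistry,
  nodeId: string,
  method: string,
  path: string,
  body?: unknown,
): InvokeEvent {
  const node = registry.withCapability("browser").find((n) => n.nodeId === nodeId);
  if (!node) throw new Error(`node ${nodeId} is not connected or lacks the "browser" capability`);
  return { type: "node.invoke", nodeId, command: "browser.proxy", params: { method, path, body } };
}

// Node side: hand a proxied request to the node's own local browser control service.
type LocalBrowserService = (method: string, path: string, body?: unknown) => unknown;

function handleInvoke(ev: InvokeEvent, local: LocalBrowserService): unknown {
  if (ev.command !== "browser.proxy") throw new Error(`unsupported command: ${ev.command}`);
  return local(ev.params.method, ev.params.path, ev.params.body);
}
```

Because the proxied request carries the same method and path a local call would use, a node can serve it with exactly the code path it uses for its own agent.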