Merge 99ad7f455a into 4de0bae45a
This commit is contained in:
commit
4d31387bef
104
docs/BROWSER_ARCHITECTURE.md
Normal file
@@ -0,0 +1,104 @@
# Clawdbot Browser Architecture

This document outlines the architecture of the Clawdbot browser automation system, detailing how agent commands are translated into browser actions via a client-server model using Playwright and Chrome DevTools Protocol (CDP).

## Workflow Diagram

The following sequence diagram illustrates the end-to-end flow from an agent command to a browser action.

```mermaid
sequenceDiagram
    participant Agent
    participant Client as BrowserClient (HTTP)
    participant Server as BrowserServer (Express)
    participant Context as ProfileContext
    participant PW as PlaywrightSession
    participant Chrome as Chrome Instance

    Note over Agent, Chrome: Opening a new tab (Example)

    Agent->>Client: browser.open(url)
    Client->>Server: POST /tabs/open?profile=clawd

    Server->>Context: ctx.forProfile("clawd").openTab(url)

    alt Remote/Persistent Profile
        Context->>PW: createPageViaPlaywright(cdpUrl, url)
        PW->>Chrome: chromium.connectOverCDP()
        PW->>Chrome: context.newPage() & page.goto()
        Chrome-->>PW: Target Created
        PW-->>Context: { targetId, title, url }
    else Local/Direct CDP
        Context->>Chrome: PUT /json/new?url=...
        Chrome-->>Context: { id: targetId }
    end

    Context->>Context: Update lastTargetId
    Context-->>Server: BrowserTab
    Server-->>Client: 200 OK (BrowserTab)
    Client-->>Agent: Result { targetId: "..." }

    Note over Agent, Chrome: Taking a snapshot (Example)

    Agent->>Client: browser.snapshot(targetId)
    Client->>Server: GET /snapshot?targetId=...
    Server->>Context: ctx.forProfile("clawd").ensureTabAvailable(targetId)
    Context-->>Server: BrowserTab
    Server->>PW: getPageForTargetId(cdpUrl, targetId)
    PW->>Chrome: Attach via CDP / Match URL
    PW-->>Server: Playwright Page Object
    Server->>PW: page.accessibility.snapshot() / html()
    PW-->>Server: Snapshot Data
    Server-->>Client: SnapshotResult
    Client-->>Agent: Result
```

## Component Analysis

The architecture is split into four distinct layers, separating the agent interface, the control plane, per-profile state, and the low-level automation.

### 1. Client Layer (`src/browser/client.ts`)

The **Client** provides a strongly typed abstraction for agents to interact with the browser. It handles:

- **HTTP Communication**: Sends requests to the Browser Server (e.g., `/start`, `/snapshot`, `/tabs`).
- **Query Construction**: Serializes options like `profile`, `targetId`, and `format` into URL parameters.
- **Type Safety**: Ensures responses match expected interfaces (`BrowserTab`, `SnapshotResult`).

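
The query-construction step can be sketched roughly as follows. This is a minimal illustration, not the code in `client.ts`: the `QueryOptions` shape and the `buildQuery` and `snapshotPath` helpers are hypothetical names, and only the `profile`, `targetId`, and `format` options and the `/snapshot` path come from the text above.

```typescript
// Hypothetical option shape; the real client may accept more fields.
interface QueryOptions {
  profile?: string;
  targetId?: string;
  format?: string;
}

// Serialize the options that are set into a URL query string.
// Unset or empty options are skipped so the server can apply its defaults.
function buildQuery(opts: QueryOptions): string {
  const keys: (keyof QueryOptions)[] = ["profile", "targetId", "format"];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = opts[key];
    if (value === undefined || value === "") continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  }
  return pairs.length > 0 ? `?${pairs.join("&")}` : "";
}

// Build the request path for a snapshot call (helper name is an assumption).
function snapshotPath(opts: QueryOptions): string {
  return `/snapshot${buildQuery(opts)}`;
}
```

Encoding each value keeps a `targetId` or profile name with reserved characters from corrupting the URL.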
### 2. Server Layer (`src/browser/server.ts`)

The **Server** is the control plane. It is an Express-based HTTP server responsible for:

- **Lifecycle Management**: Starts and stops the browser automation service.
- **State Management**: Maintains a global `BrowserServerState` containing active profiles and configuration.
- **Routing**: Delegates incoming HTTP requests to the appropriate handlers in `src/browser/routes/`.

### 3. Context & Profiles (`src/browser/server-context.ts`)

The **Context** layer manages the state and logic for individual browser profiles (e.g., "clawd" for automation, "extension" for user-driven sessions).

- **Profile Isolation**: Each profile (Default, Chrome, Edge) has its own `ProfileContext`.
- **Target Resolution**: Manages `lastTargetId` to provide continuity when an agent doesn't specify a target.
- **Abstraction**: Hides the difference between local loopback CDP interactions (via raw HTTP) and remote/persistent connections (via Playwright).

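
One way to picture the abstraction bullet is a small function that picks a strategy from the profile's CDP endpoint. The sketch below assumes the choice depends only on whether the endpoint host is loopback; the document does not state the actual rule, and `OpenStrategy`, `isLoopbackCdpUrl`, and `chooseOpenStrategy` are hypothetical names.

```typescript
// The two paths described above: raw CDP HTTP calls vs. a Playwright session.
type OpenStrategy = "direct-cdp" | "playwright";

// Assumption: a profile is "local" when its CDP endpoint host is loopback.
function isLoopbackCdpUrl(cdpUrl: string): boolean {
  // Capture the host part, including bracketed IPv6 literals such as [::1].
  const match = /^[a-z][a-z0-9+.-]*:\/\/(\[[^\]]+\]|[^\/:]+)/i.exec(cdpUrl);
  if (!match) return false;
  const host = match[1].toLowerCase();
  return host === "localhost" || host === "[::1]" || /^127\.\d+\.\d+\.\d+$/.test(host);
}

// Callers ask for a strategy and never inspect the URL themselves.
function chooseOpenStrategy(cdpUrl: string): OpenStrategy {
  return isLoopbackCdpUrl(cdpUrl) ? "direct-cdp" : "playwright";
}
```

Keeping the decision in one place lets route handlers call a single `openTab`-style method without knowing which path is used.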
### 4. Automation Layer (`src/browser/pw-session.ts`)

The **Automation** layer (Playwright Session) manages the direct connection to the browser instance.

- **CDP Connection**: Uses `chromium.connectOverCDP` to maintain a persistent WebSocket connection to Chrome.
- **Page Resolution**: The critical `getPageForTargetId` function maps a CDP `targetId` to a Playwright `Page` object, enabling the use of Playwright's rich API (locators, snapshots) on specific tabs.
- **State Tracking**: Maintains `PageState` (console logs, network requests, errors) using WeakMaps attached to Playwright Page objects.
- **Resilience**: Includes fallbacks for finding pages by URL when CDP attachment fails (common with extension-based relays).

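
The state-tracking bullet can be illustrated with a WeakMap keyed by the page object. This is a sketch under assumptions: the `PageState` fields are guessed from the three categories named above, the helper names are hypothetical, and a plain `object` stands in for a Playwright `Page` so the example has no dependencies.

```typescript
// Hypothetical shape based on the categories listed above.
interface PageState {
  consoleLogs: string[];
  networkRequests: string[];
  errors: string[];
}

// Keyed by the page object, so an entry becomes collectable once the page is.
const pageStates = new WeakMap<object, PageState>();

// Return the state for a page, creating it on first use.
function ensurePageState(page: object): PageState {
  let state = pageStates.get(page);
  if (!state) {
    state = { consoleLogs: [], networkRequests: [], errors: [] };
    pageStates.set(page, state);
  }
  return state;
}

// Example recorder; in practice this would be wired to a page event listener.
function recordConsole(page: object, text: string): void {
  ensurePageState(page).consoleLogs.push(text);
}
```

A WeakMap avoids a manual cleanup step when tabs close, since no strong reference to the page is held by the state table.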
## Key Concepts

### Target IDs

A `targetId` is the unique identifier for a specific browser tab or window (CDP Target).

- **Origin**: Generated by Chrome/CDP.
- **Usage**: Agents use this ID to direct actions (click, type, snapshot) to a specific page.
- **Flow**: The Server returns a `targetId` upon tab creation or listing. The Agent should pass this ID in subsequent actions; if it is omitted, the system falls back to `lastTargetId`.

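
The fallback described in the flow can be sketched as a pure function. The names `TabRef`, `Resolution`, and `resolveTarget` are hypothetical, and the rule that an explicit but unknown ID does not fall back to `lastTargetId` is an assumption rather than documented behavior.

```typescript
// Minimal tab record for illustration; the real BrowserTab has more fields.
interface TabRef {
  targetId: string;
  url: string;
}

type Resolution =
  | { status: "found"; tab: TabRef }
  | { status: "not-found"; wanted: string | undefined };

// Prefer the explicitly requested ID; otherwise use the remembered one.
function resolveTarget(
  tabs: TabRef[],
  requested: string | undefined,
  lastTargetId: string | undefined,
): Resolution {
  const wanted = requested !== undefined ? requested : lastTargetId;
  if (wanted !== undefined) {
    for (const tab of tabs) {
      if (tab.targetId === wanted) return { status: "found", tab };
    }
  }
  // Covers both "nothing to look for" and "the wanted tab no longer exists".
  return { status: "not-found", wanted };
}
```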
### Profiles

Clawdbot supports multiple isolated browser profiles:

- **clawd**: A fully automated, headless (or headed) Chrome instance managed by the bot.
- **extension**: A relay mode that connects to a user's existing Chrome instance via the Clawdbot Browser Extension.
- **custom**: User-defined profiles with specific CDP endpoints.

### Error Handling

Errors are propagated up the stack:

1. **Playwright/CDP**: Low-level connection or timeout errors are caught in `pw-session.ts`.
2. **Context**: Logic errors (e.g., "tab not found", "ambiguous target") are handled in `server-context.ts`.
3. **Server**: Maps exceptions to HTTP status codes (404 for missing tabs, 409 for ambiguous requests).
4. **Client**: Throws typed errors that the Agent can catch and handle (e.g., retrying a snapshot).

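
The server-side mapping in step 3 might look like the sketch below. Only the 404 and 409 codes come from the list above. The `ContextErrorKind` names, the response shape, and the 500 catch-all are assumptions made for illustration.

```typescript
// Error kinds named in the list above, plus an assumed catch-all.
type ContextErrorKind = "tab-not-found" | "ambiguous-target" | "internal";

interface HttpErrorResponse {
  status: number;
  body: { error: string };
}

// Translate a context-level error into an HTTP status and JSON body.
function toHttpError(kind: ContextErrorKind, message: string): HttpErrorResponse {
  switch (kind) {
    case "tab-not-found":
      return { status: 404, body: { error: message } };
    case "ambiguous-target":
      return { status: 409, body: { error: message } };
    default:
      // Assumption: anything unclassified surfaces as a generic server error.
      return { status: 500, body: { error: message } };
  }
}
```

Distinct status codes let the client decide whether a retry makes sense: a 404 suggests listing tabs again, while a 409 suggests passing an explicit `targetId`.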
32
docs/architecture/browser-automation.md
Normal file
@@ -0,0 +1,32 @@
# Clawdbot Browser Automation Architecture

## 1. Libraries Used

* **Playwright (`playwright-core`)**: The primary engine for browser automation. Used for session management, page interactions (click, type, scroll), and capturing snapshots.
* **Chromium**: The underlying browser instance managed by Playwright.
* **Chrome DevTools Protocol (CDP)**: Used internally by Playwright and for specific low-level control where needed.
* **Express**: Hosts the "Browser Bridge" server (`src/browser/bridge-server.ts`), exposing a REST API for the agent to control the browser instance.

## 2. Command Construction & Execution

The AI controls the browser through a structured tool definition and dispatch pipeline:

1. **Tool Definition**: The `browser` tool is defined in `src/agents/tools/browser-tool.schema.ts` (using TypeBox). It exposes high-level actions like `navigate`, `act` (click/type/etc.), `snapshot`, and `screenshot`.
2. **Tool Invocation**: The AI calls the tool with specific parameters (e.g., `action="act"`, `request={ kind: "click", ref: "42" }`).
3. **Client Dispatch**:
    * The tool implementation (`src/agents/tools/browser-tool.ts`) handles the request.
    * It determines the target: **Host** (local) or **Node** (remote/sandbox).
    * For **local execution**, it calls client functions in `src/browser/client-actions-core.ts`.
4. **Bridge Request**: The client sends an HTTP POST request to the local Bridge Server (e.g., `POST /act`).
5. **Execution**: The Bridge Server receives the request and executes the corresponding Playwright command (e.g., `page.click()`) on the active browser page.

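
Steps 2 through 4 can be sketched as a discriminated union for the tool call and a function that turns it into a bridge request. This is an illustration, not the TypeBox schema: the `type` action fields, the `/navigate` path, the `GET /snapshot` form, and all type and function names are assumptions. Only `action="act"`, `{ kind: "click", ref: "42" }`, and `POST /act` appear in the text.

```typescript
// Interaction requests carried by the "act" action.
type ActRequest =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string }; // fields assumed for illustration

// A subset of the tool's actions, modeled as a discriminated union.
type BrowserToolCall =
  | { action: "navigate"; url: string }
  | { action: "act"; request: ActRequest }
  | { action: "snapshot"; targetId?: string };

interface BridgeRequest {
  method: "GET" | "POST";
  path: string;
  body?: unknown;
}

// Map a validated tool call to the HTTP request sent to the bridge server.
function toBridgeRequest(call: BrowserToolCall): BridgeRequest {
  switch (call.action) {
    case "navigate":
      // Path is an assumption; the document only names POST /act.
      return { method: "POST", path: "/navigate", body: { url: call.url } };
    case "act":
      return { method: "POST", path: "/act", body: call.request };
    case "snapshot": {
      const query = call.targetId ? `?targetId=${encodeURIComponent(call.targetId)}` : "";
      return { method: "GET", path: `/snapshot${query}` };
    }
  }
}
```

Switching on `action` lets the compiler check that every action in the union has a corresponding request.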
## 3. "Device Nodes" Architecture

Clawdbot uses a distributed architecture to control browsers across different environments (local machine, Docker sandbox, or remote devices).

* **Node Registry**: `src/gateway/node-registry.ts` tracks connected nodes and their capabilities (e.g., `caps: ["browser"]`).
* **Proxying**:
    * When the AI targets a remote node (e.g., `target="node"`), the tool uses `callBrowserProxy`.
    * This sends a `node.invoke` event to the Gateway with the command `browser.proxy`.
    * The Gateway forwards this event to the target node via WebSocket.
* **Node Execution**:
    * The target node (running `src/node-host/runner.ts`) receives the `node.invoke` event.
    * It handles the `browser.proxy` command by dispatching it to its *own* local browser control service.
    * This effectively allows any running Clawdbot instance to act as a remote browser driver for the main agent.

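
The registry and proxy steps above can be sketched in memory as follows. The event field names, the `NodeRegistry` methods, and `buildBrowserProxyInvoke` are hypothetical. Only the `caps: ["browser"]` capability, the `node.invoke` event, and the `browser.proxy` command come from the text.

```typescript
interface NodeEntry {
  nodeId: string;
  caps: string[];
}

// Assumed envelope for the event the Gateway forwards to a node.
interface NodeInvokeEvent {
  event: "node.invoke";
  nodeId: string;
  command: "browser.proxy";
  params: unknown;
}

// Tracks connected nodes and the capabilities they advertise.
class NodeRegistry {
  private nodes = new Map<string, NodeEntry>();

  register(entry: NodeEntry): void {
    this.nodes.set(entry.nodeId, entry);
  }

  unregister(nodeId: string): void {
    this.nodes.delete(nodeId);
  }

  hasCap(nodeId: string, cap: string): boolean {
    const entry = this.nodes.get(nodeId);
    return entry !== undefined && entry.caps.indexOf(cap) !== -1;
  }
}

// Build the invoke event only for nodes that advertise browser support.
function buildBrowserProxyInvoke(
  registry: NodeRegistry,
  nodeId: string,
  params: unknown,
): NodeInvokeEvent {
  if (!registry.hasCap(nodeId, "browser")) {
    throw new Error(`node ${nodeId} is not connected or lacks the "browser" capability`);
  }
  return { event: "node.invoke", nodeId, command: "browser.proxy", params };
}
```

Checking the capability before sending lets the caller fail fast instead of waiting on a node that cannot serve the request.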