docs: add browser architecture documentation

This commit is contained in:
jayeonsoft 2026-01-27 16:45:11 -05:00
parent 2889d6a141
commit 99ad7f455a

View File

@ -0,0 +1,104 @@
# Clawdbot Browser Architecture
This document outlines the architecture of the Clawdbot browser automation system, detailing how agent commands are translated into browser actions via a client-server model using Playwright and Chrome DevTools Protocol (CDP).
## Workflow Diagram
The following sequence diagram illustrates the end-to-end flow from an agent command to a browser action.
```mermaid
sequenceDiagram
participant Agent
participant Client as BrowserClient (HTTP)
participant Server as BrowserServer (Express)
participant Context as ProfileContext
participant PW as PlaywrightSession
participant Chrome as Chrome Instance
Note over Agent, Chrome: Opening a new tab (Example)
Agent->>Client: browser.open(url)
Client->>Server: POST /tabs/open?profile=clawd
Server->>Context: ctx.forProfile("clawd").openTab(url)
alt Remote/Persistent Profile
Context->>PW: createPageViaPlaywright(cdpUrl, url)
PW->>Chrome: chromium.connectOverCDP()
PW->>Chrome: context.newPage() & page.goto()
Chrome-->>PW: Target Created
PW-->>Context: { targetId, title, url }
else Local/Direct CDP
Context->>Chrome: PUT /json/new?url=...
Chrome-->>Context: { id: targetId }
end
Context->>Context: Update lastTargetId
Context-->>Server: BrowserTab
Server-->>Client: 200 OK (BrowserTab)
Client-->>Agent: Result { targetId: "..." }
Note over Agent, Chrome: Taking a snapshot (Example)
Agent->>Client: browser.snapshot(targetId)
Client->>Server: GET /snapshot?targetId=...
Server->>Context: ctx.forProfile("clawd").ensureTabAvailable(targetId)
Context-->>Server: BrowserTab
Server->>PW: getPageForTargetId(cdpUrl, targetId)
PW->>Chrome: Attach via CDP / Match URL
PW-->>Server: Playwright Page Object
Server->>PW: page.accessibility.snapshot() / html()
PW-->>Server: Snapshot Data
Server-->>Client: SnapshotResult
Client-->>Agent: Result
```
## Component Analysis
The architecture is split into four distinct layers, ensuring separation of concerns between the agent interface, the control plane, and the low-level automation.
### 1. Client Layer (`src/browser/client.ts`)
The **Client** provides a strongly-typed abstraction for agents to interact with the browser. It handles:
- **HTTP Communication**: Sends requests to the Browser Server (e.g., `/start`, `/snapshot`, `/tabs`).
- **Query Construction**: Serializes options like `profile`, `targetId`, and `format` into URL parameters.
- **Type Safety**: Ensures responses match expected interfaces (`BrowserTab`, `SnapshotResult`).
### 2. Server Layer (`src/browser/server.ts`)
The **Server** is the control plane. It acts as an Express-based HTTP server that:
- **Lifecycle Management**: Starts and stops the browser automation service.
- **State Management**: Maintains a global `BrowserServerState` containing active profiles and configuration.
- **Routing**: Delegates incoming HTTP requests to the appropriate handlers in `src/browser/routes/`.
### 3. Context & Profiles (`src/browser/server-context.ts`)
The **Context** layer manages the state and logic for individual browser profiles (e.g., "clawd" for automation, "extension" for user-driven sessions).
- **Profile Isolation**: Each profile (Default, Chrome, Edge) has its own `ProfileContext`.
- **Target Resolution**: Manages `lastTargetId` to provide continuity when an agent doesn't specify a target.
- **Abstraction**: Hides the difference between local loopback CDP interactions (via raw HTTP) and remote/persistent connections (via Playwright).
### 4. Automation Layer (`src/browser/pw-session.ts`)
The **Automation** layer (Playwright Session) manages the direct connection to the browser instance.
- **CDP Connection**: Uses `chromium.connectOverCDP` to maintain a persistent WebSocket connection to Chrome.
- **Page Resolution**: The critical `getPageForTargetId` function maps a CDP `targetId` to a Playwright `Page` object, enabling the use of Playwright's rich API (locators, snapshots) on specific tabs.
- **State Tracking**: Maintains `PageState` (console logs, network requests, errors) using WeakMaps attached to Playwright Page objects.
- **Resilience**: Includes fallbacks for finding pages by URL when CDP attachment fails (common with extension-based relays).
## Key Concepts
### Target IDs
A `targetId` is the unique identifier for a specific browser tab or window (CDP Target).
- **Origin**: Generated by Chrome/CDP.
- **Usage**: Agents use this ID to direct actions (click, type, snapshot) to a specific page.
- **Flow**: The Server returns a `targetId` upon tab creation or listing. The Agent must provide this ID for subsequent actions, or the system falls back to the `lastTargetId`.
### Profiles
Clawdbot supports multiple isolated browser profiles:
- **clawd**: A fully automated, headless (or headed) Chrome instance managed by the bot.
- **extension**: A relay mode that connects to a user's existing Chrome instance via the Clawdbot Browser Extension.
- **custom**: User-defined profiles with specific CDP endpoints.
### Error Handling
Errors are propagated up the stack:
1. **Playwright/CDP**: Low-level connection or timeout errors are caught in `pw-session.ts`.
2. **Context**: Logic errors (e.g., "tab not found", "ambiguous target") are handled in `server-context.ts`.
3. **Server**: Maps exceptions to HTTP status codes (404 for missing tabs, 409 for ambiguous requests).
4. **Client**: Throws typed errors that the Agent can catch and handle (e.g., retrying a snapshot).