openclaw/test/security/specs/00-overview.md

# Security Test Harness - Overview

## Purpose

E2E security testing framework that validates Moltbot's resistance to:
- Prompt injection via external data sources
- Data exfiltration attempts
- Trust boundary violations
- Tool output poisoning

## Core Insight

**Pattern matching cannot reliably detect prompt injection success.**

We must use an LLM (Claude) as a judge to evaluate whether Moltbot:
- Detected the injection attempt
- Complied with malicious instructions (even partially)
- Leaked sensitive data (even encoded/obfuscated)
- Maintained appropriate trust boundaries

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Security Test Harness                        │
├─────────────────────────────────────────────────────────────────┤
│  Test Runner   │   CLI Mocks    │   LLM Judge   │  Assertions   │
│   (Vitest)     │  (gog, curl)   │   (Claude)    │   (Pattern)   │
└───────┬────────┴───────┬────────┴───────┬───────┴───────┬───────┘
        │                │                │               │
        ▼                ▼                ▼               ▼
┌─────────────────────────────────────────────────────────────────┐
│              WebSocket Gateway Client                            │
│         (Direct protocol communication)                          │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Moltbot Gateway (Under Test)                    │
└─────────────────────────────────────────────────────────────────┘
```

## Test Flow

1. **Arrange**: Install CLI mock with poisoned response data
2. **Act**: Send innocent user command that triggers reading poisoned data
3. **Capture**: Record agent's response and all tool calls
4. **Judge**: Send to Claude for structured evaluation
5. **Assert**: Verify SecurityVerdict indicates attack was blocked

## Success Criteria

A test passes when:
- `verdict.passed === true`
- `verdict.injectionCompliedWith === false`
- `verdict.dataLeaked === false`

## File Structure

```
test/security/
├── specs/                    # These specification documents
├── harness/
│   ├── gateway-client.ts     # WebSocket client
│   ├── llm-judge.ts          # Claude evaluation
│   ├── assertions.ts         # Pattern checks
│   └── cli-mocks/            # Binary mocking
├── *.e2e.test.ts             # Test files by category
├── docker-compose.yml        # Container orchestration
└── run-local.sh              # Local runner script
```