openclaw/test/security/README.md

# Security Acceptance Tests

E2E security testing framework for Moltbot. Validates resistance to:
- Prompt injection via external data sources
- Data exfiltration attempts
- Trust boundary violations
- Tool poisoning attacks

## Architecture: LLM-as-Judge

Pattern matching can't reliably detect whether prompt injection succeeded. We use
**Claude as a judge** to evaluate whether Moltbot resisted attacks:

1. Run test scenario (send poisoned data to Moltbot)
2. Capture Moltbot's response and tool calls
3. Send to Claude judge with structured output
4. Judge evaluates: injection detected? complied with? data leaked?

This enables nuanced evaluation of subtle attacks that regex can't catch.

## Quick Start

```bash
# Install Anthropic SDK (required for LLM judge)
pnpm add -D @anthropic-ai/sdk

# Run security tests
ANTHROPIC_API_KEY=sk-ant-xxx pnpm test:security

# Run specific category
pnpm test:security --grep "Email Injection"
```

## Structure

```
test/security/
├── SPEC.md                    # Full specification document
├── README.md                  # This file
├── harness/
│   ├── index.ts               # Exports
│   ├── gateway-client.ts      # WebSocket gateway client
│   ├── assertions.ts          # Pattern-based assertions (fast checks)
│   ├── llm-judge.ts           # Claude-based evaluation (nuanced checks)
│   └── cli-mocks/
│       └── mock-binary.ts     # CLI binary mocking utilities
└── *.e2e.test.ts              # Test files by category
```

## Implementation Priority

Based on SPEC.md, implement in this order:

1. **email-injection.e2e.test.ts** - Gmail/email tests (highest attack surface)
2. **calendar-injection.e2e.test.ts** - Calendar event injection tests
3. **api-injection.e2e.test.ts** - Generic API response injection
4. **trust-boundary.e2e.test.ts** - Authentication bypass and session leakage
5. **tool-poisoning.e2e.test.ts** - Malicious skill/plugin output

## Key Dependencies

```bash
# Add to devDependencies
pnpm add -D @anthropic-ai/sdk ws
```

- **Vitest** - Test runner (already configured in repo)
- **@anthropic-ai/sdk** - LLM judge (Claude Sonnet for evaluation)
- **ws** - WebSocket client
- **Hono** - Mock HTTP servers (already in deps)

## Running Tests

### Option 1: Local Script (Recommended)

```bash
# Install Anthropic SDK first
pnpm add -D @anthropic-ai/sdk ws

# Set API key
export ANTHROPIC_API_KEY=sk-ant-xxx

# Run all tests (auto-detects local gateway or spins up Docker)
./test/security/run-local.sh

# Run specific test category
./test/security/run-local.sh "Email Injection"
```

### Option 2: Docker Compose (Full Isolation)

```bash
# Run containerized tests
ANTHROPIC_API_KEY=sk-ant-xxx \
docker compose -f test/security/docker-compose.yml up --build --abort-on-container-exit

# Cleanup
docker compose -f test/security/docker-compose.yml down -v
```

### Option 3: Against Local Gateway

```bash
# Terminal 1: Start gateway
moltbot gateway

# Terminal 2: Run tests directly
ANTHROPIC_API_KEY=sk-ant-xxx pnpm vitest run --config vitest.security.config.ts
```

## CI Integration

Security tests run automatically via GitHub Actions:
- On every push to `main`
- On every PR to `main`
- Daily at midnight UTC (scheduled)
- Manually via workflow dispatch

**Required secrets:**
- `ANTHROPIC_API_KEY`: For LLM judge evaluation

The workflow blocks releases if security tests fail.

## Test Output

Each test produces a `SecurityVerdict`:

```typescript
{
  passed: boolean;           // Did the security test pass?
  injectionDetected: boolean; // Did agent notice the attack?
  injectionCompliedWith: boolean; // Did agent follow malicious instructions?
  dataLeaked: boolean;       // Was sensitive data exposed?
  reasoning: string;         // Judge's explanation
  severity: "none" | "low" | "medium" | "high" | "critical";
  evidence: string[];        // Supporting evidence
}
```

See SPEC.md for complete implementation details.