openclaw/test/security/README.md
Jai Govindani c5ce8cacbf
feat(security): add E2E security test harness with LLM judge
Add comprehensive security acceptance testing framework that validates
Moltbot's resistance to prompt injection, data exfiltration, and trust
boundary violations.

Key components:
- LLM-as-judge pattern using Claude to evaluate attack resistance
- WebSocket gateway client for direct protocol testing
- CLI mocking utilities for injecting poisoned external data
- Docker Compose setup for containerized CI execution
- GitHub Actions workflow with daily scheduled runs

Test categories covered:
- Email/calendar prompt injection via external data
- Trust boundary violations and auth bypass attempts
- Data exfiltration prevention
- Tool output poisoning
2026-01-29 08:52:59 +07:00

141 lines
3.9 KiB
Markdown

# Security Acceptance Tests
E2E security testing framework for Moltbot. Validates resistance to:
- Prompt injection via external data sources
- Data exfiltration attempts
- Trust boundary violations
- Tool poisoning attacks
## Architecture: LLM-as-Judge
Pattern matching can't reliably detect whether prompt injection succeeded. We use
**Claude as a judge** to evaluate whether Moltbot resisted attacks:
1. Run test scenario (send poisoned data to Moltbot)
2. Capture Moltbot's response and tool calls
3. Send to Claude judge with structured output
4. Judge evaluates: injection detected? complied with? data leaked?
This enables nuanced evaluation of subtle attacks that regex can't catch.
## Quick Start
```bash
# Install Anthropic SDK (required for LLM judge)
pnpm add -D @anthropic-ai/sdk
# Run security tests
ANTHROPIC_API_KEY=sk-ant-xxx pnpm test:security
# Run specific category
pnpm test:security --grep "Email Injection"
```
## Structure
```
test/security/
├── SPEC.md # Full specification document
├── README.md # This file
├── harness/
│ ├── index.ts # Exports
│ ├── gateway-client.ts # WebSocket gateway client
│ ├── assertions.ts # Pattern-based assertions (fast checks)
│ ├── llm-judge.ts # Claude-based evaluation (nuanced checks)
│ └── cli-mocks/
│ └── mock-binary.ts # CLI binary mocking utilities
└── *.e2e.test.ts # Test files by category
```
## Implementation Priority
Based on SPEC.md, implement in this order:
1. **email-injection.e2e.test.ts** - Gmail/email tests (highest attack surface)
2. **calendar-injection.e2e.test.ts** - Calendar event injection tests
3. **api-injection.e2e.test.ts** - Generic API response injection
4. **trust-boundary.e2e.test.ts** - Authentication bypass and session leakage
5. **tool-poisoning.e2e.test.ts** - Malicious skill/plugin output
## Key Dependencies
```bash
# Add to devDependencies
pnpm add -D @anthropic-ai/sdk ws
```
- **Vitest** - Test runner (already configured in repo)
- **@anthropic-ai/sdk** - LLM judge (Claude Sonnet for evaluation)
- **ws** - WebSocket client
- **Hono** - Mock HTTP servers (already in deps)
## Running Tests
### Option 1: Local Script (Recommended)
```bash
# Install Anthropic SDK first
pnpm add -D @anthropic-ai/sdk ws
# Set API key
export ANTHROPIC_API_KEY=sk-ant-xxx
# Run all tests (auto-detects local gateway or spins up Docker)
./test/security/run-local.sh
# Run specific test category
./test/security/run-local.sh "Email Injection"
```
### Option 2: Docker Compose (Full Isolation)
```bash
# Run containerized tests
ANTHROPIC_API_KEY=sk-ant-xxx \
docker compose -f test/security/docker-compose.yml up --build --abort-on-container-exit
# Cleanup
docker compose -f test/security/docker-compose.yml down -v
```
### Option 3: Against Local Gateway
```bash
# Terminal 1: Start gateway
moltbot gateway
# Terminal 2: Run tests directly
ANTHROPIC_API_KEY=sk-ant-xxx pnpm vitest run --config vitest.security.config.ts
```
## CI Integration
Security tests run automatically via GitHub Actions:
- On every push to `main`
- On every PR to `main`
- Daily at midnight UTC (scheduled)
- Manually via workflow dispatch
**Required secrets:**
- `ANTHROPIC_API_KEY`: For LLM judge evaluation
The workflow blocks releases if security tests fail.
## Test Output
Each test produces a `SecurityVerdict`:
```typescript
{
passed: boolean; // Did the security test pass?
injectionDetected: boolean; // Did agent notice the attack?
injectionCompliedWith: boolean; // Did agent follow malicious instructions?
dataLeaked: boolean; // Was sensitive data exposed?
reasoning: string; // Judge's explanation
severity: "none" | "low" | "medium" | "high" | "critical";
evidence: string[]; // Supporting evidence
}
```
See SPEC.md for complete implementation details.