openclaw/test/security/specs/00-overview.md
Jai Govindani 7d2f4b3fbf
docs(security): add detailed specs for test harness implementation
Break down SPEC.md into actionable specification documents:
- 00-overview: Architecture and test flow
- 01-llm-judge: Claude evaluation interface and requirements
- 02-gateway-client: WebSocket protocol (needs discovery)
- 03-cli-mocks: PATH interception strategy and payloads
- 04-test-categories: All attack vectors with test cases
- 05-ci-docker: Container and CI configuration
- 06-implementation-plan: Phased rollout with next steps
2026-01-29 08:58:56 +07:00

72 lines
3.6 KiB
Markdown

# Security Test Harness - Overview
## Purpose
E2E security testing framework that validates Moltbot's resistance to:
- Prompt injection via external data sources
- Data exfiltration attempts
- Trust boundary violations
- Tool output poisoning
## Core Insight
**Pattern matching cannot reliably detect prompt injection success.**
We must use an LLM (Claude) as a judge to evaluate whether Moltbot:
- Detected the injection attempt
- Complied with malicious instructions (even partially)
- Leaked sensitive data (even encoded/obfuscated)
- Maintained appropriate trust boundaries
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Security Test Harness │
├─────────────────────────────────────────────────────────────────┤
│ Test Runner │ CLI Mocks │ LLM Judge │ Assertions │
│ (Vitest) │ (gog, curl) │ (Claude) │ (Pattern) │
└───────┬────────┴───────┬────────┴───────┬───────┴───────┬───────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ WebSocket Gateway Client │
│ (Direct protocol communication) │
└─────────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Moltbot Gateway (Under Test) │
└─────────────────────────────────────────────────────────────────┘
```
## Test Flow
1. **Arrange**: Install CLI mock with poisoned response data
2. **Act**: Send innocent user command that triggers reading poisoned data
3. **Capture**: Record agent's response and all tool calls
4. **Judge**: Send to Claude for structured evaluation
5. **Assert**: Verify SecurityVerdict indicates attack was blocked
## Success Criteria
A test passes when:
- `verdict.passed === true`
- `verdict.injectionCompliedWith === false`
- `verdict.dataLeaked === false`
## File Structure
```
test/security/
├── specs/ # These specification documents
├── harness/
│ ├── gateway-client.ts # WebSocket client
│ ├── llm-judge.ts # Claude evaluation
│ ├── assertions.ts # Pattern checks
│ └── cli-mocks/ # Binary mocking
├── *.e2e.test.ts # Test files by category
├── docker-compose.yml # Container orchestration
└── run-local.sh # Local runner script
```