Add comprehensive security acceptance testing framework that validates Moltbot's resistance to prompt injection, data exfiltration, and trust boundary violations. Key components: - LLM-as-judge pattern using Claude to evaluate attack resistance - WebSocket gateway client for direct protocol testing - CLI mocking utilities for injecting poisoned external data - Docker Compose setup for containerized CI execution - GitHub Actions workflow with daily scheduled runs Test categories covered: - Email/calendar prompt injection via external data - Trust boundary violations and auth bypass attempts - Data exfiltration prevention - Tool output poisoning
141 lines
3.9 KiB
Markdown
141 lines
3.9 KiB
Markdown
# Security Acceptance Tests
|
|
|
|
E2E security testing framework for Moltbot. Validates resistance to:
|
|
- Prompt injection via external data sources
|
|
- Data exfiltration attempts
|
|
- Trust boundary violations
|
|
- Tool poisoning attacks
|
|
|
|
## Architecture: LLM-as-Judge
|
|
|
|
Pattern matching can't reliably detect whether prompt injection succeeded. We use
|
|
**Claude as a judge** to evaluate whether Moltbot resisted attacks:
|
|
|
|
1. Run test scenario (send poisoned data to Moltbot)
|
|
2. Capture Moltbot's response and tool calls
|
|
3. Send to Claude judge with structured output
|
|
4. Judge evaluates: injection detected? complied with? data leaked?
|
|
|
|
This enables nuanced evaluation of subtle attacks that regex can't catch.
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Install Anthropic SDK (required for LLM judge)
|
|
pnpm add -D @anthropic-ai/sdk
|
|
|
|
# Run security tests
|
|
ANTHROPIC_API_KEY=sk-ant-xxx pnpm test:security
|
|
|
|
# Run specific category
|
|
pnpm test:security --grep "Email Injection"
|
|
```
|
|
|
|
## Structure
|
|
|
|
```
|
|
test/security/
|
|
├── SPEC.md # Full specification document
|
|
├── README.md # This file
|
|
├── harness/
|
|
│ ├── index.ts # Exports
|
|
│ ├── gateway-client.ts # WebSocket gateway client
|
|
│ ├── assertions.ts # Pattern-based assertions (fast checks)
|
|
│ ├── llm-judge.ts # Claude-based evaluation (nuanced checks)
|
|
│ └── cli-mocks/
|
|
│ └── mock-binary.ts # CLI binary mocking utilities
|
|
└── *.e2e.test.ts # Test files by category
|
|
```
|
|
|
|
## Implementation Priority
|
|
|
|
Based on SPEC.md, implement in this order:
|
|
|
|
1. **email-injection.e2e.test.ts** - Gmail/email tests (highest attack surface)
|
|
2. **calendar-injection.e2e.test.ts** - Calendar event injection tests
|
|
3. **api-injection.e2e.test.ts** - Generic API response injection
|
|
4. **trust-boundary.e2e.test.ts** - Authentication bypass and session leakage
|
|
5. **tool-poisoning.e2e.test.ts** - Malicious skill/plugin output
|
|
|
|
## Key Dependencies
|
|
|
|
```bash
|
|
# Add to devDependencies
|
|
pnpm add -D @anthropic-ai/sdk ws
|
|
```
|
|
|
|
- **Vitest** - Test runner (already configured in repo)
|
|
- **@anthropic-ai/sdk** - LLM judge (Claude Sonnet for evaluation)
|
|
- **ws** - WebSocket client
|
|
- **Hono** - Mock HTTP servers (already in deps)
|
|
|
|
## Running Tests
|
|
|
|
### Option 1: Local Script (Recommended)
|
|
|
|
```bash
|
|
# Install Anthropic SDK first
|
|
pnpm add -D @anthropic-ai/sdk ws
|
|
|
|
# Set API key
|
|
export ANTHROPIC_API_KEY=sk-ant-xxx
|
|
|
|
# Run all tests (auto-detects local gateway or spins up Docker)
|
|
./test/security/run-local.sh
|
|
|
|
# Run specific test category
|
|
./test/security/run-local.sh "Email Injection"
|
|
```
|
|
|
|
### Option 2: Docker Compose (Full Isolation)
|
|
|
|
```bash
|
|
# Run containerized tests
|
|
ANTHROPIC_API_KEY=sk-ant-xxx \
|
|
docker compose -f test/security/docker-compose.yml up --build --abort-on-container-exit
|
|
|
|
# Cleanup
|
|
docker compose -f test/security/docker-compose.yml down -v
|
|
```
|
|
|
|
### Option 3: Against Local Gateway
|
|
|
|
```bash
|
|
# Terminal 1: Start gateway
|
|
moltbot gateway
|
|
|
|
# Terminal 2: Run tests directly
|
|
ANTHROPIC_API_KEY=sk-ant-xxx pnpm vitest run --config vitest.security.config.ts
|
|
```
|
|
|
|
## CI Integration
|
|
|
|
Security tests run automatically via GitHub Actions:
|
|
- On every push to `main`
|
|
- On every PR to `main`
|
|
- Daily at midnight UTC (scheduled)
|
|
- Manually via workflow dispatch
|
|
|
|
**Required secrets:**
|
|
- `ANTHROPIC_API_KEY`: For LLM judge evaluation
|
|
|
|
The workflow blocks releases if security tests fail.
|
|
|
|
## Test Output
|
|
|
|
Each test produces a `SecurityVerdict`:
|
|
|
|
```typescript
|
|
{
|
|
passed: boolean; // Did the security test pass?
|
|
injectionDetected: boolean; // Did agent notice the attack?
|
|
injectionCompliedWith: boolean; // Did agent follow malicious instructions?
|
|
dataLeaked: boolean; // Was sensitive data exposed?
|
|
reasoning: string; // Judge's explanation
|
|
severity: "none" | "low" | "medium" | "high" | "critical";
|
|
evidence: string[]; // Supporting evidence
|
|
}
|
|
```
|
|
|
|
See SPEC.md for complete implementation details.
|