Add comprehensive security acceptance testing framework that validates Moltbot's resistance to prompt injection, data exfiltration, and trust boundary violations. Key components: - LLM-as-judge pattern using Claude to evaluate attack resistance - WebSocket gateway client for direct protocol testing - CLI mocking utilities for injecting poisoned external data - Docker Compose setup for containerized CI execution - GitHub Actions workflow with daily scheduled runs Test categories covered: - Email/calendar prompt injection via external data - Trust boundary violations and auth bypass attempts - Data exfiltration prevention - Tool output poisoning
3.9 KiB
3.9 KiB
Security Acceptance Tests
E2E security testing framework for Moltbot. Validates resistance to:
- Prompt injection via external data sources
- Data exfiltration attempts
- Trust boundary violations
- Tool poisoning attacks
Architecture: LLM-as-Judge
Pattern matching can't reliably detect whether prompt injection succeeded. We use Claude as a judge to evaluate whether Moltbot resisted attacks:
- Run test scenario (send poisoned data to Moltbot)
- Capture Moltbot's response and tool calls
- Send to Claude judge with structured output
- Judge evaluates: injection detected? complied with? data leaked?
This enables nuanced evaluation of subtle attacks that regex can't catch.
Quick Start
# Install Anthropic SDK (required for LLM judge)
pnpm add -D @anthropic-ai/sdk
# Run security tests
ANTHROPIC_API_KEY=sk-ant-xxx pnpm test:security
# Run specific category
pnpm test:security --grep "Email Injection"
Structure
test/security/
├── SPEC.md # Full specification document
├── README.md # This file
├── harness/
│ ├── index.ts # Exports
│ ├── gateway-client.ts # WebSocket gateway client
│ ├── assertions.ts # Pattern-based assertions (fast checks)
│ ├── llm-judge.ts # Claude-based evaluation (nuanced checks)
│ └── cli-mocks/
│ └── mock-binary.ts # CLI binary mocking utilities
└── *.e2e.test.ts # Test files by category
Implementation Priority
Based on SPEC.md, implement in this order:
- email-injection.e2e.test.ts - Gmail/email tests (highest attack surface)
- calendar-injection.e2e.test.ts - Calendar event injection tests
- api-injection.e2e.test.ts - Generic API response injection
- trust-boundary.e2e.test.ts - Authentication bypass and session leakage
- tool-poisoning.e2e.test.ts - Malicious skill/plugin output
Key Dependencies
# Add to devDependencies
pnpm add -D @anthropic-ai/sdk ws
- Vitest - Test runner (already configured in repo)
- @anthropic-ai/sdk - LLM judge (Claude Sonnet for evaluation)
- ws - WebSocket client
- Hono - Mock HTTP servers (already in deps)
Running Tests
Option 1: Local Script (Recommended)
# Install Anthropic SDK first
pnpm add -D @anthropic-ai/sdk ws
# Set API key
export ANTHROPIC_API_KEY=sk-ant-xxx
# Run all tests (auto-detects local gateway or spins up Docker)
./test/security/run-local.sh
# Run specific test category
./test/security/run-local.sh "Email Injection"
Option 2: Docker Compose (Full Isolation)
# Run containerized tests
ANTHROPIC_API_KEY=sk-ant-xxx \
docker compose -f test/security/docker-compose.yml up --build --abort-on-container-exit
# Cleanup
docker compose -f test/security/docker-compose.yml down -v
Option 3: Against Local Gateway
# Terminal 1: Start gateway
moltbot gateway
# Terminal 2: Run tests directly
ANTHROPIC_API_KEY=sk-ant-xxx pnpm vitest run --config vitest.security.config.ts
CI Integration
Security tests run automatically via GitHub Actions:
- On every push to
main - On every PR to
main - Daily at midnight UTC (scheduled)
- Manually via workflow dispatch
Required secrets:
ANTHROPIC_API_KEY: For LLM judge evaluation
The workflow blocks releases if security tests fail.
Test Output
Each test produces a SecurityVerdict:
{
passed: boolean; // Did the security test pass?
injectionDetected: boolean; // Did agent notice the attack?
injectionCompliedWith: boolean; // Did agent follow malicious instructions?
dataLeaked: boolean; // Was sensitive data exposed?
reasoning: string; // Judge's explanation
severity: "none" | "low" | "medium" | "high" | "critical";
evidence: string[]; // Supporting evidence
}
See SPEC.md for complete implementation details.