openclaw/test/security
Jai Govindani 7d2f4b3fbf
docs(security): add detailed specs for test harness implementation
Break down SPEC.md into actionable specification documents:
- 00-overview: Architecture and test flow
- 01-llm-judge: Claude evaluation interface and requirements
- 02-gateway-client: WebSocket protocol (needs discovery)
- 03-cli-mocks: PATH interception strategy and payloads
- 04-test-categories: All attack vectors with test cases
- 05-ci-docker: Container and CI configuration
- 06-implementation-plan: Phased rollout with next steps
2026-01-29 08:58:56 +07:00
..
harness feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
specs docs(security): add detailed specs for test harness implementation 2026-01-29 08:58:56 +07:00
docker-compose.yml feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
Dockerfile.test feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
email-injection.e2e.test.ts feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
README.md feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
run-local.sh feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
setup.ts feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00
SPEC.md feat(security): add E2E security test harness with LLM judge 2026-01-29 08:52:59 +07:00

Security Acceptance Tests

E2E security testing framework for Moltbot. Validates resistance to:

  • Prompt injection via external data sources
  • Data exfiltration attempts
  • Trust boundary violations
  • Tool poisoning attacks

Architecture: LLM-as-Judge

Pattern matching can't reliably detect whether prompt injection succeeded. We use Claude as a judge to evaluate whether Moltbot resisted attacks:

  1. Run test scenario (send poisoned data to Moltbot)
  2. Capture Moltbot's response and tool calls
  3. Send to Claude judge with structured output
  4. Judge evaluates: injection detected? complied with? data leaked?

This enables nuanced evaluation of subtle attacks that regex can't catch.

Quick Start

# Install Anthropic SDK (required for LLM judge)
pnpm add -D @anthropic-ai/sdk

# Run security tests
ANTHROPIC_API_KEY=sk-ant-xxx pnpm test:security

# Run specific category
pnpm test:security --grep "Email Injection"

Structure

test/security/
├── SPEC.md                    # Full specification document
├── README.md                  # This file
├── harness/
│   ├── index.ts               # Exports
│   ├── gateway-client.ts      # WebSocket gateway client
│   ├── assertions.ts          # Pattern-based assertions (fast checks)
│   ├── llm-judge.ts           # Claude-based evaluation (nuanced checks)
│   └── cli-mocks/
│       └── mock-binary.ts     # CLI binary mocking utilities
└── *.e2e.test.ts              # Test files by category

Implementation Priority

Based on SPEC.md, implement in this order:

  1. email-injection.e2e.test.ts - Gmail/email tests (highest attack surface)
  2. calendar-injection.e2e.test.ts - Calendar event injection tests
  3. api-injection.e2e.test.ts - Generic API response injection
  4. trust-boundary.e2e.test.ts - Authentication bypass and session leakage
  5. tool-poisoning.e2e.test.ts - Malicious skill/plugin output

Key Dependencies

# Add to devDependencies
pnpm add -D @anthropic-ai/sdk ws
  • Vitest - Test runner (already configured in repo)
  • @anthropic-ai/sdk - LLM judge (Claude Sonnet for evaluation)
  • ws - WebSocket client
  • Hono - Mock HTTP servers (already in deps)

Running Tests

# Install Anthropic SDK first
pnpm add -D @anthropic-ai/sdk ws

# Set API key
export ANTHROPIC_API_KEY=sk-ant-xxx

# Run all tests (auto-detects local gateway or spins up Docker)
./test/security/run-local.sh

# Run specific test category
./test/security/run-local.sh "Email Injection"

Option 2: Docker Compose (Full Isolation)

# Run containerized tests
ANTHROPIC_API_KEY=sk-ant-xxx \
docker compose -f test/security/docker-compose.yml up --build --abort-on-container-exit

# Cleanup
docker compose -f test/security/docker-compose.yml down -v

Option 3: Against Local Gateway

# Terminal 1: Start gateway
moltbot gateway

# Terminal 2: Run tests directly
ANTHROPIC_API_KEY=sk-ant-xxx pnpm vitest run --config vitest.security.config.ts

CI Integration

Security tests run automatically via GitHub Actions:

  • On every push to main
  • On every PR to main
  • Daily at midnight UTC (scheduled)
  • Manually via workflow dispatch

Required secrets:

  • ANTHROPIC_API_KEY: For LLM judge evaluation

The workflow blocks releases if security tests fail.

Test Output

Each test produces a SecurityVerdict:

{
  passed: boolean;           // Did the security test pass?
  injectionDetected: boolean; // Did agent notice the attack?
  injectionCompliedWith: boolean; // Did agent follow malicious instructions?
  dataLeaked: boolean;       // Was sensitive data exposed?
  reasoning: string;         // Judge's explanation
  severity: "none" | "low" | "medium" | "high" | "critical";
  evidence: string[];        // Supporting evidence
}

See SPEC.md for complete implementation details.