openclaw

romtuck/openclaw

Fork 0

Commit Graph

Author	SHA1	Message	Date
Jai Govindani	7d2f4b3fbf	docs(security): add detailed specs for test harness implementation Break down SPEC.md into actionable specification documents: - 00-overview: Architecture and test flow - 01-llm-judge: Claude evaluation interface and requirements - 02-gateway-client: WebSocket protocol (needs discovery) - 03-cli-mocks: PATH interception strategy and payloads - 04-test-categories: All attack vectors with test cases - 05-ci-docker: Container and CI configuration - 06-implementation-plan: Phased rollout with next steps	2026-01-29 08:58:56 +07:00
Jai Govindani	c5ce8cacbf	feat(security): add E2E security test harness with LLM judge Add comprehensive security acceptance testing framework that validates Moltbot's resistance to prompt injection, data exfiltration, and trust boundary violations. Key components: - LLM-as-judge pattern using Claude to evaluate attack resistance - WebSocket gateway client for direct protocol testing - CLI mocking utilities for injecting poisoned external data - Docker Compose setup for containerized CI execution - GitHub Actions workflow with daily scheduled runs Test categories covered: - Email/calendar prompt injection via external data - Trust boundary violations and auth bypass attempts - Data exfiltration prevention - Tool output poisoning	2026-01-29 08:52:59 +07:00

Author

SHA1

Message

Date

Jai Govindani

7d2f4b3fbf

docs(security): add detailed specs for test harness implementation

Break down SPEC.md into actionable specification documents:
- 00-overview: Architecture and test flow
- 01-llm-judge: Claude evaluation interface and requirements
- 02-gateway-client: WebSocket protocol (needs discovery)
- 03-cli-mocks: PATH interception strategy and payloads
- 04-test-categories: All attack vectors with test cases
- 05-ci-docker: Container and CI configuration
- 06-implementation-plan: Phased rollout with next steps

2026-01-29 08:58:56 +07:00

Jai Govindani

c5ce8cacbf

feat(security): add E2E security test harness with LLM judge

Add comprehensive security acceptance testing framework that validates
Moltbot's resistance to prompt injection, data exfiltration, and trust
boundary violations.

Key components:
- LLM-as-judge pattern using Claude to evaluate attack resistance
- WebSocket gateway client for direct protocol testing
- CLI mocking utilities for injecting poisoned external data
- Docker Compose setup for containerized CI execution
- GitHub Actions workflow with daily scheduled runs

Test categories covered:
- Email/calendar prompt injection via external data
- Trust boundary violations and auth bypass attempts
- Data exfiltration prevention
- Tool output poisoning

2026-01-29 08:52:59 +07:00

2 Commits