Ulrich Diedrichsen e69eccb4b1 docs: enhance PR description with motivation and problem statement

2026-01-30 11:23:04 +01:00

Security Shield Implementation

Motivation

OpenClaw is increasingly deployed on internet-facing VPS servers to provide remote access to AI agents via messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal). These deployments are exposed to common internet threats:

Brute force attacks attempting to guess authentication tokens
Denial of Service (DoS) attacks overwhelming the gateway with connection/request floods
Intrusion attempts exploiting vulnerabilities (SSRF, path traversal, port scanning)
Unauthorized access from malicious IPs or botnets

Currently, OpenClaw has basic authentication but lacks:

Rate limiting to slow down attackers
Intrusion detection to identify attack patterns
Automated blocking of malicious IPs
Security event logging for audit trails
Real-time alerting when security incidents occur

This leaves VPS deployments vulnerable and operators blind to ongoing attacks. Users running OpenClaw on exposed servers need production-grade security controls without the complexity of external tools like fail2ban, Redis, or manual firewall management.

Problem

For VPS operators:

No protection against brute force attacks - Attackers can attempt unlimited authentication guesses, potentially discovering tokens through timing attacks or credential stuffing
No DoS protection - A single malicious actor can exhaust server resources with connection/request floods
No visibility into security events - Operators don't know when they're under attack or which IPs are malicious
Manual firewall management - Blocking IPs requires manual iptables/ufw commands and doesn't persist across restarts
No real-time alerting - Operators discover attacks only by noticing performance degradation or checking logs manually
No audit trail - Security-relevant events (failed auth, intrusion attempts) are mixed with application logs, making forensic analysis difficult

For the OpenClaw project:

Security features should be enabled by default (secure by default principle) but are currently opt-in or nonexistent
The existing openclaw security audit command only checks configuration; it does not provide runtime protection
No standardized way to handle security events across different channels and connection types

Solution

This PR implements a zero-dependency security shield that adds rate limiting, intrusion detection, IP blocking, security event logging, and alerting to OpenClaw deployments:

Core Design Principles

Opt-out security - Shield enabled by default for new deployments (users can disable if needed)
Zero external dependencies - No Redis, PostgreSQL, or external services required; uses in-memory LRU caches with bounded memory
Performance-first - <5ms latency overhead per request; async fire-and-forget for firewall/alerts
Fail-open by default - Errors in security checks don't block legitimate traffic
Comprehensive logging - Structured JSONL logs for audit trails and forensic analysis
Operator-friendly - CLI commands for management, Telegram alerts for real-time notifications
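
The fail-open and fire-and-forget principles above can be sketched as two small wrappers. This is a minimal illustration, not the PR's code: the names failOpen, fireAndForget, and Verdict are hypothetical, and the real shield may report errors differently.

```typescript
// Hypothetical result type for a security check (not taken from the PR).
type Verdict = { allowed: boolean; reason?: string };

// Fail-open: if the check itself throws, report the error and allow the request.
function failOpen(
  check: () => Verdict,
  onError: (err: unknown) => void = () => {},
): Verdict {
  try {
    return check();
  } catch (err) {
    onError(err);
    return { allowed: true, reason: "check_failed_open" };
  }
}

// Fire-and-forget: start an async side effect (firewall rule, alert) without
// awaiting it. The async wrapper turns synchronous throws into rejections,
// so every failure ends up in onError instead of the request path.
function fireAndForget(
  task: () => Promise<void>,
  onError: (err: unknown) => void = () => {},
): void {
  void (async () => {
    await task();
  })().catch(onError);
}
```

The trade-off of failing open is that a bug in a check silently disables that check, which is why the error callback should feed the security log.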

Architecture

HTTP/WS Request → Security Shield Middleware → Gateway Auth → Business Logic
                       ↓
                Rate Limiter (token bucket + LRU cache)
                       ↓
                Intrusion Detector (pattern matching)
                       ↓
                IP Manager (blocklist/allowlist + CIDR)
                       ↓
                Firewall Integration (iptables/ufw on Linux)
                       ↓
                Security Event Logger (/tmp/openclaw/security-*.jsonl)
                       ↓
                Alert Manager (Telegram/Webhook/Slack/Email)

Key Capabilities

Rate Limiting:

Per-IP: Auth attempts (5/5min), connections (10 concurrent), requests (100/min)
Per-device: Auth attempts (10/15min)
Per-sender: Pairing requests (3/hour)
Token bucket algorithm with automatic refill
LRU cache (10k entries max) prevents memory exhaustion
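
A minimal sketch of how a token bucket and a bounded LRU map can combine into a per-key rate limiter. The class and method names are hypothetical and do not come from src/security/token-bucket.ts or src/security/rate-limiter.ts; the sketch also assumes that a limit such as 5/5min means a burst capacity of 5 refilled evenly over the window.

```typescript
// Token bucket: holds up to `capacity` tokens, refilled continuously over time.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, windowMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = capacity / windowMs; // assumption: even refill over the window
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Refill for the elapsed time, then try to spend one token.
  tryConsume(nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per key (for example an IP), capped at `maxEntries` keys.
class LruRateLimiter {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly max: number;
  private readonly windowMs: number;
  private readonly maxEntries: number;

  constructor(max: number, windowMs: number, maxEntries = 10000) {
    this.max = max;
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
  }

  // Returns true if the request is allowed, false if it is rate limited.
  check(key: string, nowMs: number = Date.now()): boolean {
    let bucket = this.buckets.get(key);
    if (bucket) {
      this.buckets.delete(key); // re-inserted below as most recently used
    } else {
      if (this.buckets.size >= this.maxEntries) {
        const oldest = this.buckets.keys().next().value;
        if (oldest !== undefined) this.buckets.delete(oldest);
      }
      bucket = new TokenBucket(this.max, this.windowMs, nowMs);
    }
    this.buckets.set(key, bucket);
    return bucket.tryConsume(nowMs);
  }

  get size(): number {
    return this.buckets.size;
  }
}
```

For example, `new LruRateLimiter(5, 300000)` would model the per-IP auth limit. Note that evicting a key also forgets its spent tokens, so the entry cap trades a little accuracy under very wide attacks for bounded memory.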

Intrusion Detection:

Brute force: 10 failed auth in 10min → auto-block
SSRF bypass attempts: 3 in 5min → alert
Path traversal: 5 in 5min → alert
Port scanning: 20 connection attempts in 10s → alert
Event aggregation with time-window analysis
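
Time-window aggregation of this kind can be sketched as a per-key list of timestamps that is pruned on every event. EventAggregator and record are hypothetical names rather than the API of src/security/events/aggregator.ts, and the unbounded Map is a simplification; a real implementation would need to cap or expire keys.

```typescript
// Counts events per key inside a trailing time window.
class EventAggregator {
  private readonly events = new Map<string, number[]>();

  // Records one event and returns true once the number of events within the
  // last `windowMs` milliseconds (including this one) reaches `threshold`.
  record(
    key: string,
    threshold: number,
    windowMs: number,
    nowMs: number = Date.now(),
  ): boolean {
    const cutoff = nowMs - windowMs;
    const recent = (this.events.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.events.set(key, recent);
    return recent.length >= threshold;
  }
}
```

With the brute force rule above, a caller would use something like `record("brute_force:" + ip, 10, 600000)` on each failed auth and block the IP when it returns true.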

IP Management:

Blocklist with configurable expiration (default 24h)
Allowlist with CIDR support (e.g., 100.64.0.0/10 for Tailscale)
Persistent storage (~/.openclaw/security/blocklist.json)
Automatic firewall integration (iptables/ufw on Linux)
Manual management via CLI: openclaw blocklist add/remove
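
A minimal sketch of IPv4 CIDR matching as an allowlist check might use it. The function names are hypothetical, IPv6 is deliberately out of scope, and the PR's own matcher in src/security/ip-manager.ts may work differently.

```typescript
// Parses dotted-quad IPv4 into an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Returns true if `ip` falls inside `cidr` (a bare address counts as /32).
function isInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = slash === -1 ? cidr : cidr.slice(0, slash);
  const prefixText = slash === -1 ? "32" : cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null || prefix > 32) return false;
  // Compare the leading `prefix` bits with plain arithmetic, which avoids the
  // sign problems of 32-bit bitwise operators in JavaScript.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}
```

Under this sketch, `isInCidr("100.64.0.1", "100.64.0.0/10")` is true and `isInCidr("100.128.0.0", "100.64.0.0/10")` is false, since the /10 block ends at 100.127.255.255.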

Security Logging:

Structured JSONL format: /tmp/openclaw/security-YYYY-MM-DD.jsonl
Daily rotation (24h retention by default)
Categories: authentication, rate_limit, intrusion_attempt, network_access, pairing
Also exported to main logger for OTEL telemetry
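
To make the log format concrete, here is a sketch of how one event could be serialized and which file it would land in. The SecurityEvent fields other than the category names are assumptions (the actual schema lives in src/security/events/schema.ts), and naming files by UTC date is also an assumption.

```typescript
type SecurityCategory =
  | "authentication"
  | "rate_limit"
  | "intrusion_attempt"
  | "network_access"
  | "pairing";

type Severity = "info" | "warn" | "critical";

// Hypothetical event shape for illustration only.
interface SecurityEvent {
  timestamp: string; // ISO 8601
  category: SecurityCategory;
  severity: Severity;
  ip: string;
  action: string;
  details?: Record<string, unknown>;
}

// Daily file name: /tmp/openclaw/security-YYYY-MM-DD.jsonl (UTC date assumed).
function securityLogPath(date: Date, dir = "/tmp/openclaw"): string {
  const day = date.toISOString().slice(0, 10);
  return `${dir}/security-${day}.jsonl`;
}

// JSONL: one JSON object per line. JSON.stringify escapes embedded newlines,
// so each event always occupies exactly one line.
function toJsonlLine(event: SecurityEvent): string {
  return JSON.stringify(event) + "\n";
}
```

A writer would append the result of toJsonlLine to the path from securityLogPath; because each line is independent JSON, the file can be tailed or filtered line by line.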

Real-time Alerting:

Telegram Bot API integration (priority channel)
Webhook/Slack/Email support
Alert throttling (1 alert per trigger per 5min) prevents spam
Triggers: Critical events, failed auth spike (20 in 10min), IP blocked
Formatted messages with severity emojis and Markdown
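
The throttling rule (at most one alert per trigger per 5 minutes) can be sketched with a map from trigger name to the time of the last alert. AlertThrottle and shouldSend are hypothetical names, not the alert manager's real interface.

```typescript
// Allows at most one alert per trigger name within `intervalMs`.
class AlertThrottle {
  private readonly lastSent = new Map<string, number>();
  private readonly intervalMs: number;

  constructor(intervalMs: number = 5 * 60 * 1000) {
    this.intervalMs = intervalMs;
  }

  // Returns true (and records the send time) if an alert may go out now.
  shouldSend(trigger: string, nowMs: number = Date.now()): boolean {
    const last = this.lastSent.get(trigger);
    if (last !== undefined && nowMs - last < this.intervalMs) return false;
    this.lastSent.set(trigger, nowMs);
    return true;
  }
}
```

Keying the throttle by trigger rather than by IP means a distributed attack still produces one alert per trigger type per interval; suppressed alerts are dropped, not queued, in this sketch.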

Why This Approach?

Zero dependencies: Many security solutions require Redis (rate limiting), PostgreSQL (event storage), or fail2ban (intrusion detection). This implementation uses only Node.js built-ins and in-memory data structures, making it:

Easy to deploy (no additional services)
Low resource overhead (<50MB memory, <5ms latency)
Portable across Mac/Linux/BSD (firewall integration is Linux only)
Unaffected by external service failures

Enabled by default (opt-out): Following the "secure by default" principle, new deployments automatically get protection. Existing deployments remain unchanged (backward compatible) but can opt in via openclaw security enable.

Production-ready: The implementation uses battle-tested algorithms (token bucket for rate limiting, LRU cache for memory bounds) and defensive programming (fail-open, async fire-and-forget, comprehensive error handling).

Overview

This PR implements a comprehensive security shield for OpenClaw deployments on Mac/Linux VPS with:

Rate limiting to prevent brute force and DoS attacks
Intrusion detection with pattern-based attack recognition
IP blocklist/allowlist with automatic blocking and firewall integration
Centralized security logging with structured events
Real-time alerting via Telegram (with webhook/Slack/email support)
Enabled by default for new deployments (opt-out mode)

All security features are implemented without external dependencies (no Redis required), using in-memory LRU caches with bounded memory usage.

Implementation Details

Phase 1: Core Security Infrastructure

New Files:

src/security/token-bucket.ts - Token bucket algorithm for rate limiting
src/security/rate-limiter.ts - LRU-cached rate limiter with helper functions
src/security/ip-manager.ts - IP blocklist/allowlist management with CIDR support
src/security/intrusion-detector.ts - Attack pattern detection engine
src/security/shield.ts - Main security coordinator
src/security/middleware.ts - HTTP middleware integration
src/security/events/schema.ts - SecurityEvent type definitions
src/security/events/logger.ts - Security-specific event logger
src/security/events/aggregator.ts - Event aggregation for time-window detection
src/config/types.security.ts - Security configuration types
Comprehensive unit tests for all modules

Key Features:

Rate limits: Per-IP auth (5/5min), connections (10 concurrent), requests (100/min)
Auto-block: 10 failed auth in 10min → 24h block
Attack patterns: Brute force, SSRF bypass, path traversal, port scanning
Allowlist: Tailscale IPs (100.64.0.0/10), localhost always exempt
Memory-bounded: 10k entry LRU cache with auto-cleanup

Integration Points:

src/gateway/auth.ts - Rate limiting + failed auth logging for intrusion detection
src/gateway/server-http.ts - Webhook rate limiting
src/pairing/pairing-store.ts - Pairing request rate limiting
src/config/schema.ts - Security configuration schema with opt-out defaults
src/config/defaults.ts - Default security configuration

Phase 2: Firewall Integration & Alerting

New Files:

src/security/firewall/manager.ts - Firewall integration coordinator
src/security/firewall/iptables.ts - iptables backend (Linux)
src/security/firewall/ufw.ts - ufw backend (Linux)
src/security/alerting/manager.ts - Alert system coordinator
src/security/alerting/types.ts - Alert type definitions
src/security/alerting/telegram.ts - Telegram Bot API integration
src/security/alerting/webhook.ts - Generic webhook support
src/security/alerting/slack.ts - Slack incoming webhook
src/security/alerting/email.ts - SMTP email alerts

Key Features:

Firewall integration: Auto-applies iptables/ufw rules when blocking IPs (Linux only)
Telegram alerts: Formatted messages with severity emojis, Markdown support
Alert throttling: Prevents spam (max 1 alert per trigger per 5min)
Alert triggers: Critical events, failed auth spike, IP blocked
Async fire-and-forget: Firewall/alert operations don't block request handling
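
As an illustration of the firewall step, the sketch below builds the argument list for a block or unblock command. The iptables and ufw invocations are standard forms of those tools, but which chain, rule position, and target the PR actually uses is not stated here, so treat the rule shape as an assumption; firewallArgs is a hypothetical helper and handles IPv4 only.

```typescript
type FirewallBackend = "iptables" | "ufw";
type FirewallAction = "block" | "unblock";

// Strict dotted-quad check so that nothing but an IPv4 address reaches the command.
const IPV4_PATTERN = /^(25[0-5]|2[0-4]\d|1?\d?\d)(\.(25[0-5]|2[0-4]\d|1?\d?\d)){3}$/;

// Returns the command as an argv array (program first). Passing an array to a
// process-spawning API, rather than a shell string, avoids shell injection.
function firewallArgs(
  backend: FirewallBackend,
  action: FirewallAction,
  ip: string,
): string[] {
  if (!IPV4_PATTERN.test(ip)) {
    throw new Error(`Refusing to build a firewall rule for invalid IP: ${ip}`);
  }
  if (backend === "iptables") {
    // -I inserts a DROP rule at the top of INPUT; -D deletes the matching rule.
    return ["iptables", action === "block" ? "-I" : "-D", "INPUT", "-s", ip, "-j", "DROP"];
  }
  return action === "block"
    ? ["ufw", "insert", "1", "deny", "from", ip]
    : ["ufw", "delete", "deny", "from", ip];
}
```

Both tools require root privileges, so in practice the command would run asynchronously and any failure would be logged rather than surfaced to the request.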

Integration:

src/security/ip-manager.ts - Calls firewall manager when blocking/unblocking
src/security/events/logger.ts - Triggers alert manager on security events
src/gateway/server.impl.ts - Initialize firewall and alert managers on startup

Phase 3: CLI Commands & Documentation

New Files:

src/cli/security-cli.ts - Security management commands (existing file, extended)
src/cli/parse-duration.ts - Duration parser for CLI options
docs/security/security-shield.md - Comprehensive security guide (465 lines)
docs/security/alerting.md - Alerting setup guide with Telegram focus (342 lines)
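
The accepted syntax of src/cli/parse-duration.ts is not described in this PR text. As a sketch under the assumption that it takes an integer followed by a unit suffix (ms, s, m, h, d), a duration parser could look like this; parseDuration and the unit set are hypothetical.

```typescript
// Milliseconds per supported unit (the unit set is an assumption).
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Parses strings such as "30s", "5m", or "24h" into milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(input.trim());
  if (!match) {
    throw new Error(`Invalid duration: ${input}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}
```

Rejecting anything that does not match the pattern, instead of guessing a unit, keeps a typo from turning into an unexpectedly short or long block.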

CLI Commands:

openclaw security enable/disable/status
openclaw security audit [--deep] [--fix]
openclaw security logs [-f] [--severity critical|warn|info]
openclaw blocklist list/add/remove
openclaw allowlist list/add/remove

Documentation:

Quick start guide with examples
Configuration reference
Telegram bot setup walkthrough
Best practices and troubleshooting
Security checklist for VPS deployments

Testing

Unit Tests:

Token bucket algorithm tests
Rate limiter tests with LRU cache verification
IP manager tests with CIDR support
Intrusion detector tests with time-window aggregation
Firewall manager tests (mocked)
Telegram alerting tests (mocked)

Test Coverage:

All core security modules have comprehensive unit tests
Tests verify rate limiting, auto-blocking, allowlist exemption
Tests verify CIDR matching (e.g., 100.64.0.0/10 for Tailscale)
Tests verify event aggregation for attack detection

Manual Testing Performed:

Verified rate limiting blocks after threshold
Verified failed auth triggers auto-block
Verified allowlist exempts IPs from blocking
Verified security events logged to /tmp/openclaw/security-YYYY-MM-DD.jsonl
Verified CLI commands (status, logs, blocklist, allowlist)

Breaking Changes

None. All features are additive and backward-compatible.

New deployments: Security shield enabled by default
Existing deployments: Security shield remains disabled unless explicitly enabled
Performance impact: <5ms per request (negligible)
Memory impact: ~10MB for rate limiter cache (bounded)

Configuration Changes

New Configuration Section:

security:
  shield:
    enabled: true  # DEFAULT: true for new configs (opt-out mode)
    rateLimiting:
      enabled: true
      perIp:
        authAttempts: { max: 5, windowMs: 300000 }
        connections: { max: 10, windowMs: 60000 }
        requests: { max: 100, windowMs: 60000 }
    intrusionDetection:
      enabled: true
      patterns:
        bruteForce: { threshold: 10, windowMs: 600000 }
    ipManagement:
      autoBlock:
        enabled: true
        durationMs: 86400000  # 24 hours
      allowlist:
        - "100.64.0.0/10"  # Tailscale CGNAT (auto-added)
      firewall:
        enabled: true  # Linux only
        backend: "iptables"  # or "ufw"
  alerting:
    enabled: false  # Disabled by default (requires channel config)
    channels:
      telegram:
        enabled: false
        botToken: "${TELEGRAM_BOT_TOKEN}"
        chatId: "${TELEGRAM_CHAT_ID}"
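
For readers who prefer types to YAML, the shield portion of this configuration could be modeled roughly as follows. The interface names are hypothetical and are not copied from src/config/types.security.ts; the default values mirror the YAML above.

```typescript
interface RateLimit {
  max: number;
  windowMs: number;
}

// Hypothetical shape of the security.shield section.
interface ShieldConfig {
  enabled: boolean;
  rateLimiting: {
    enabled: boolean;
    perIp: {
      authAttempts: RateLimit;
      connections: RateLimit;
      requests: RateLimit;
    };
  };
  intrusionDetection: {
    enabled: boolean;
    patterns: {
      bruteForce: { threshold: number; windowMs: number };
    };
  };
  ipManagement: {
    autoBlock: { enabled: boolean; durationMs: number };
    allowlist: string[];
    firewall: { enabled: boolean; backend: "iptables" | "ufw" };
  };
}

// Defaults matching the YAML in this section.
const DEFAULT_SHIELD: ShieldConfig = {
  enabled: true,
  rateLimiting: {
    enabled: true,
    perIp: {
      authAttempts: { max: 5, windowMs: 300000 }, // 5 per 5 minutes
      connections: { max: 10, windowMs: 60000 },
      requests: { max: 100, windowMs: 60000 }, // 100 per minute
    },
  },
  intrusionDetection: {
    enabled: true,
    patterns: { bruteForce: { threshold: 10, windowMs: 600000 } },
  },
  ipManagement: {
    autoBlock: { enabled: true, durationMs: 86400000 }, // 24 hours
    allowlist: ["100.64.0.0/10"], // Tailscale CGNAT range
    firewall: { enabled: true, backend: "iptables" },
  },
};
```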

Migration Guide

For existing deployments:

# 1. Update OpenClaw
npm install -g openclaw@latest

# 2. Run security audit
openclaw security audit --deep

# 3. Enable security shield
openclaw security enable

# 4. (Optional) Configure Telegram alerts
openclaw configure security.alerting.channels.telegram.botToken
openclaw configure security.alerting.channels.telegram.chatId
openclaw configure security.alerting.enabled true

# 5. Restart gateway
openclaw gateway restart

# 6. Monitor security logs
openclaw security logs --follow

Documentation

New Documentation:

docs/security/security-shield.md - Comprehensive security guide
docs/security/alerting.md - Alerting setup and configuration

Updated Documentation:

CHANGELOG.md - Added security shield entry

Future Enhancements

Potential future improvements (not in this PR):

Geolocation-based blocking (MaxMind GeoIP2)
Machine learning-based anomaly detection
Integration with external threat intelligence feeds
Support for Windows Firewall (currently Linux only)
Web UI for security dashboard and configuration

Checklist

Core security infrastructure implemented (Phase 1)
Firewall integration implemented (Phase 2)
Alerting system implemented (Phase 2)
CLI commands implemented (Phase 3)
Comprehensive documentation written
Unit tests added for all modules
Configuration schema updated with defaults
Gateway integration completed
Changelog entry added
No breaking changes
Backward compatible with existing deployments

Addresses user requirements for:

Rate limiting to prevent brute force attacks
DoS protection
Intrusion detection
Audit logging for security events
Real-time alerting (Telegram priority)
Firewall integration for VPS deployments
Opt-out security model (enabled by default)
