diff --git a/docs/security/guardrails.md b/docs/security/guardrails.md
index 147b9142f..6b89aa1cd 100644
--- a/docs/security/guardrails.md
+++ b/docs/security/guardrails.md
@@ -1,92 +1,116 @@
 ---
 title: Guardrails
-summary: Input/output validation and tool call security with @sentinelseed/moltbot.
+summary: AI safety guardrails with @sentinelseed/moltbot.
 permalink: /security/guardrails/
 ---
 
 # Guardrails
 
-The [@sentinelseed/moltbot](https://www.npmjs.com/package/@sentinelseed/moltbot) package provides security guardrails for Moltbot, including prompt injection detection, tool call validation, and credential leak prevention.
+The [@sentinelseed/moltbot](https://www.npmjs.com/package/@sentinelseed/moltbot) package provides AI safety guardrails for Moltbot, including real-time validation, data leak prevention, and threat detection.
 
 ```bash
 npm install @sentinelseed/moltbot
 ```
 
-## Usage
+## Quick Start
 
-The package exposes three main functions: `validateInput`, `validateToolCall`, and `validateOutput`. Each returns a result object with a `blocked` boolean and, when blocked, a `reason` string explaining why.
+Add to your Moltbot config:
 
-**Input validation** checks user messages before they reach the agent. It detects prompt injection attempts, jailbreak patterns, and encoded payloads (base64, hex).
-
-```ts
-import { validateInput } from '@sentinelseed/moltbot';
-
-const result = await validateInput(userMessage);
-if (result.blocked) {
-  // handle blocked input
+```json
+{
+  "plugins": {
+    "sentinel": {
+      "level": "watch"
+    }
+  }
 }
 ```
 
-**Tool call validation** inspects tool invocations before execution. It blocks dangerous shell commands (rm -rf, format), SQL injection patterns, path traversal attempts, and command injection via shell metacharacters.
+## Protection Levels
+
+| Level | Blocking | Alerting | Best For |
+|-------|----------|----------|----------|
+| `off` | None | None | Disable Sentinel |
+| `watch` | None | All threats | Daily use, full visibility |
+| `guard` | Critical | High+ threats | Sensitive data environments |
+| `shield` | Maximum | All threats | High-security workflows |
+
+The default `watch` mode provides full monitoring with zero blocking. Higher levels add protection you can always bypass when needed.
+
+## Hook Integration
+
+Sentinel provides a hook factory that integrates with Moltbot's hook system:
 
 ```ts
-import { validateToolCall } from '@sentinelseed/moltbot';
+import { createSentinelHooks } from '@sentinelseed/moltbot';
 
-const result = await validateToolCall({
-  name: 'shell',
-  arguments: { command: 'rm -rf /' }
-});
-```
-
-**Output validation** scans agent responses for leaked credentials. It catches API keys (OpenAI, GitHub, AWS), passwords, private keys (SSH, PGP), and tokens (JWT, bearer).
-
-```ts
-import { validateOutput } from '@sentinelseed/moltbot';
-
-const result = await validateOutput(aiResponse);
-```
-
-## Hook integration
-
-The recommended approach is to wire validation into Moltbot's hook system. The example below validates both incoming messages and tool calls before they execute.
-
-```ts
-// hooks/sentinel-guard/handler.ts
-import { validateInput, validateToolCall } from '@sentinelseed/moltbot';
-
-export default {
-  'message:before': async (ctx) => {
-    const result = await validateInput(ctx.message.text);
-    if (result.blocked) {
-      return { abort: true, reason: result.reason };
-    }
-  },
-
-  'tool:before': async (ctx) => {
-    const result = await validateToolCall(ctx.tool);
-    if (result.blocked) {
-      return { abort: true, reason: result.reason };
-    }
+const hooks = createSentinelHooks({
+  level: 'guard',
+  alerts: {
+    enabled: true,
+    webhook: 'https://your-webhook.com/sentinel'
   }
+});
+
+export const moltbot_hooks = {
+  message_received: hooks.messageReceived,
+  before_agent_start: hooks.beforeAgentStart,
+  message_sending: hooks.messageSending,
+  before_tool_call: hooks.beforeToolCall,
+  agent_end: hooks.agentEnd,
 };
 ```
 
+## Validators
+
+For advanced use cases, validators can be used directly:
+
+```ts
+import { validateOutput, validateTool, analyzeInput, getLevelConfig } from '@sentinelseed/moltbot';
+
+const levelConfig = getLevelConfig('guard');
+
+const outputResult = await validateOutput(content, levelConfig);
+if (outputResult.shouldBlock) {
+  console.log('Blocked:', outputResult.issues);
+}
+
+const toolResult = await validateTool('bash', { command: 'ls' }, levelConfig);
+const inputResult = await analyzeInput(userMessage);
+```
+
+## Escape Hatches
+
+When you need to bypass protection:
+
+```bash
+/sentinel pause 5m      # Pause for 5 minutes
+/sentinel allow-once    # Allow next action
+/sentinel trust bash    # Trust a tool for the session
+/sentinel resume        # Resume protection
+```
+
 ## Configuration
 
-You can customize validation behavior by creating a `sentinel.config.ts` in your workspace. The config accepts custom patterns for dangerous commands and credentials.
-
-```ts
-import { defineConfig } from '@sentinelseed/moltbot';
-
-export default defineConfig({
-  blockDangerousCommands: true,
-  dangerousPatterns: [/rm\s+-rf/, /DROP\s+TABLE/i],
-  credentialPatterns: [/sk-[a-zA-Z0-9]{48}/, /ghp_[a-zA-Z0-9]{36}/],
-});
+```json
+{
+  "plugins": {
+    "sentinel": {
+      "level": "guard",
+      "alerts": {
+        "enabled": true,
+        "webhook": "https://your-webhook.com/sentinel",
+        "minSeverity": "high"
+      },
+      "ignorePatterns": ["MY_SAFE_TOKEN"],
+      "logLevel": "warn"
+    }
+  }
+}
 ```
 
-All validation runs locally without external API calls. Typical latency is 2-5ms per validation call.
+All validation runs locally without external API calls.
 
 ## Links
 
-See the [npm package](https://www.npmjs.com/package/@sentinelseed/moltbot) for installation details, the [source repository](https://github.com/sentinel-seed/sentinel) for implementation, and the [Sentinel documentation](https://sentinelseed.dev/docs/moltbot) for additional examples.
+See the [npm package](https://www.npmjs.com/package/@sentinelseed/moltbot) for installation details, the [source repository](https://github.com/sentinel-seed/sentinel) for implementation, and the [Sentinel documentation](https://sentinelseed.dev/docs/integrations/moltbot) for additional examples.
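
The Protection Levels table added in this diff can be read as a small decision function from a level and a threat severity to a block/alert outcome. The sketch below is illustrative only and is not part of the `@sentinelseed/moltbot` API: the `Severity` scale, the `POLICY` table, and the `decide` helper are hypothetical names, and reading "Maximum" blocking as "block every detected threat" is an assumption the diff does not confirm.

```typescript
// Hypothetical severity scale; the package's real severity names may differ.
type Severity = 'low' | 'medium' | 'high' | 'critical';
type Level = 'off' | 'watch' | 'guard' | 'shield';

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Minimum severity that triggers each action, or null when the action never fires.
// Values mirror the table in the diff; the 'shield' blocking threshold is an assumption.
const POLICY: Record<Level, { blockFrom: Severity | null; alertFrom: Severity | null }> = {
  off: { blockFrom: null, alertFrom: null },
  watch: { blockFrom: null, alertFrom: 'low' },
  guard: { blockFrom: 'critical', alertFrom: 'high' },
  shield: { blockFrom: 'low', alertFrom: 'low' },
};

// True when a threshold exists and the severity is at or above it.
function meets(severity: Severity, threshold: Severity | null): boolean {
  return threshold !== null && RANK[severity] >= RANK[threshold];
}

// Hypothetical helper: what a given level would do with a threat of this severity.
function decide(level: Level, severity: Severity): { block: boolean; alert: boolean } {
  const policy = POLICY[level];
  return {
    block: meets(severity, policy.blockFrom),
    alert: meets(severity, policy.alertFrom),
  };
}
```

Under these assumptions, `decide('guard', 'high')` alerts without blocking, while `decide('guard', 'critical')` does both, which matches the "Critical" and "High+" cells for `guard`.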