fix: correct API references in guardrails docs
parent 10ed53b6d8
commit d7b0c62ddf
@ -1,92 +1,116 @@
|
|||||||
---
|
---
|
||||||
title: Guardrails
|
title: Guardrails
|
||||||
summary: Input/output validation and tool call security with @sentinelseed/moltbot.
|
summary: AI safety guardrails with @sentinelseed/moltbot.
|
||||||
permalink: /security/guardrails/
|
permalink: /security/guardrails/
|
||||||
---
|
---
|
||||||
|
|
||||||
# Guardrails
|
# Guardrails
|
||||||
|
|
||||||
The [@sentinelseed/moltbot](https://www.npmjs.com/package/@sentinelseed/moltbot) package provides security guardrails for Moltbot, including prompt injection detection, tool call validation, and credential leak prevention.
|
The [@sentinelseed/moltbot](https://www.npmjs.com/package/@sentinelseed/moltbot) package provides AI safety guardrails for Moltbot, including real-time validation, data leak prevention, and threat detection.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
npm install @sentinelseed/moltbot
|
npm install @sentinelseed/moltbot
|
||||||
```
|
```
|
||||||
|
|
||||||
## Usage
|
## Quick Start
|
||||||
|
|
||||||
The package exposes three main functions: `validateInput`, `validateToolCall`, and `validateOutput`. Each returns a result object with a `blocked` boolean and, when blocked, a `reason` string explaining why.
|
Add to your Moltbot config:
|
||||||
|
|
||||||
**Input validation** checks user messages before they reach the agent. It detects prompt injection attempts, jailbreak patterns, and encoded payloads (base64, hex).
|
```json
|
||||||
|
{
|
||||||
```ts
|
"plugins": {
|
||||||
import { validateInput } from '@sentinelseed/moltbot';
|
"sentinel": {
|
||||||
|
"level": "watch"
|
||||||
const result = await validateInput(userMessage);
|
}
|
||||||
if (result.blocked) {
|
}
|
||||||
// handle blocked input
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
**Tool call validation** inspects tool invocations before execution. It blocks dangerous shell commands (rm -rf, format), SQL injection patterns, path traversal attempts, and command injection via shell metacharacters.
|
## Protection Levels
|
||||||
|
|
||||||
|
| Level | Blocking | Alerting | Best For |
|
||||||
|
|-------|----------|----------|----------|
|
||||||
|
| `off` | None | None | Disable Sentinel |
|
||||||
|
| `watch` | None | All threats | Daily use, full visibility |
|
||||||
|
| `guard` | Critical | High+ threats | Sensitive data environments |
|
||||||
|
| `shield` | Maximum | All threats | High-security workflows |
|
||||||
|
|
||||||
|
The default `watch` mode provides full monitoring with zero blocking. Higher levels add protection you can always bypass when needed.
|
||||||
|
|
||||||
|
## Hook Integration
|
||||||
|
|
||||||
|
Sentinel provides a hook factory that integrates with Moltbot's hook system:
|
||||||
|
|
||||||
```ts
|
```ts
|
||||||
import { validateToolCall } from '@sentinelseed/moltbot';
|
import { createSentinelHooks } from '@sentinelseed/moltbot';
|
||||||
|
|
||||||
const result = await validateToolCall({
|
const hooks = createSentinelHooks({
|
||||||
name: 'shell',
|
level: 'guard',
|
||||||
arguments: { command: 'rm -rf /' }
|
alerts: {
|
||||||
});
|
enabled: true,
|
||||||
```
|
webhook: 'https://your-webhook.com/sentinel'
|
||||||
|
|
||||||
**Output validation** scans agent responses for leaked credentials. It catches API keys (OpenAI, GitHub, AWS), passwords, private keys (SSH, PGP), and tokens (JWT, bearer).
|
|
||||||
|
|
||||||
```ts
|
|
||||||
import { validateOutput } from '@sentinelseed/moltbot';
|
|
||||||
|
|
||||||
const result = await validateOutput(aiResponse);
|
|
||||||
```
|
|
||||||
|
|
||||||
## Hook integration
|
|
||||||
|
|
||||||
The recommended approach is to wire validation into Moltbot's hook system. The example below validates both incoming messages and tool calls before they execute.
|
|
||||||
|
|
||||||
```ts
|
|
||||||
// hooks/sentinel-guard/handler.ts
|
|
||||||
import { validateInput, validateToolCall } from '@sentinelseed/moltbot';
|
|
||||||
|
|
||||||
export default {
|
|
||||||
'message:before': async (ctx) => {
|
|
||||||
const result = await validateInput(ctx.message.text);
|
|
||||||
if (result.blocked) {
|
|
||||||
return { abort: true, reason: result.reason };
|
|
||||||
}
|
|
||||||
},
|
|
||||||
|
|
||||||
'tool:before': async (ctx) => {
|
|
||||||
const result = await validateToolCall(ctx.tool);
|
|
||||||
if (result.blocked) {
|
|
||||||
return { abort: true, reason: result.reason };
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
export const moltbot_hooks = {
|
||||||
|
message_received: hooks.messageReceived,
|
||||||
|
before_agent_start: hooks.beforeAgentStart,
|
||||||
|
message_sending: hooks.messageSending,
|
||||||
|
before_tool_call: hooks.beforeToolCall,
|
||||||
|
agent_end: hooks.agentEnd,
|
||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Validators
|
||||||
|
|
||||||
|
For advanced use cases, validators can be used directly:
|
||||||
|
|
||||||
|
```ts
|
||||||
|
import { validateOutput, validateTool, analyzeInput, getLevelConfig } from '@sentinelseed/moltbot';
|
||||||
|
|
||||||
|
const levelConfig = getLevelConfig('guard');
|
||||||
|
|
||||||
|
const outputResult = await validateOutput(content, levelConfig);
|
||||||
|
if (outputResult.shouldBlock) {
|
||||||
|
console.log('Blocked:', outputResult.issues);
|
||||||
|
}
|
||||||
|
|
||||||
|
const toolResult = await validateTool('bash', { command: 'ls' }, levelConfig);
|
||||||
|
const inputResult = await analyzeInput(userMessage);
|
||||||
|
```
|
||||||
|
|
||||||
|
## Escape Hatches
|
||||||
|
|
||||||
|
When you need to bypass protection:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
/sentinel pause 5m # Pause for 5 minutes
|
||||||
|
/sentinel allow-once # Allow next action
|
||||||
|
/sentinel trust bash # Trust a tool for the session
|
||||||
|
/sentinel resume # Resume protection
|
||||||
|
```
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
You can customize validation behavior by creating a `sentinel.config.ts` in your workspace. The config accepts custom patterns for dangerous commands and credentials.
|
```json
|
||||||
|
{
|
||||||
```ts
|
"plugins": {
|
||||||
import { defineConfig } from '@sentinelseed/moltbot';
|
"sentinel": {
|
||||||
|
"level": "guard",
|
||||||
export default defineConfig({
|
"alerts": {
|
||||||
blockDangerousCommands: true,
|
"enabled": true,
|
||||||
dangerousPatterns: [/rm\s+-rf/, /DROP\s+TABLE/i],
|
"webhook": "https://your-webhook.com/sentinel",
|
||||||
credentialPatterns: [/sk-[a-zA-Z0-9]{48}/, /ghp_[a-zA-Z0-9]{36}/],
|
"minSeverity": "high"
|
||||||
});
|
},
|
||||||
|
"ignorePatterns": ["MY_SAFE_TOKEN"],
|
||||||
|
"logLevel": "warn"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
All validation runs locally without external API calls. Typical latency is 2-5ms per validation call.
|
All validation runs locally without external API calls.
|
||||||
|
|
||||||
## Links
|
## Links
|
||||||
|
|
||||||
See the [npm package](https://www.npmjs.com/package/@sentinelseed/moltbot) for installation details, the [source repository](https://github.com/sentinel-seed/sentinel) for implementation, and the [Sentinel documentation](https://sentinelseed.dev/docs/moltbot) for additional examples.
|
See the [npm package](https://www.npmjs.com/package/@sentinelseed/moltbot) for installation details, the [source repository](https://github.com/sentinel-seed/sentinel) for implementation, and the [Sentinel documentation](https://sentinelseed.dev/docs/integrations/moltbot) for additional examples.
|
||||||
|
|||||||
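
As a reading aid for the Protection Levels table added in this change, the sketch below models the four levels as plain data. It does not call `@sentinelseed/moltbot`. The `LevelSummary` type and the `describeLevel` and `blocksAnything` helpers are hypothetical names introduced here for illustration, and the values are copied from the table in the diff rather than from the package source.

```typescript
// Hypothetical model of the Protection Levels table; not the package's API.
type Level = "off" | "watch" | "guard" | "shield";

interface LevelSummary {
  blocking: string; // what the level blocks, as worded in the table
  alerting: string; // what the level alerts on, as worded in the table
  bestFor: string;  // intended use, as worded in the table
}

// Values transcribed from the table in the new version of the doc.
const LEVELS: Record<Level, LevelSummary> = {
  off: { blocking: "None", alerting: "None", bestFor: "Disable Sentinel" },
  watch: { blocking: "None", alerting: "All threats", bestFor: "Daily use, full visibility" },
  guard: { blocking: "Critical", alerting: "High+ threats", bestFor: "Sensitive data environments" },
  shield: { blocking: "Maximum", alerting: "All threats", bestFor: "High-security workflows" },
};

// One-line summary of a level, for logs or a settings screen.
function describeLevel(level: Level): string {
  const s = LEVELS[level];
  return `${level}: blocking=${s.blocking}, alerting=${s.alerting}`;
}

// True when the table lists any blocking behavior for the level.
function blocksAnything(level: Level): boolean {
  return LEVELS[level].blocking !== "None";
}
```

Under this model, `off` and `watch` never block, which matches the doc's statement that the default `watch` mode monitors with zero blocking.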