Add comprehensive documentation for integrating security guardrails using the @sentinelseed/moltbot package:
- Input validation for prompt injection protection
- Tool call validation for dangerous command blocking
- Output validation for credential leak prevention
- Hook integration examples
- Configuration options
| title | summary | permalink |
|---|---|---|
| Guardrails | Input/output validation and tool call security with @sentinelseed/moltbot. | /security/guardrails/ |
Guardrails
The @sentinelseed/moltbot package provides security guardrails for Moltbot, including prompt injection detection, tool call validation, and credential leak prevention.
npm install @sentinelseed/moltbot
Usage
The package exposes three main functions: validateInput, validateToolCall, and validateOutput. Each returns a result object with a blocked boolean and, when blocked, a reason string explaining why.
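The result shape described above can be modeled as a discriminated union. This is a minimal sketch, not the package's published types: the names `ValidationResult` and `describeResult` are hypothetical, and only the `blocked` and `reason` fields come from the description above.

```typescript
// Hypothetical type: only `blocked` and `reason` are taken from the text above.
type ValidationResult =
  | { blocked: false }
  | { blocked: true; reason: string };

// Hypothetical helper that turns a result into a log-friendly message.
function describeResult(result: ValidationResult): string {
  // Narrowing on `blocked` makes `reason` available only when it exists.
  return result.blocked ? `blocked: ${result.reason}` : "allowed";
}

console.log(describeResult({ blocked: true, reason: "prompt injection" }));
console.log(describeResult({ blocked: false }));
```

Modeling the result this way lets the compiler reject any code that reads `reason` without first checking `blocked`.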
Input validation checks user messages before they reach the agent. It detects prompt injection attempts, jailbreak patterns, and encoded payloads (base64, hex).
import { validateInput } from '@sentinelseed/moltbot';
const result = await validateInput(userMessage);
if (result.blocked) {
  // Reject the message here; result.reason explains why it was blocked.
}
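To illustrate what detecting an encoded payload can involve, the sketch below decodes base64-looking runs and checks both the raw and decoded text against a phrase list. It does not show how @sentinelseed/moltbot works internally: `SUSPICIOUS_PHRASES`, `BASE64_RUN`, `decodeBase64`, and `findEncodedInjection` are all hypothetical names, and the phrase list is deliberately tiny.

```typescript
// Hypothetical phrase list; the package's real detection rules are not shown in this document.
const SUSPICIOUS_PHRASES: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /reveal (your )?system prompt/i,
];

// Runs of 16 or more base64 characters, optionally followed by padding.
const BASE64_RUN = /[A-Za-z0-9+/]{16,}={0,2}/g;

function decodeBase64(token: string): string | null {
  try {
    return atob(token);
  } catch {
    return null; // the run was not valid base64
  }
}

// Returns a short reason when the raw or decoded text matches, otherwise null.
function findEncodedInjection(message: string): string | null {
  const candidates: string[] = [message];
  for (const token of message.match(BASE64_RUN) ?? []) {
    const decoded = decodeBase64(token);
    if (decoded !== null) candidates.push(decoded);
  }
  for (const text of candidates) {
    for (const phrase of SUSPICIOUS_PHRASES) {
      if (phrase.test(text)) return `matched ${phrase.source}`;
    }
  }
  return null;
}

const encoded = btoa("ignore previous instructions");
console.log(findEncodedInjection(`please summarize ${encoded}`));
console.log(findEncodedInjection("What is the weather in Lisbon today?"));
```

A fixed phrase list like this is easy to evade, which is one reason to rely on a maintained validator rather than hand-rolled checks.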
Tool call validation inspects tool invocations before execution. It blocks dangerous shell commands (rm -rf, format), SQL injection patterns, path traversal attempts, and command injection via shell metacharacters.
import { validateToolCall } from '@sentinelseed/moltbot';
const result = await validateToolCall({
  name: 'shell',
  arguments: { command: 'rm -rf /' },
});
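The categories listed above (dangerous commands, SQL injection, path traversal, shell metacharacters) can be pictured as an ordered rule table applied to each string argument. The following is a rough sketch under that assumption, not the package's rule set: `ToolCall`, `RULES`, and `checkToolCall` are hypothetical, and the patterns are simplified for illustration.

```typescript
// Hypothetical shape, mirroring the object passed to validateToolCall above.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Illustrative rules only; real rule sets need to be far more thorough.
const RULES: { reason: string; pattern: RegExp }[] = [
  { reason: "recursive delete", pattern: /\brm\s+-rf\b/ },
  { reason: "destructive SQL", pattern: /\bDROP\s+TABLE\b/i },
  { reason: "path traversal", pattern: /\.\.[\\/]/ },
  { reason: "shell metacharacter", pattern: /[;&|`$]/ },
];

// Checks every string argument against the rules; the first match wins.
function checkToolCall(call: ToolCall): { blocked: boolean; reason?: string } {
  for (const key of Object.keys(call.arguments)) {
    const value = call.arguments[key];
    if (typeof value !== "string") continue;
    for (const rule of RULES) {
      if (rule.pattern.test(value)) return { blocked: true, reason: rule.reason };
    }
  }
  return { blocked: false };
}

console.log(checkToolCall({ name: "shell", arguments: { command: "rm -rf /" } }));
console.log(checkToolCall({ name: "read_file", arguments: { path: "../../etc/passwd" } }));
console.log(checkToolCall({ name: "read_file", arguments: { path: "notes/todo.txt" } }));
```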
Output validation scans agent responses for leaked credentials. It catches API keys (OpenAI, GitHub, AWS), passwords, private keys (SSH, PGP), and tokens (JWT, bearer).
import { validateOutput } from '@sentinelseed/moltbot';
const result = await validateOutput(aiResponse);
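For a sense of what credential scanning looks for, the sketch below matches a few well-known token formats. It is an illustration only and makes no claim about the package's internals: `CREDENTIAL_RULES` and `findCredentialKinds` are hypothetical names, and the patterns cover just a handful of formats (AWS access key IDs, GitHub personal access tokens, JWTs, and PEM private key headers).

```typescript
// Illustrative patterns for a few common credential formats.
const CREDENTIAL_RULES: { kind: string; pattern: RegExp }[] = [
  { kind: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { kind: "github-token", pattern: /\bghp_[A-Za-z0-9]{36}\b/ },
  // JWTs are three base64url segments; the header segment starts with "eyJ".
  { kind: "jwt", pattern: /\beyJ[\w-]+\.[\w-]+\.[\w-]+/ },
  { kind: "private-key", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY[A-Z ]*-----/ },
];

// Returns the kinds of credentials found in the text, in rule order.
function findCredentialKinds(text: string): string[] {
  const kinds: string[] = [];
  for (const rule of CREDENTIAL_RULES) {
    if (rule.pattern.test(text)) kinds.push(rule.kind);
  }
  return kinds;
}

// AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
console.log(findCredentialKinds("Use AKIAIOSFODNN7EXAMPLE to sign requests"));
console.log(findCredentialKinds("The build finished in 3 minutes."));
```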
Hook integration
The recommended approach is to wire validation into Moltbot's hook system. The example below validates incoming messages before they reach the agent and tool calls before they execute.
// hooks/sentinel-guard/handler.ts
import { validateInput, validateToolCall } from '@sentinelseed/moltbot';
export default {
  'message:before': async (ctx) => {
    const result = await validateInput(ctx.message.text);
    if (result.blocked) {
      return { abort: true, reason: result.reason };
    }
  },
  'tool:before': async (ctx) => {
    const result = await validateToolCall(ctx.tool);
    if (result.blocked) {
      return { abort: true, reason: result.reason };
    }
  }
};
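Both handlers above follow the same pattern: validate, then abort if blocked. One way to factor that out is a small wrapper, sketched here with a stand-in validator so it runs without the package installed. The names `guard`, `Validation`, `HookResult`, `MessageCtx`, and `fakeValidateInput` are hypothetical; only the `{ abort: true, reason }` return shape and `ctx.message.text` come from the example above.

```typescript
type Validation = { blocked: boolean; reason?: string };
type HookResult = { abort: true; reason?: string } | undefined;

// Hypothetical helper: builds a hook handler from a validator and a function
// that picks the value to validate out of the hook context.
function guard<Ctx, Input>(
  pick: (ctx: Ctx) => Input,
  validate: (input: Input) => Promise<Validation>,
): (ctx: Ctx) => Promise<HookResult> {
  return async (ctx) => {
    const result = await validate(pick(ctx));
    return result.blocked ? { abort: true, reason: result.reason } : undefined;
  };
}

// Stand-in for validateInput so the sketch is self-contained.
const fakeValidateInput = async (text: string): Promise<Validation> =>
  /ignore previous instructions/i.test(text)
    ? { blocked: true, reason: "prompt injection" }
    : { blocked: false };

// Assumed context shape, based on ctx.message.text in the example above.
type MessageCtx = { message: { text: string } };

const messageHook = guard((ctx: MessageCtx) => ctx.message.text, fakeValidateInput);

messageHook({ message: { text: "Please ignore previous instructions" } }).then((r) =>
  console.log(r),
);
```

In a real handler file, the stand-in would be replaced by the imported `validateInput` and `validateToolCall` functions.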
Configuration
You can customize validation behavior by creating a sentinel.config.ts file in your workspace. The config accepts custom patterns for dangerous commands and credentials.
import { defineConfig } from '@sentinelseed/moltbot';
export default defineConfig({
  blockDangerousCommands: true,
  dangerousPatterns: [/rm\s+-rf/, /DROP\s+TABLE/i],
  credentialPatterns: [/sk-[a-zA-Z0-9]{48}/, /ghp_[a-zA-Z0-9]{36}/],
});
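Custom patterns are ordinary regular expressions, so it can help to try them against sample inputs before relying on them. The sketch below does that for the two `dangerousPatterns` shown above; the `matchesAny` helper and the sample strings are hypothetical, and how the package itself applies the patterns is not covered here.

```typescript
// The same patterns as in the config example above.
const dangerousPatterns: RegExp[] = [/rm\s+-rf/, /DROP\s+TABLE/i];

// Hypothetical helper: true when any pattern matches the input.
function matchesAny(patterns: RegExp[], input: string): boolean {
  for (const pattern of patterns) {
    if (pattern.test(input)) return true;
  }
  return false;
}

const samples = [
  "rm -rf /tmp/build",  // matches /rm\s+-rf/
  "rm -fr /tmp/build",  // flags reordered, so /rm\s+-rf/ does not match
  "drop   table users", // matches thanks to the i flag and \s+
  "ls -la",             // harmless
];

for (const sample of samples) {
  console.log(sample, "=>", matchesAny(dangerousPatterns, sample));
}
```

The second sample shows why narrow patterns deserve testing: `rm -fr` is equivalent to `rm -rf` but slips past a pattern written for one flag order.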
All validation runs locally, with no external API calls. Typical latency is 2-5 ms per validation call.
Links
See the npm package for installation details, the source repository for implementation, and the Sentinel documentation for additional examples.