Compare commits
2 Commits
main
...
feature/li
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
6a0eafd191 | ||
|
|
4a90b2f329 |
@ -11,6 +11,7 @@ Docs: https://docs.clawd.bot
|
|||||||
- Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround).
|
- Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround).
|
||||||
- Docs: add verbose installer troubleshooting guidance.
|
- Docs: add verbose installer troubleshooting guidance.
|
||||||
- Docs: update Fly.io guide notes.
|
- Docs: update Fly.io guide notes.
|
||||||
|
- Tools: add link understanding with CLI preprocessing, SSRF guard, and structured link summaries. (#1637) Thanks @dasilva333.
|
||||||
- Docs: add Bedrock EC2 instance role setup + IAM steps. (#1625) Thanks @sergical. https://docs.clawd.bot/bedrock
|
- Docs: add Bedrock EC2 instance role setup + IAM steps. (#1625) Thanks @sergical. https://docs.clawd.bot/bedrock
|
||||||
- Exec approvals: forward approval prompts to chat with `/approve` for all channels (including plugins). (#1621) Thanks @czekaj. https://docs.clawd.bot/tools/exec-approvals https://docs.clawd.bot/tools/slash-commands
|
- Exec approvals: forward approval prompts to chat with `/approve` for all channels (including plugins). (#1621) Thanks @czekaj. https://docs.clawd.bot/tools/exec-approvals https://docs.clawd.bot/tools/slash-commands
|
||||||
- Gateway: expose config.patch in the gateway tool with safe partial updates + restart sentinel. (#1653) Thanks @Glucksberg.
|
- Gateway: expose config.patch in the gateway tool with safe partial updates + restart sentinel. (#1653) Thanks @Glucksberg.
|
||||||
|
|||||||
@ -1005,6 +1005,7 @@
|
|||||||
"plugins/zalouser",
|
"plugins/zalouser",
|
||||||
"tools/exec",
|
"tools/exec",
|
||||||
"tools/web",
|
"tools/web",
|
||||||
|
"tools/link-understanding",
|
||||||
"tools/apply-patch",
|
"tools/apply-patch",
|
||||||
"tools/elevated",
|
"tools/elevated",
|
||||||
"tools/browser",
|
"tools/browser",
|
||||||
|
|||||||
@ -2010,6 +2010,35 @@ Example:
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
`tools.links` configures link understanding (URL preprocessing via CLI):
|
||||||
|
- `tools.links.enabled`: enable/disable link understanding (default: enabled when models are configured).
|
||||||
|
- `tools.links.maxLinks`: max URLs per message (default 3).
|
||||||
|
- `tools.links.timeoutSeconds`: default CLI timeout (default 30).
|
||||||
|
- `tools.links.scope`: optional gating rules (same shape as media understanding scope).
|
||||||
|
- `tools.links.models`: ordered CLI entries (fallbacks in order).
|
||||||
|
- `command`: executable to run (required).
|
||||||
|
- `args`: templated args (supports `{{LinkUrl}}`).
|
||||||
|
- `timeoutSeconds`: optional per-entry override.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 2,
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [{ action: "allow", match: { chatType: "direct" } }]
|
||||||
|
},
|
||||||
|
models: [
|
||||||
|
{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
`agents.defaults.subagents` configures sub-agent defaults:
|
`agents.defaults.subagents` configures sub-agent defaults:
|
||||||
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller’s model unless overridden per agent or per call.
|
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller’s model unless overridden per agent or per call.
|
||||||
- `maxConcurrent`: max concurrent sub-agent runs (default 1)
|
- `maxConcurrent`: max concurrent sub-agent runs (default 1)
|
||||||
@ -3180,7 +3209,7 @@ clawdbot dns setup --apply
|
|||||||
|
|
||||||
## Template variables
|
## Template variables
|
||||||
|
|
||||||
Template placeholders are expanded in `tools.media.*.models[].args` and `tools.media.models[].args` (and any future templated argument fields).
|
Template placeholders are expanded in `tools.media.*.models[].args`, `tools.media.models[].args`, and `tools.links.models[].args` (and any future templated argument fields).
|
||||||
|
|
||||||
| Variable | Description |
|
| Variable | Description |
|
||||||
|----------|-------------|
|
|----------|-------------|
|
||||||
@ -3195,6 +3224,7 @@ Template placeholders are expanded in `tools.media.*.models[].args` and `tools.m
|
|||||||
| `{{MediaUrl}}` | Inbound media pseudo-URL (if present) |
|
| `{{MediaUrl}}` | Inbound media pseudo-URL (if present) |
|
||||||
| `{{MediaPath}}` | Local media path (if downloaded) |
|
| `{{MediaPath}}` | Local media path (if downloaded) |
|
||||||
| `{{MediaType}}` | Media type (image/audio/document/…) |
|
| `{{MediaType}}` | Media type (image/audio/document/…) |
|
||||||
|
| `{{LinkUrl}}` | URL currently being processed (link understanding only) |
|
||||||
| `{{Transcript}}` | Audio transcript (when enabled) |
|
| `{{Transcript}}` | Audio transcript (when enabled) |
|
||||||
| `{{Prompt}}` | Resolved media prompt for CLI entries |
|
| `{{Prompt}}` | Resolved media prompt for CLI entries |
|
||||||
| `{{MaxChars}}` | Resolved max output chars for CLI entries |
|
| `{{MaxChars}}` | Resolved max output chars for CLI entries |
|
||||||
|
|||||||
@ -232,6 +232,13 @@ Notes:
|
|||||||
- See [Web tools](/tools/web) for setup.
|
- See [Web tools](/tools/web) for setup.
|
||||||
- See [Firecrawl](/tools/firecrawl) for the optional anti-bot fallback.
|
- See [Firecrawl](/tools/firecrawl) for the optional anti-bot fallback.
|
||||||
|
|
||||||
|
### Link understanding (inbound)
|
||||||
|
Preprocess inbound messages by running a CLI on detected URLs and appending the
|
||||||
|
results to the message body before the agent sees it. This is **not** an agent
|
||||||
|
tool; it runs in the inbound auto-reply pipeline.
|
||||||
|
|
||||||
|
See [Link understanding](/tools/link-understanding) for setup and configuration.
|
||||||
|
|
||||||
### `browser`
|
### `browser`
|
||||||
Control the dedicated clawd browser.
|
Control the dedicated clawd browser.
|
||||||
|
|
||||||
|
|||||||
326
docs/tools/link-understanding.md
Normal file
326
docs/tools/link-understanding.md
Normal file
@ -0,0 +1,326 @@
|
|||||||
|
---
|
||||||
|
summary: "Run a CLI on URLs in inbound messages and inject results before the agent replies"
|
||||||
|
read_when:
|
||||||
|
- You want to summarize or enrich URLs in inbound messages
|
||||||
|
- You need to enable or scope link understanding
|
||||||
|
- You want to wire a custom CLI to preprocess links
|
||||||
|
---
|
||||||
|
|
||||||
|
# Link understanding
|
||||||
|
|
||||||
|
Link understanding is the **inbound URL enrichment pipeline**: Clawdbot detects
|
||||||
|
links in a message, runs your **local CLI** to interpret them, and injects the
|
||||||
|
results into the message body **before the agent sees it**. The goal is a
|
||||||
|
predictable, structured prompt envelope without adding a browsing tool.
|
||||||
|
|
||||||
|
This is **not** an agent tool. It is a preprocessing step in the auto-reply
|
||||||
|
pipeline, shared across all channels (including webchat).
|
||||||
|
|
||||||
|
## What it enables
|
||||||
|
|
||||||
|
- Summarize or extract data from links (docs, tickets, dashboards, runbooks).
|
||||||
|
- Inject structured context (titles, owners, status, timestamps, key facts).
|
||||||
|
- Normalize or transform link payloads (HTML → text, JSON → bullets).
|
||||||
|
- Gate link processing by channel, chat type, or session key.
|
||||||
|
- Apply SSRF protections before your CLI runs.
|
||||||
|
|
||||||
|
## When it runs (pipeline order)
|
||||||
|
|
||||||
|
Link understanding runs after inbound normalization and media understanding,
|
||||||
|
but before the agent executes the reply run:
|
||||||
|
|
||||||
|
1) Inbound message is normalized and routed.
|
||||||
|
2) Media understanding may rewrite the body (images/audio/video).
|
||||||
|
3) Link understanding detects URLs in the current command/body and appends results.
|
||||||
|
4) The agent receives the final body and command parsing uses the updated text.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
1) Clawdbot scans the inbound message for **bare** `http/https` URLs.
|
||||||
|
2) It dedupes links in order and caps them at `tools.links.maxLinks` (default 3).
|
||||||
|
3) For each link, it runs your configured CLI entries **in order** until one returns non-empty stdout.
|
||||||
|
4) It appends the resulting outputs to the message body (structured envelope).
|
||||||
|
5) The agent sees the original text plus the appended link outputs.
|
||||||
|
|
||||||
|
No requests are made by Clawdbot itself; **your CLI does the fetching/parsing**.
|
||||||
|
|
||||||
|
## URL detection rules
|
||||||
|
|
||||||
|
- Only **bare URLs** are extracted (e.g. `https://example.com`).
|
||||||
|
- Markdown links are ignored (`[label](https://example.com)` is stripped).
|
||||||
|
- Links are deduped (first occurrence wins).
|
||||||
|
- Only `http` and `https` are allowed.
|
||||||
|
- Local and private hosts are blocked (see **Security** below).
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Link understanding is configured under `tools.links`:
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 3,
|
||||||
|
timeoutSeconds: 30,
|
||||||
|
scope: {
|
||||||
|
default: "allow",
|
||||||
|
rules: [{ action: "deny", match: { chatType: "group" } }]
|
||||||
|
},
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
command: "link-understand",
|
||||||
|
args: ["--url", "{{LinkUrl}}", "--format", "markdown"],
|
||||||
|
timeoutSeconds: 20
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### `tools.links` fields
|
||||||
|
|
||||||
|
- `enabled`: enable/disable link understanding (default: enabled when models are configured).
|
||||||
|
- `maxLinks`: max URLs processed per message (default 3).
|
||||||
|
- `timeoutSeconds`: default timeout for CLI runs (default 30).
|
||||||
|
- `scope`: optional gating rules (same structure as media understanding scope).
|
||||||
|
- `models`: ordered list of CLI entries (fallbacks in order).
|
||||||
|
|
||||||
|
### Model entries (`tools.links.models[]`)
|
||||||
|
|
||||||
|
Each entry is a CLI command:
|
||||||
|
|
||||||
|
- `type`: optional, only `"cli"` is supported (default).
|
||||||
|
- `command`: executable to run (required).
|
||||||
|
- `args`: CLI args (templated; see **Template variables**).
|
||||||
|
- `timeoutSeconds`: optional override for this entry.
|
||||||
|
|
||||||
|
The first entry that returns **non-empty stdout** wins. If a command fails or
|
||||||
|
returns empty output, Clawdbot tries the next entry.
|
||||||
|
|
||||||
|
## Performance + ordering
|
||||||
|
|
||||||
|
- Links are processed **sequentially** (per message, in order).
|
||||||
|
- Each CLI entry has a timeout (per entry or default).
|
||||||
|
- Failure to run a CLI does **not** abort the agent run; it just tries the next entry.
|
||||||
|
- `maxLinks` is the primary cap for latency. Keep it low for chatty channels.
|
||||||
|
|
||||||
|
## Template variables
|
||||||
|
|
||||||
|
`tools.links.models[].args` supports the standard template variables plus `{{LinkUrl}}`.
|
||||||
|
See [Template variables](/gateway/configuration#template-variables) for the full list.
|
||||||
|
|
||||||
|
Link-specific variable:
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `{{LinkUrl}}` | URL currently being processed |
|
||||||
|
|
||||||
|
## Output format
|
||||||
|
|
||||||
|
Each link is wrapped in a predictable envelope and appended to the message body:
|
||||||
|
|
||||||
|
```
|
||||||
|
<original message>
|
||||||
|
|
||||||
|
[Link]
|
||||||
|
URL: https://example.com
|
||||||
|
Source: link-understand
|
||||||
|
Summary:
|
||||||
|
<cli output>
|
||||||
|
```
|
||||||
|
|
||||||
|
When multiple links are present, the header is numbered (`[Link 1/2]`, `[Link 2/2]`).
|
||||||
|
`Source` is the CLI `command` that produced the output.
|
||||||
|
|
||||||
|
### Output best practices
|
||||||
|
|
||||||
|
- Keep summaries short and stable (avoid prompt bloat).
|
||||||
|
- Prefer a compact, consistent structure (title → bullets → status).
|
||||||
|
- Avoid surrounding fences unless you want the agent to treat it as literal text.
|
||||||
|
|
||||||
|
|
||||||
|
## Scope gating
|
||||||
|
|
||||||
|
`tools.links.scope` uses the same rules as media understanding:
|
||||||
|
|
||||||
|
- `default`: `"allow"` or `"deny"`.
|
||||||
|
- `rules[]`: first match wins.
|
||||||
|
- `match.channel` (surface/channel)
|
||||||
|
- `match.chatType` (`direct`, `group`, `channel`)
|
||||||
|
- `match.keyPrefix` (session key prefix)
|
||||||
|
|
||||||
|
Example (only allow direct messages):
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [{ action: "allow", match: { chatType: "direct" } }]
|
||||||
|
},
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security (SSRF guard)
|
||||||
|
|
||||||
|
Clawdbot blocks local/private hosts before invoking your CLI. The guard rejects:
|
||||||
|
|
||||||
|
- `localhost`, `*.localhost`, `*.local`, `*.internal`
|
||||||
|
- Loopback addresses (IPv4 + IPv6)
|
||||||
|
- Private and link-local IP ranges (IPv4 + IPv6)
|
||||||
|
- `metadata.google.internal`
|
||||||
|
- Addresses that resolve to private/internal IPs
|
||||||
|
|
||||||
|
If a URL is blocked, it is **skipped** and your CLI is not invoked for it.
|
||||||
|
URLs that fail DNS resolution are also skipped.
|
||||||
|
|
||||||
|
## Decision tracking
|
||||||
|
|
||||||
|
Each run records a decision payload (useful in logs/debugging) in `ctx.LinkUnderstandingDecisions`:
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
outcome: "success" | "skipped" | "disabled" | "scope-deny" | "no-links",
|
||||||
|
urls: [
|
||||||
|
{
|
||||||
|
url: "https://example.com",
|
||||||
|
attempts: [
|
||||||
|
{ type: "cli", command: "link-understand", outcome: "success" }
|
||||||
|
],
|
||||||
|
chosen: { type: "cli", command: "link-understand", outcome: "success" }
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Custom usage patterns
|
||||||
|
|
||||||
|
### Ticket summarizer (single link)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
maxLinks: 1,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
command: "ticket-summary",
|
||||||
|
args: ["--url", "{{LinkUrl}}", "--format", "brief"]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-channel allowlist (only Discord + Slack)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [
|
||||||
|
{ action: "allow", match: { channel: "discord" } },
|
||||||
|
{ action: "allow", match: { channel: "slack" } }
|
||||||
|
]
|
||||||
|
},
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-agent override (support agent only)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
agents: {
|
||||||
|
list: [
|
||||||
|
{
|
||||||
|
id: "support",
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 2,
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Markdown link behavior (ignored)
|
||||||
|
|
||||||
|
Message:
|
||||||
|
```
|
||||||
|
Please check [our docs](https://docs.example.com) and https://status.example.com
|
||||||
|
```
|
||||||
|
|
||||||
|
Only `https://status.example.com` is processed (markdown links are stripped).
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
- **Nothing happens:** ensure `tools.links.models` is set and `enabled` is not `false`.
|
||||||
|
- **No links detected:** only bare URLs are extracted; markdown links are ignored.
|
||||||
|
- **Output missing:** your CLI returned empty stdout; try logging or return a default line.
|
||||||
|
- **Wrong channel scope:** check `tools.links.scope` rules and `match.chatType`.
|
||||||
|
- **SSRF blocked:** local/private URLs are skipped by design.
|
||||||
|
|
||||||
|
Enable verbose logs to see CLI execution and scope decisions:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
clawdbot gateway run --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Minimal CLI runner
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Limit to 1 link, custom timeout
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
maxLinks: 1,
|
||||||
|
timeoutSeconds: 10,
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Fallback chain (first non-empty output wins)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
models: [
|
||||||
|
{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] },
|
||||||
|
{ command: "link-backup", args: ["{{LinkUrl}}"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
@ -12,6 +12,7 @@ import { resolveCommandAuthorization } from "../command-auth.js";
|
|||||||
import type { MsgContext } from "../templating.js";
|
import type { MsgContext } from "../templating.js";
|
||||||
import { SILENT_REPLY_TOKEN } from "../tokens.js";
|
import { SILENT_REPLY_TOKEN } from "../tokens.js";
|
||||||
import { applyMediaUnderstanding } from "../../media-understanding/apply.js";
|
import { applyMediaUnderstanding } from "../../media-understanding/apply.js";
|
||||||
|
import { applyLinkUnderstanding } from "../../link-understanding/apply.js";
|
||||||
import type { GetReplyOptions, ReplyPayload } from "../types.js";
|
import type { GetReplyOptions, ReplyPayload } from "../types.js";
|
||||||
import { resolveDefaultModel } from "./directive-handling.js";
|
import { resolveDefaultModel } from "./directive-handling.js";
|
||||||
import { resolveReplyDirectives } from "./get-reply-directives.js";
|
import { resolveReplyDirectives } from "./get-reply-directives.js";
|
||||||
@ -89,6 +90,10 @@ export async function getReplyFromConfig(
|
|||||||
agentDir,
|
agentDir,
|
||||||
activeModel: { provider, model },
|
activeModel: { provider, model },
|
||||||
});
|
});
|
||||||
|
await applyLinkUnderstanding({
|
||||||
|
ctx: finalized,
|
||||||
|
cfg,
|
||||||
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
const commandAuthorized = finalized.CommandAuthorized;
|
const commandAuthorized = finalized.CommandAuthorized;
|
||||||
|
|||||||
@ -5,6 +5,7 @@ import type {
|
|||||||
MediaUnderstandingDecision,
|
MediaUnderstandingDecision,
|
||||||
MediaUnderstandingOutput,
|
MediaUnderstandingOutput,
|
||||||
} from "../media-understanding/types.js";
|
} from "../media-understanding/types.js";
|
||||||
|
import type { LinkUnderstandingDecision } from "../link-understanding/types.js";
|
||||||
|
|
||||||
/** Valid message channels for routing. */
|
/** Valid message channels for routing. */
|
||||||
export type OriginatingChannelType = ChannelId | InternalMessageChannel;
|
export type OriginatingChannelType = ChannelId | InternalMessageChannel;
|
||||||
@ -71,6 +72,8 @@ export type MsgContext = {
|
|||||||
Transcript?: string;
|
Transcript?: string;
|
||||||
MediaUnderstanding?: MediaUnderstandingOutput[];
|
MediaUnderstanding?: MediaUnderstandingOutput[];
|
||||||
MediaUnderstandingDecisions?: MediaUnderstandingDecision[];
|
MediaUnderstandingDecisions?: MediaUnderstandingDecision[];
|
||||||
|
LinkUnderstanding?: string[];
|
||||||
|
LinkUnderstandingDecisions?: LinkUnderstandingDecision[];
|
||||||
Prompt?: string;
|
Prompt?: string;
|
||||||
MaxChars?: number;
|
MaxChars?: number;
|
||||||
ChatType?: string;
|
ChatType?: string;
|
||||||
|
|||||||
@ -240,6 +240,17 @@ export function resolveIMessageGroupToolPolicy(
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export function resolveGoogleChatGroupToolPolicy(
|
||||||
|
params: GroupMentionParams,
|
||||||
|
): GroupToolPolicyConfig | undefined {
|
||||||
|
return resolveChannelGroupToolsPolicy({
|
||||||
|
cfg: params.cfg,
|
||||||
|
channel: "googlechat",
|
||||||
|
groupId: params.groupId,
|
||||||
|
accountId: params.accountId,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
export function resolveDiscordGroupToolPolicy(
|
export function resolveDiscordGroupToolPolicy(
|
||||||
params: GroupMentionParams,
|
params: GroupMentionParams,
|
||||||
): GroupToolPolicyConfig | undefined {
|
): GroupToolPolicyConfig | undefined {
|
||||||
|
|||||||
@ -158,6 +158,11 @@ const FIELD_LABELS: Record<string, string> = {
|
|||||||
"tools.media.video.attachments": "Video Understanding Attachment Policy",
|
"tools.media.video.attachments": "Video Understanding Attachment Policy",
|
||||||
"tools.media.video.models": "Video Understanding Models",
|
"tools.media.video.models": "Video Understanding Models",
|
||||||
"tools.media.video.scope": "Video Understanding Scope",
|
"tools.media.video.scope": "Video Understanding Scope",
|
||||||
|
"tools.links.enabled": "Enable Link Understanding",
|
||||||
|
"tools.links.maxLinks": "Link Understanding Max Links",
|
||||||
|
"tools.links.timeoutSeconds": "Link Understanding Timeout (sec)",
|
||||||
|
"tools.links.models": "Link Understanding Models",
|
||||||
|
"tools.links.scope": "Link Understanding Scope",
|
||||||
"tools.profile": "Tool Profile",
|
"tools.profile": "Tool Profile",
|
||||||
"agents.list[].tools.profile": "Agent Tool Profile",
|
"agents.list[].tools.profile": "Agent Tool Profile",
|
||||||
"tools.byProvider": "Tool Policy by Provider",
|
"tools.byProvider": "Tool Policy by Provider",
|
||||||
|
|||||||
@ -102,6 +102,30 @@ export type MediaUnderstandingConfig = {
|
|||||||
models?: MediaUnderstandingModelConfig[];
|
models?: MediaUnderstandingModelConfig[];
|
||||||
};
|
};
|
||||||
|
|
||||||
|
export type LinkModelConfig = {
|
||||||
|
/** Use a CLI command for link processing. */
|
||||||
|
type?: "cli";
|
||||||
|
/** CLI binary (required when type=cli). */
|
||||||
|
command: string;
|
||||||
|
/** CLI args (template-enabled). */
|
||||||
|
args?: string[];
|
||||||
|
/** Optional timeout override (seconds) for this model entry. */
|
||||||
|
timeoutSeconds?: number;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkToolsConfig = {
|
||||||
|
/** Enable link understanding when models are configured. */
|
||||||
|
enabled?: boolean;
|
||||||
|
/** Optional scope gating for understanding. */
|
||||||
|
scope?: MediaUnderstandingScopeConfig;
|
||||||
|
/** Max number of links to process per message. */
|
||||||
|
maxLinks?: number;
|
||||||
|
/** Default timeout (seconds). */
|
||||||
|
timeoutSeconds?: number;
|
||||||
|
/** Ordered model list (fallbacks in order). */
|
||||||
|
models?: LinkModelConfig[];
|
||||||
|
};
|
||||||
|
|
||||||
export type MediaToolsConfig = {
|
export type MediaToolsConfig = {
|
||||||
/** Shared model list applied across image/audio/video. */
|
/** Shared model list applied across image/audio/video. */
|
||||||
models?: MediaUnderstandingModelConfig[];
|
models?: MediaUnderstandingModelConfig[];
|
||||||
@ -347,6 +371,7 @@ export type ToolsConfig = {
|
|||||||
};
|
};
|
||||||
};
|
};
|
||||||
media?: MediaToolsConfig;
|
media?: MediaToolsConfig;
|
||||||
|
links?: LinkToolsConfig;
|
||||||
/** Message tool configuration. */
|
/** Message tool configuration. */
|
||||||
message?: {
|
message?: {
|
||||||
/**
|
/**
|
||||||
|
|||||||
@ -5,6 +5,7 @@ import {
|
|||||||
GroupChatSchema,
|
GroupChatSchema,
|
||||||
HumanDelaySchema,
|
HumanDelaySchema,
|
||||||
IdentitySchema,
|
IdentitySchema,
|
||||||
|
ToolsLinksSchema,
|
||||||
ToolsMediaSchema,
|
ToolsMediaSchema,
|
||||||
} from "./zod-schema.core.js";
|
} from "./zod-schema.core.js";
|
||||||
|
|
||||||
@ -428,6 +429,7 @@ export const ToolsSchema = z
|
|||||||
byProvider: z.record(z.string(), ToolPolicyWithProfileSchema).optional(),
|
byProvider: z.record(z.string(), ToolPolicyWithProfileSchema).optional(),
|
||||||
web: ToolsWebSchema,
|
web: ToolsWebSchema,
|
||||||
media: ToolsMediaSchema,
|
media: ToolsMediaSchema,
|
||||||
|
links: ToolsLinksSchema,
|
||||||
message: z
|
message: z
|
||||||
.object({
|
.object({
|
||||||
allowCrossContextSend: z.boolean().optional(),
|
allowCrossContextSend: z.boolean().optional(),
|
||||||
|
|||||||
@ -453,6 +453,26 @@ export const ToolsMediaSchema = z
|
|||||||
.strict()
|
.strict()
|
||||||
.optional();
|
.optional();
|
||||||
|
|
||||||
|
export const LinkModelSchema = z
|
||||||
|
.object({
|
||||||
|
type: z.literal("cli").optional(),
|
||||||
|
command: z.string().min(1),
|
||||||
|
args: z.array(z.string()).optional(),
|
||||||
|
timeoutSeconds: z.number().int().positive().optional(),
|
||||||
|
})
|
||||||
|
.strict();
|
||||||
|
|
||||||
|
export const ToolsLinksSchema = z
|
||||||
|
.object({
|
||||||
|
enabled: z.boolean().optional(),
|
||||||
|
scope: MediaUnderstandingScopeSchema,
|
||||||
|
maxLinks: z.number().int().positive().optional(),
|
||||||
|
timeoutSeconds: z.number().int().positive().optional(),
|
||||||
|
models: z.array(LinkModelSchema).optional(),
|
||||||
|
})
|
||||||
|
.strict()
|
||||||
|
.optional();
|
||||||
|
|
||||||
export const NativeCommandsSettingSchema = z.union([z.boolean(), z.literal("auto")]);
|
export const NativeCommandsSettingSchema = z.union([z.boolean(), z.literal("auto")]);
|
||||||
|
|
||||||
export const ProviderCommandsSchema = z
|
export const ProviderCommandsSchema = z
|
||||||
|
|||||||
61
src/link-understanding/apply.test.ts
Normal file
61
src/link-understanding/apply.test.ts
Normal file
@ -0,0 +1,61 @@
|
|||||||
|
import { describe, expect, it, vi } from "vitest";
|
||||||
|
|
||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import { runLinkUnderstanding } from "./runner.js";
|
||||||
|
|
||||||
|
vi.mock("./runner.js", () => ({
|
||||||
|
runLinkUnderstanding: vi.fn(),
|
||||||
|
}));
|
||||||
|
|
||||||
|
describe("applyLinkUnderstanding", () => {
|
||||||
|
it("keeps command parsing bodies unchanged when link output is applied", async () => {
|
||||||
|
const mockedRunLinkUnderstanding = vi.mocked(runLinkUnderstanding);
|
||||||
|
mockedRunLinkUnderstanding.mockResolvedValue({
|
||||||
|
urls: ["https://example.com"],
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary", source: "link-cli" }],
|
||||||
|
decisions: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const { applyLinkUnderstanding } = await import("./apply.js");
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "check https://example.com",
|
||||||
|
RawBody: "raw override",
|
||||||
|
CommandBody: "/think low check https://example.com",
|
||||||
|
};
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: { links: { models: [{ command: "link-cli" }] } },
|
||||||
|
};
|
||||||
|
|
||||||
|
await applyLinkUnderstanding({ ctx, cfg });
|
||||||
|
|
||||||
|
expect(ctx.Body).toContain("[Link]");
|
||||||
|
expect(ctx.CommandBody).toBe("/think low check https://example.com");
|
||||||
|
expect(ctx.RawBody).toBe("raw override");
|
||||||
|
expect(ctx.BodyForCommands).toBe("/think low check https://example.com");
|
||||||
|
expect(ctx.BodyForCommands).not.toContain("[Link]");
|
||||||
|
});
|
||||||
|
|
||||||
|
it("preserves original body for command parsing when no overrides exist", async () => {
|
||||||
|
const mockedRunLinkUnderstanding = vi.mocked(runLinkUnderstanding);
|
||||||
|
mockedRunLinkUnderstanding.mockResolvedValue({
|
||||||
|
urls: ["https://example.com"],
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary", source: "link-cli" }],
|
||||||
|
decisions: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const { applyLinkUnderstanding } = await import("./apply.js");
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "check https://example.com",
|
||||||
|
};
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: { links: { models: [{ command: "link-cli" }] } },
|
||||||
|
};
|
||||||
|
|
||||||
|
await applyLinkUnderstanding({ ctx, cfg });
|
||||||
|
|
||||||
|
expect(ctx.Body).toContain("[Link]");
|
||||||
|
expect(ctx.RawBody).toBe("check https://example.com");
|
||||||
|
expect(ctx.BodyForCommands).toBe("check https://example.com");
|
||||||
|
});
|
||||||
|
});
|
||||||
52
src/link-understanding/apply.ts
Normal file
52
src/link-understanding/apply.ts
Normal file
@ -0,0 +1,52 @@
|
|||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import { finalizeInboundContext } from "../auto-reply/reply/inbound-context.js";
|
||||||
|
import { formatLinkUnderstandingBody, formatLinkUnderstandingSections } from "./format.js";
|
||||||
|
import { runLinkUnderstanding } from "./runner.js";
|
||||||
|
import type { LinkUnderstandingDecision, LinkUnderstandingOutput } from "./types.js";
|
||||||
|
|
||||||
|
export type ApplyLinkUnderstandingResult = {
|
||||||
|
outputs: LinkUnderstandingOutput[];
|
||||||
|
urls: string[];
|
||||||
|
decisions: LinkUnderstandingDecision[];
|
||||||
|
};
|
||||||
|
|
||||||
|
export async function applyLinkUnderstanding(params: {
|
||||||
|
ctx: MsgContext;
|
||||||
|
cfg: ClawdbotConfig;
|
||||||
|
}): Promise<ApplyLinkUnderstandingResult> {
|
||||||
|
const result = await runLinkUnderstanding({
|
||||||
|
cfg: params.cfg,
|
||||||
|
ctx: params.ctx,
|
||||||
|
});
|
||||||
|
|
||||||
|
if (result.decisions.length > 0) {
|
||||||
|
params.ctx.LinkUnderstandingDecisions = [
|
||||||
|
...(params.ctx.LinkUnderstandingDecisions ?? []),
|
||||||
|
...result.decisions,
|
||||||
|
];
|
||||||
|
}
|
||||||
|
|
||||||
|
if (result.outputs.length === 0) {
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
const originalBody = params.ctx.Body;
|
||||||
|
const sections = formatLinkUnderstandingSections(result.outputs);
|
||||||
|
if (sections.length > 0) {
|
||||||
|
params.ctx.LinkUnderstanding = [...(params.ctx.LinkUnderstanding ?? []), ...sections];
|
||||||
|
}
|
||||||
|
params.ctx.Body = formatLinkUnderstandingBody({
|
||||||
|
body: params.ctx.Body,
|
||||||
|
outputs: result.outputs,
|
||||||
|
});
|
||||||
|
if (!params.ctx.CommandBody && !params.ctx.RawBody && typeof originalBody === "string") {
|
||||||
|
params.ctx.RawBody = originalBody;
|
||||||
|
}
|
||||||
|
finalizeInboundContext(params.ctx, {
|
||||||
|
forceBodyForAgent: true,
|
||||||
|
forceBodyForCommands: true,
|
||||||
|
});
|
||||||
|
|
||||||
|
return result;
|
||||||
|
}
|
||||||
2
src/link-understanding/defaults.ts
Normal file
2
src/link-understanding/defaults.ts
Normal file
@ -0,0 +1,2 @@
|
|||||||
|
export const DEFAULT_LINK_TIMEOUT_SECONDS = 30;
|
||||||
|
export const DEFAULT_MAX_LINKS = 3;
|
||||||
29
src/link-understanding/detect.test.ts
Normal file
29
src/link-understanding/detect.test.ts
Normal file
@ -0,0 +1,29 @@
|
|||||||
|
import { describe, expect, it } from "vitest";
|
||||||
|
|
||||||
|
import { extractLinksFromMessage } from "./detect.js";
|
||||||
|
|
||||||
|
describe("extractLinksFromMessage", () => {
|
||||||
|
it("extracts bare http/https URLs in order", () => {
|
||||||
|
const links = extractLinksFromMessage("see https://a.example and http://b.test");
|
||||||
|
expect(links).toEqual(["https://a.example", "http://b.test"]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("dedupes links and enforces maxLinks", () => {
|
||||||
|
const links = extractLinksFromMessage("https://a.example https://a.example https://b.test", {
|
||||||
|
maxLinks: 1,
|
||||||
|
});
|
||||||
|
expect(links).toEqual(["https://a.example"]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("ignores markdown links", () => {
|
||||||
|
const links = extractLinksFromMessage("[doc](https://docs.example) https://bare.example");
|
||||||
|
expect(links).toEqual(["https://bare.example"]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("blocks loopback and private hosts", () => {
|
||||||
|
const links = extractLinksFromMessage(
|
||||||
|
"http://localhost/test http://127.0.0.1/test http://[::1]/test http://10.0.0.1/test https://ok.test",
|
||||||
|
);
|
||||||
|
expect(links).toEqual(["https://ok.test"]);
|
||||||
|
});
|
||||||
|
});
|
||||||
53
src/link-understanding/detect.ts
Normal file
53
src/link-understanding/detect.ts
Normal file
@ -0,0 +1,53 @@
|
|||||||
|
import { isBlockedHostname, isPrivateIpAddress } from "../infra/net/ssrf.js";
|
||||||
|
import { DEFAULT_MAX_LINKS } from "./defaults.js";
|
||||||
|
|
||||||
|
// Remove markdown link syntax so only bare URLs are considered.
|
||||||
|
const MARKDOWN_LINK_RE = /\[[^\]]*]\((https?:\/\/\S+?)\)/gi;
|
||||||
|
const BARE_LINK_RE = /https?:\/\/\S+/gi;
|
||||||
|
|
||||||
|
function stripMarkdownLinks(message: string): string {
|
||||||
|
return message.replace(MARKDOWN_LINK_RE, " ");
|
||||||
|
}
|
||||||
|
|
||||||
|
function resolveMaxLinks(value?: number): number {
|
||||||
|
if (typeof value === "number" && Number.isFinite(value) && value > 0) {
|
||||||
|
return Math.floor(value);
|
||||||
|
}
|
||||||
|
return DEFAULT_MAX_LINKS;
|
||||||
|
}
|
||||||
|
|
||||||
|
function isAllowedUrl(raw: string): boolean {
|
||||||
|
try {
|
||||||
|
const parsed = new URL(raw);
|
||||||
|
if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
|
||||||
|
const hostname = parsed.hostname;
|
||||||
|
if (!hostname) return false;
|
||||||
|
if (isBlockedHostname(hostname)) return false;
|
||||||
|
if (isPrivateIpAddress(hostname)) return false;
|
||||||
|
return true;
|
||||||
|
} catch {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
export function extractLinksFromMessage(message: string, opts?: { maxLinks?: number }): string[] {
|
||||||
|
const source = message?.trim();
|
||||||
|
if (!source) return [];
|
||||||
|
|
||||||
|
const maxLinks = resolveMaxLinks(opts?.maxLinks);
|
||||||
|
const sanitized = stripMarkdownLinks(source);
|
||||||
|
const seen = new Set<string>();
|
||||||
|
const results: string[] = [];
|
||||||
|
|
||||||
|
for (const match of sanitized.matchAll(BARE_LINK_RE)) {
|
||||||
|
const raw = match[0]?.trim();
|
||||||
|
if (!raw) continue;
|
||||||
|
if (!isAllowedUrl(raw)) continue;
|
||||||
|
if (seen.has(raw)) continue;
|
||||||
|
seen.add(raw);
|
||||||
|
results.push(raw);
|
||||||
|
if (results.length >= maxLinks) break;
|
||||||
|
}
|
||||||
|
|
||||||
|
return results;
|
||||||
|
}
|
||||||
27
src/link-understanding/format.test.ts
Normal file
27
src/link-understanding/format.test.ts
Normal file
@ -0,0 +1,27 @@
|
|||||||
|
import { describe, expect, it } from "vitest";
|
||||||
|
|
||||||
|
import { formatLinkUnderstandingBody } from "./format.js";
|
||||||
|
|
||||||
|
describe("formatLinkUnderstandingBody", () => {
|
||||||
|
it("appends a structured link summary with source", () => {
|
||||||
|
const body = formatLinkUnderstandingBody({
|
||||||
|
body: "Check this",
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary here", source: "link-cli" }],
|
||||||
|
});
|
||||||
|
expect(body).toBe(
|
||||||
|
"Check this\n\n[Link]\nURL: https://example.com\nSource: link-cli\nSummary:\nSummary here",
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("numbers multiple links and omits empty body", () => {
|
||||||
|
const body = formatLinkUnderstandingBody({
|
||||||
|
outputs: [
|
||||||
|
{ url: "https://a.example", text: "First", source: "cli-a" },
|
||||||
|
{ url: "https://b.example", text: "Second", source: "cli-b" },
|
||||||
|
],
|
||||||
|
});
|
||||||
|
expect(body).toBe(
|
||||||
|
"[Link 1/2]\nURL: https://a.example\nSource: cli-a\nSummary:\nFirst\n\n[Link 2/2]\nURL: https://b.example\nSource: cli-b\nSummary:\nSecond",
|
||||||
|
);
|
||||||
|
});
|
||||||
|
});
|
||||||
31
src/link-understanding/format.ts
Normal file
31
src/link-understanding/format.ts
Normal file
@ -0,0 +1,31 @@
|
|||||||
|
import type { LinkUnderstandingOutput } from "./types.js";
|
||||||
|
|
||||||
|
function formatSection(output: LinkUnderstandingOutput, index: number, total: number): string {
|
||||||
|
const label = total > 1 ? `Link ${index + 1}/${total}` : "Link";
|
||||||
|
const source = output.source?.trim() || "unknown";
|
||||||
|
const lines = [`[${label}]`, `URL: ${output.url}`, `Source: ${source}`];
|
||||||
|
lines.push(`Summary:\n${output.text.trim()}`);
|
||||||
|
return lines.join("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
export function formatLinkUnderstandingSections(outputs: LinkUnderstandingOutput[]): string[] {
|
||||||
|
const trimmed = outputs
|
||||||
|
.map((output) => ({ ...output, text: output.text.trim(), url: output.url.trim() }))
|
||||||
|
.filter((output) => output.text && output.url);
|
||||||
|
if (trimmed.length === 0) return [];
|
||||||
|
return trimmed.map((output, index) => formatSection(output, index, trimmed.length));
|
||||||
|
}
|
||||||
|
|
||||||
|
export function formatLinkUnderstandingBody(params: {
|
||||||
|
body?: string;
|
||||||
|
outputs: LinkUnderstandingOutput[];
|
||||||
|
}): string {
|
||||||
|
const sections = formatLinkUnderstandingSections(params.outputs);
|
||||||
|
if (sections.length === 0) {
|
||||||
|
return params.body ?? "";
|
||||||
|
}
|
||||||
|
|
||||||
|
const base = (params.body ?? "").trim();
|
||||||
|
if (!base) return sections.join("\n\n");
|
||||||
|
return `${base}\n\n${sections.join("\n\n")}`;
|
||||||
|
}
|
||||||
11
src/link-understanding/index.ts
Normal file
11
src/link-understanding/index.ts
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
export { applyLinkUnderstanding } from "./apply.js";
|
||||||
|
export { extractLinksFromMessage } from "./detect.js";
|
||||||
|
export { formatLinkUnderstandingBody } from "./format.js";
|
||||||
|
export { runLinkUnderstanding } from "./runner.js";
|
||||||
|
export type {
|
||||||
|
LinkUnderstandingDecision,
|
||||||
|
LinkUnderstandingDecisionOutcome,
|
||||||
|
LinkUnderstandingModelDecision,
|
||||||
|
LinkUnderstandingOutput,
|
||||||
|
LinkUnderstandingUrlDecision,
|
||||||
|
} from "./types.js";
|
||||||
84
src/link-understanding/runner.test.ts
Normal file
84
src/link-understanding/runner.test.ts
Normal file
@ -0,0 +1,84 @@
|
|||||||
|
import { beforeEach, describe, expect, it, vi } from "vitest";
|
||||||
|
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import { assertPublicHostname } from "../infra/net/ssrf.js";
|
||||||
|
import { runExec } from "../process/exec.js";
|
||||||
|
|
||||||
|
vi.mock("../process/exec.js", () => ({
|
||||||
|
runExec: vi.fn(),
|
||||||
|
}));
|
||||||
|
vi.mock("../infra/net/ssrf.js", async () => {
|
||||||
|
const actual =
|
||||||
|
await vi.importActual<typeof import("../infra/net/ssrf.js")>("../infra/net/ssrf.js");
|
||||||
|
return {
|
||||||
|
...actual,
|
||||||
|
assertPublicHostname: vi.fn(),
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
async function loadRunner() {
|
||||||
|
return await import("./runner.js");
|
||||||
|
}
|
||||||
|
|
||||||
|
describe("runLinkUnderstanding", () => {
|
||||||
|
const mockedRunExec = vi.mocked(runExec);
|
||||||
|
const mockedAssertPublicHostname = vi.mocked(assertPublicHostname);
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
mockedRunExec.mockReset();
|
||||||
|
mockedAssertPublicHostname.mockReset();
|
||||||
|
mockedAssertPublicHostname.mockResolvedValue(undefined);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("falls back to the next model when a CLI entry fails", async () => {
|
||||||
|
const { runLinkUnderstanding } = await loadRunner();
|
||||||
|
mockedRunExec.mockImplementation(async (command) => {
|
||||||
|
if (command === "fail") throw new Error("boom");
|
||||||
|
return { stdout: "summary", stderr: "" };
|
||||||
|
});
|
||||||
|
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
models: [{ command: "fail" }, { command: "ok" }],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "see https://example.com",
|
||||||
|
SessionKey: "session-1",
|
||||||
|
Surface: "discord",
|
||||||
|
ChatType: "direct",
|
||||||
|
};
|
||||||
|
|
||||||
|
const result = await runLinkUnderstanding({ cfg, ctx });
|
||||||
|
expect(result.urls).toEqual(["https://example.com"]);
|
||||||
|
expect(result.outputs).toEqual([{ url: "https://example.com", text: "summary", source: "ok" }]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("skips links that fail the public hostname check", async () => {
|
||||||
|
const { runLinkUnderstanding } = await loadRunner();
|
||||||
|
mockedAssertPublicHostname.mockRejectedValueOnce(new Error("Blocked: private/internal IP"));
|
||||||
|
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
models: [{ command: "ok" }],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "see https://intranet.example.com",
|
||||||
|
SessionKey: "session-1",
|
||||||
|
Surface: "discord",
|
||||||
|
ChatType: "direct",
|
||||||
|
};
|
||||||
|
|
||||||
|
const result = await runLinkUnderstanding({ cfg, ctx });
|
||||||
|
expect(result.outputs).toEqual([]);
|
||||||
|
expect(result.decisions[0]?.urls[0]?.attempts[0]?.reason).toContain("Blocked");
|
||||||
|
});
|
||||||
|
});
|
||||||
221
src/link-understanding/runner.ts
Normal file
221
src/link-understanding/runner.ts
Normal file
@ -0,0 +1,221 @@
|
|||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import { applyTemplate } from "../auto-reply/templating.js";
|
||||||
|
import type { LinkModelConfig, LinkToolsConfig } from "../config/types.tools.js";
|
||||||
|
import { logVerbose, shouldLogVerbose } from "../globals.js";
|
||||||
|
import { runExec } from "../process/exec.js";
|
||||||
|
import { CLI_OUTPUT_MAX_BUFFER } from "../media-understanding/defaults.js";
|
||||||
|
import { resolveTimeoutMs } from "../media-understanding/resolve.js";
|
||||||
|
import {
|
||||||
|
normalizeMediaUnderstandingChatType,
|
||||||
|
resolveMediaUnderstandingScope,
|
||||||
|
} from "../media-understanding/scope.js";
|
||||||
|
import { assertPublicHostname } from "../infra/net/ssrf.js";
|
||||||
|
import { DEFAULT_LINK_TIMEOUT_SECONDS } from "./defaults.js";
|
||||||
|
import { extractLinksFromMessage } from "./detect.js";
|
||||||
|
import type {
|
||||||
|
LinkUnderstandingDecision,
|
||||||
|
LinkUnderstandingModelDecision,
|
||||||
|
LinkUnderstandingOutput,
|
||||||
|
LinkUnderstandingUrlDecision,
|
||||||
|
} from "./types.js";
|
||||||
|
|
||||||
|
export type LinkUnderstandingResult = {
|
||||||
|
urls: string[];
|
||||||
|
outputs: LinkUnderstandingOutput[];
|
||||||
|
decisions: LinkUnderstandingDecision[];
|
||||||
|
};
|
||||||
|
|
||||||
|
function resolveScopeDecision(params: {
|
||||||
|
config?: LinkToolsConfig;
|
||||||
|
ctx: MsgContext;
|
||||||
|
}): "allow" | "deny" {
|
||||||
|
return resolveMediaUnderstandingScope({
|
||||||
|
scope: params.config?.scope,
|
||||||
|
sessionKey: params.ctx.SessionKey,
|
||||||
|
channel: params.ctx.Surface ?? params.ctx.Provider,
|
||||||
|
chatType: normalizeMediaUnderstandingChatType(params.ctx.ChatType),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
function resolveTimeoutMsFromConfig(params: {
|
||||||
|
config?: LinkToolsConfig;
|
||||||
|
entry: LinkModelConfig;
|
||||||
|
}): number {
|
||||||
|
const configured = params.entry.timeoutSeconds ?? params.config?.timeoutSeconds;
|
||||||
|
return resolveTimeoutMs(configured, DEFAULT_LINK_TIMEOUT_SECONDS);
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runCliEntry(params: {
|
||||||
|
entry: LinkModelConfig;
|
||||||
|
ctx: MsgContext;
|
||||||
|
url: string;
|
||||||
|
config?: LinkToolsConfig;
|
||||||
|
}): Promise<string | null> {
|
||||||
|
if ((params.entry.type ?? "cli") !== "cli") return null;
|
||||||
|
const command = params.entry.command.trim();
|
||||||
|
if (!command) return null;
|
||||||
|
const args = params.entry.args ?? [];
|
||||||
|
const timeoutMs = resolveTimeoutMsFromConfig({ config: params.config, entry: params.entry });
|
||||||
|
const templCtx = {
|
||||||
|
...params.ctx,
|
||||||
|
LinkUrl: params.url,
|
||||||
|
};
|
||||||
|
const argv = [command, ...args].map((part, index) =>
|
||||||
|
index === 0 ? part : applyTemplate(part, templCtx),
|
||||||
|
);
|
||||||
|
|
||||||
|
if (shouldLogVerbose()) {
|
||||||
|
logVerbose(`Link understanding via CLI: ${argv.join(" ")}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const { stdout } = await runExec(argv[0], argv.slice(1), {
|
||||||
|
timeoutMs,
|
||||||
|
maxBuffer: CLI_OUTPUT_MAX_BUFFER,
|
||||||
|
});
|
||||||
|
const trimmed = stdout.trim();
|
||||||
|
return trimmed || null;
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildModelDecision(params: {
|
||||||
|
entry: LinkModelConfig;
|
||||||
|
outcome: LinkUnderstandingModelDecision["outcome"];
|
||||||
|
reason?: string;
|
||||||
|
}): LinkUnderstandingModelDecision {
|
||||||
|
const command = params.entry.command?.trim();
|
||||||
|
return {
|
||||||
|
type: "cli",
|
||||||
|
command: command || undefined,
|
||||||
|
outcome: params.outcome,
|
||||||
|
reason: params.reason,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
async function assertUrlIsPublic(
|
||||||
|
url: string,
|
||||||
|
): Promise<{ ok: true } | { ok: false; reason: string }> {
|
||||||
|
try {
|
||||||
|
const parsed = new URL(url);
|
||||||
|
await assertPublicHostname(parsed.hostname);
|
||||||
|
return { ok: true };
|
||||||
|
} catch (err) {
|
||||||
|
return { ok: false, reason: String(err) };
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runLinkEntries(params: {
|
||||||
|
entries: LinkModelConfig[];
|
||||||
|
ctx: MsgContext;
|
||||||
|
url: string;
|
||||||
|
config?: LinkToolsConfig;
|
||||||
|
}): Promise<{
|
||||||
|
output: LinkUnderstandingOutput | null;
|
||||||
|
attempts: LinkUnderstandingModelDecision[];
|
||||||
|
}> {
|
||||||
|
const attempts: LinkUnderstandingModelDecision[] = [];
|
||||||
|
for (const entry of params.entries) {
|
||||||
|
try {
|
||||||
|
const output = await runCliEntry({
|
||||||
|
entry,
|
||||||
|
ctx: params.ctx,
|
||||||
|
url: params.url,
|
||||||
|
config: params.config,
|
||||||
|
});
|
||||||
|
if (output) {
|
||||||
|
const decision = buildModelDecision({ entry, outcome: "success" });
|
||||||
|
attempts.push(decision);
|
||||||
|
return {
|
||||||
|
output: {
|
||||||
|
url: params.url,
|
||||||
|
text: output,
|
||||||
|
source: entry.command?.trim() || undefined,
|
||||||
|
},
|
||||||
|
attempts,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
attempts.push(buildModelDecision({ entry, outcome: "skipped", reason: "empty output" }));
|
||||||
|
} catch (err) {
|
||||||
|
if (shouldLogVerbose()) {
|
||||||
|
logVerbose(`Link understanding failed for ${params.url}: ${String(err)}`);
|
||||||
|
}
|
||||||
|
attempts.push(
|
||||||
|
buildModelDecision({
|
||||||
|
entry,
|
||||||
|
outcome: "failed",
|
||||||
|
reason: String(err),
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return { output: null, attempts };
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runLinkUnderstanding(params: {
|
||||||
|
cfg: ClawdbotConfig;
|
||||||
|
ctx: MsgContext;
|
||||||
|
message?: string;
|
||||||
|
}): Promise<LinkUnderstandingResult> {
|
||||||
|
const config = params.cfg.tools?.links;
|
||||||
|
if (!config || config.enabled === false) {
|
||||||
|
return { urls: [], outputs: [], decisions: [{ outcome: "disabled", urls: [] }] };
|
||||||
|
}
|
||||||
|
|
||||||
|
const scopeDecision = resolveScopeDecision({ config, ctx: params.ctx });
|
||||||
|
if (scopeDecision === "deny") {
|
||||||
|
if (shouldLogVerbose()) {
|
||||||
|
logVerbose("Link understanding disabled by scope policy.");
|
||||||
|
}
|
||||||
|
return { urls: [], outputs: [], decisions: [{ outcome: "scope-deny", urls: [] }] };
|
||||||
|
}
|
||||||
|
|
||||||
|
const message = params.message ?? params.ctx.CommandBody ?? params.ctx.RawBody ?? params.ctx.Body;
|
||||||
|
const links = extractLinksFromMessage(message ?? "", { maxLinks: config?.maxLinks });
|
||||||
|
if (links.length === 0) {
|
||||||
|
return { urls: [], outputs: [], decisions: [{ outcome: "no-links", urls: [] }] };
|
||||||
|
}
|
||||||
|
|
||||||
|
const entries = config?.models ?? [];
|
||||||
|
if (entries.length === 0) {
|
||||||
|
const urlDecisions: LinkUnderstandingUrlDecision[] = links.map((url) => ({
|
||||||
|
url,
|
||||||
|
attempts: [],
|
||||||
|
}));
|
||||||
|
return {
|
||||||
|
urls: links,
|
||||||
|
outputs: [],
|
||||||
|
decisions: [{ outcome: "skipped", urls: urlDecisions }],
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
const outputs: LinkUnderstandingOutput[] = [];
|
||||||
|
const urlDecisions: LinkUnderstandingUrlDecision[] = [];
|
||||||
|
for (const url of links) {
|
||||||
|
const ssrfCheck = await assertUrlIsPublic(url);
|
||||||
|
if (!ssrfCheck.ok) {
|
||||||
|
urlDecisions.push({
|
||||||
|
url,
|
||||||
|
attempts: [
|
||||||
|
{
|
||||||
|
type: "cli",
|
||||||
|
command: "ssrf",
|
||||||
|
outcome: "skipped",
|
||||||
|
reason: ssrfCheck.reason,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
});
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
const { output, attempts } = await runLinkEntries({
|
||||||
|
entries,
|
||||||
|
ctx: params.ctx,
|
||||||
|
url,
|
||||||
|
config,
|
||||||
|
});
|
||||||
|
const chosen = attempts.find((attempt) => attempt.outcome === "success");
|
||||||
|
urlDecisions.push({ url, attempts, chosen });
|
||||||
|
if (output) outputs.push(output);
|
||||||
|
}
|
||||||
|
|
||||||
|
const outcome = outputs.length > 0 ? "success" : "skipped";
|
||||||
|
return { urls: links, outputs, decisions: [{ outcome, urls: urlDecisions }] };
|
||||||
|
}
|
||||||
30
src/link-understanding/types.ts
Normal file
30
src/link-understanding/types.ts
Normal file
@ -0,0 +1,30 @@
|
|||||||
|
export type LinkUnderstandingOutput = {
|
||||||
|
url: string;
|
||||||
|
text: string;
|
||||||
|
source?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingModelDecision = {
|
||||||
|
type: "cli";
|
||||||
|
command?: string;
|
||||||
|
outcome: "success" | "skipped" | "failed";
|
||||||
|
reason?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingUrlDecision = {
|
||||||
|
url: string;
|
||||||
|
attempts: LinkUnderstandingModelDecision[];
|
||||||
|
chosen?: LinkUnderstandingModelDecision;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingDecisionOutcome =
|
||||||
|
| "success"
|
||||||
|
| "skipped"
|
||||||
|
| "disabled"
|
||||||
|
| "scope-deny"
|
||||||
|
| "no-links";
|
||||||
|
|
||||||
|
export type LinkUnderstandingDecision = {
|
||||||
|
outcome: LinkUnderstandingDecisionOutcome;
|
||||||
|
urls: LinkUnderstandingUrlDecision[];
|
||||||
|
};
|
||||||
Loading…
Reference in New Issue
Block a user