feat: add link understanding pipeline (#1637) (thanks @dasilva333)
This commit is contained in:
parent
4a90b2f329
commit
6a0eafd191
@ -11,6 +11,7 @@ Docs: https://docs.clawd.bot
|
|||||||
- Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround).
|
- Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround).
|
||||||
- Docs: add verbose installer troubleshooting guidance.
|
- Docs: add verbose installer troubleshooting guidance.
|
||||||
- Docs: update Fly.io guide notes.
|
- Docs: update Fly.io guide notes.
|
||||||
|
- Tools: add link understanding with CLI preprocessing, SSRF guard, and structured link summaries. (#1637) Thanks @dasilva333.
|
||||||
- Docs: add Bedrock EC2 instance role setup + IAM steps. (#1625) Thanks @sergical. https://docs.clawd.bot/bedrock
|
- Docs: add Bedrock EC2 instance role setup + IAM steps. (#1625) Thanks @sergical. https://docs.clawd.bot/bedrock
|
||||||
- Exec approvals: forward approval prompts to chat with `/approve` for all channels (including plugins). (#1621) Thanks @czekaj. https://docs.clawd.bot/tools/exec-approvals https://docs.clawd.bot/tools/slash-commands
|
- Exec approvals: forward approval prompts to chat with `/approve` for all channels (including plugins). (#1621) Thanks @czekaj. https://docs.clawd.bot/tools/exec-approvals https://docs.clawd.bot/tools/slash-commands
|
||||||
- Gateway: expose config.patch in the gateway tool with safe partial updates + restart sentinel. (#1653) Thanks @Glucksberg.
|
- Gateway: expose config.patch in the gateway tool with safe partial updates + restart sentinel. (#1653) Thanks @Glucksberg.
|
||||||
|
|||||||
@ -1005,6 +1005,7 @@
|
|||||||
"plugins/zalouser",
|
"plugins/zalouser",
|
||||||
"tools/exec",
|
"tools/exec",
|
||||||
"tools/web",
|
"tools/web",
|
||||||
|
"tools/link-understanding",
|
||||||
"tools/apply-patch",
|
"tools/apply-patch",
|
||||||
"tools/elevated",
|
"tools/elevated",
|
||||||
"tools/browser",
|
"tools/browser",
|
||||||
|
|||||||
@ -2010,6 +2010,35 @@ Example:
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
`tools.links` configures link understanding (URL preprocessing via CLI):
|
||||||
|
- `tools.links.enabled`: enable/disable link understanding (default: enabled when models are configured).
|
||||||
|
- `tools.links.maxLinks`: max URLs per message (default 3).
|
||||||
|
- `tools.links.timeoutSeconds`: default CLI timeout (default 30).
|
||||||
|
- `tools.links.scope`: optional gating rules (same shape as media understanding scope).
|
||||||
|
- `tools.links.models`: ordered CLI entries (fallbacks in order).
|
||||||
|
- `command`: executable to run (required).
|
||||||
|
- `args`: templated args (supports `{{LinkUrl}}`).
|
||||||
|
- `timeoutSeconds`: optional per-entry override.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 2,
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [{ action: "allow", match: { chatType: "direct" } }]
|
||||||
|
},
|
||||||
|
models: [
|
||||||
|
{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
`agents.defaults.subagents` configures sub-agent defaults:
|
`agents.defaults.subagents` configures sub-agent defaults:
|
||||||
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller’s model unless overridden per agent or per call.
|
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller’s model unless overridden per agent or per call.
|
||||||
- `maxConcurrent`: max concurrent sub-agent runs (default 1)
|
- `maxConcurrent`: max concurrent sub-agent runs (default 1)
|
||||||
@ -3180,7 +3209,7 @@ clawdbot dns setup --apply
|
|||||||
|
|
||||||
## Template variables
|
## Template variables
|
||||||
|
|
||||||
Template placeholders are expanded in `tools.media.*.models[].args` and `tools.media.models[].args` (and any future templated argument fields).
|
Template placeholders are expanded in `tools.media.*.models[].args`, `tools.media.models[].args`, and `tools.links.models[].args` (and any future templated argument fields).
|
||||||
|
|
||||||
| Variable | Description |
|
| Variable | Description |
|
||||||
|----------|-------------|
|
|----------|-------------|
|
||||||
@ -3195,6 +3224,7 @@ Template placeholders are expanded in `tools.media.*.models[].args` and `tools.m
|
|||||||
| `{{MediaUrl}}` | Inbound media pseudo-URL (if present) |
|
| `{{MediaUrl}}` | Inbound media pseudo-URL (if present) |
|
||||||
| `{{MediaPath}}` | Local media path (if downloaded) |
|
| `{{MediaPath}}` | Local media path (if downloaded) |
|
||||||
| `{{MediaType}}` | Media type (image/audio/document/…) |
|
| `{{MediaType}}` | Media type (image/audio/document/…) |
|
||||||
|
| `{{LinkUrl}}` | URL currently being processed (link understanding only) |
|
||||||
| `{{Transcript}}` | Audio transcript (when enabled) |
|
| `{{Transcript}}` | Audio transcript (when enabled) |
|
||||||
| `{{Prompt}}` | Resolved media prompt for CLI entries |
|
| `{{Prompt}}` | Resolved media prompt for CLI entries |
|
||||||
| `{{MaxChars}}` | Resolved max output chars for CLI entries |
|
| `{{MaxChars}}` | Resolved max output chars for CLI entries |
|
||||||
|
|||||||
@ -232,6 +232,13 @@ Notes:
|
|||||||
- See [Web tools](/tools/web) for setup.
|
- See [Web tools](/tools/web) for setup.
|
||||||
- See [Firecrawl](/tools/firecrawl) for the optional anti-bot fallback.
|
- See [Firecrawl](/tools/firecrawl) for the optional anti-bot fallback.
|
||||||
|
|
||||||
|
### Link understanding (inbound)
|
||||||
|
Preprocess inbound messages by running a CLI on detected URLs and appending the
|
||||||
|
results to the message body before the agent sees it. This is **not** an agent
|
||||||
|
tool; it runs in the inbound auto-reply pipeline.
|
||||||
|
|
||||||
|
See [Link understanding](/tools/link-understanding) for setup and configuration.
|
||||||
|
|
||||||
### `browser`
|
### `browser`
|
||||||
Control the dedicated clawd browser.
|
Control the dedicated clawd browser.
|
||||||
|
|
||||||
|
|||||||
326
docs/tools/link-understanding.md
Normal file
326
docs/tools/link-understanding.md
Normal file
@ -0,0 +1,326 @@
|
|||||||
|
---
|
||||||
|
summary: "Run a CLI on URLs in inbound messages and inject results before the agent replies"
|
||||||
|
read_when:
|
||||||
|
- You want to summarize or enrich URLs in inbound messages
|
||||||
|
- You need to enable or scope link understanding
|
||||||
|
- You want to wire a custom CLI to preprocess links
|
||||||
|
---
|
||||||
|
|
||||||
|
# Link understanding
|
||||||
|
|
||||||
|
Link understanding is the **inbound URL enrichment pipeline**: Clawdbot detects
|
||||||
|
links in a message, runs your **local CLI** to interpret them, and injects the
|
||||||
|
results into the message body **before the agent sees it**. The goal is a
|
||||||
|
predictable, structured prompt envelope without adding a browsing tool.
|
||||||
|
|
||||||
|
This is **not** an agent tool. It is a preprocessing step in the auto-reply
|
||||||
|
pipeline, shared across all channels (including webchat).
|
||||||
|
|
||||||
|
## What it enables
|
||||||
|
|
||||||
|
- Summarize or extract data from links (docs, tickets, dashboards, runbooks).
|
||||||
|
- Inject structured context (titles, owners, status, timestamps, key facts).
|
||||||
|
- Normalize or transform link payloads (HTML → text, JSON → bullets).
|
||||||
|
- Gate link processing by channel, chat type, or session key.
|
||||||
|
- Apply SSRF protections before your CLI runs.
|
||||||
|
|
||||||
|
## When it runs (pipeline order)
|
||||||
|
|
||||||
|
Link understanding runs after inbound normalization and media understanding,
|
||||||
|
but before the agent executes the reply run:
|
||||||
|
|
||||||
|
1) Inbound message is normalized and routed.
|
||||||
|
2) Media understanding may rewrite the body (images/audio/video).
|
||||||
|
3) Link understanding detects URLs in the current command/body and appends results.
|
||||||
|
4) The agent receives the final body and command parsing uses the updated text.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
1) Clawdbot scans the inbound message for **bare** `http/https` URLs.
|
||||||
|
2) It dedupes links in order and caps them at `tools.links.maxLinks` (default 3).
|
||||||
|
3) For each link, it runs your configured CLI entries **in order** until one returns non-empty stdout.
|
||||||
|
4) It appends the resulting outputs to the message body (structured envelope).
|
||||||
|
5) The agent sees the original text plus the appended link outputs.
|
||||||
|
|
||||||
|
No requests are made by Clawdbot itself; **your CLI does the fetching/parsing**.
|
||||||
|
|
||||||
|
## URL detection rules
|
||||||
|
|
||||||
|
- Only **bare URLs** are extracted (e.g. `https://example.com`).
|
||||||
|
- Markdown links are ignored (`[label](https://example.com)` is stripped).
|
||||||
|
- Links are deduped (first occurrence wins).
|
||||||
|
- Only `http` and `https` are allowed.
|
||||||
|
- Local and private hosts are blocked (see **Security** below).
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Link understanding is configured under `tools.links`:
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 3,
|
||||||
|
timeoutSeconds: 30,
|
||||||
|
scope: {
|
||||||
|
default: "allow",
|
||||||
|
rules: [{ action: "deny", match: { chatType: "group" } }]
|
||||||
|
},
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
command: "link-understand",
|
||||||
|
args: ["--url", "{{LinkUrl}}", "--format", "markdown"],
|
||||||
|
timeoutSeconds: 20
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### `tools.links` fields
|
||||||
|
|
||||||
|
- `enabled`: enable/disable link understanding (default: enabled when models are configured).
|
||||||
|
- `maxLinks`: max URLs processed per message (default 3).
|
||||||
|
- `timeoutSeconds`: default timeout for CLI runs (default 30).
|
||||||
|
- `scope`: optional gating rules (same structure as media understanding scope).
|
||||||
|
- `models`: ordered list of CLI entries (fallbacks in order).
|
||||||
|
|
||||||
|
### Model entries (`tools.links.models[]`)
|
||||||
|
|
||||||
|
Each entry is a CLI command:
|
||||||
|
|
||||||
|
- `type`: optional, only `"cli"` is supported (default).
|
||||||
|
- `command`: executable to run (required).
|
||||||
|
- `args`: CLI args (templated; see **Template variables**).
|
||||||
|
- `timeoutSeconds`: optional override for this entry.
|
||||||
|
|
||||||
|
The first entry that returns **non-empty stdout** wins. If a command fails or
|
||||||
|
returns empty output, Clawdbot tries the next entry.
|
||||||
|
|
||||||
|
## Performance + ordering
|
||||||
|
|
||||||
|
- Links are processed **sequentially** (per message, in order).
|
||||||
|
- Each CLI entry has a timeout (per entry or default).
|
||||||
|
- Failure to run a CLI does **not** abort the agent run; it just tries the next entry.
|
||||||
|
- `maxLinks` is the primary cap for latency. Keep it low for chatty channels.
|
||||||
|
|
||||||
|
## Template variables
|
||||||
|
|
||||||
|
`tools.links.models[].args` supports the standard template variables plus `{{LinkUrl}}`.
|
||||||
|
See [Template variables](/gateway/configuration#template-variables) for the full list.
|
||||||
|
|
||||||
|
Link-specific variable:
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `{{LinkUrl}}` | URL currently being processed |
|
||||||
|
|
||||||
|
## Output format
|
||||||
|
|
||||||
|
Each link is wrapped in a predictable envelope and appended to the message body:
|
||||||
|
|
||||||
|
```
|
||||||
|
<original message>
|
||||||
|
|
||||||
|
[Link]
|
||||||
|
URL: https://example.com
|
||||||
|
Source: link-understand
|
||||||
|
Summary:
|
||||||
|
<cli output>
|
||||||
|
```
|
||||||
|
|
||||||
|
When multiple links are present, the header is numbered (`[Link 1/2]`, `[Link 2/2]`).
|
||||||
|
`Source` is the CLI `command` that produced the output.
|
||||||
|
|
||||||
|
### Output best practices
|
||||||
|
|
||||||
|
- Keep summaries short and stable (avoid prompt bloat).
|
||||||
|
- Prefer a compact, consistent structure (title → bullets → status).
|
||||||
|
- Avoid surrounding fences unless you want the agent to treat it as literal text.
|
||||||
|
|
||||||
|
|
||||||
|
## Scope gating
|
||||||
|
|
||||||
|
`tools.links.scope` uses the same rules as media understanding:
|
||||||
|
|
||||||
|
- `default`: `"allow"` or `"deny"`.
|
||||||
|
- `rules[]`: first match wins.
|
||||||
|
- `match.channel` (surface/channel)
|
||||||
|
- `match.chatType` (`direct`, `group`, `channel`)
|
||||||
|
- `match.keyPrefix` (session key prefix)
|
||||||
|
|
||||||
|
Example (only allow direct messages):
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [{ action: "allow", match: { chatType: "direct" } }]
|
||||||
|
},
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security (SSRF guard)
|
||||||
|
|
||||||
|
Clawdbot blocks local/private hosts before invoking your CLI. The guard rejects:
|
||||||
|
|
||||||
|
- `localhost`, `*.localhost`, `*.local`, `*.internal`
|
||||||
|
- Loopback addresses (IPv4 + IPv6)
|
||||||
|
- Private and link-local IP ranges (IPv4 + IPv6)
|
||||||
|
- `metadata.google.internal`
|
||||||
|
- Addresses that resolve to private/internal IPs
|
||||||
|
|
||||||
|
If a URL is blocked, it is **skipped** and your CLI is not invoked for it.
|
||||||
|
URLs that fail DNS resolution are also skipped.
|
||||||
|
|
||||||
|
## Decision tracking
|
||||||
|
|
||||||
|
Each run records a decision payload (useful in logs/debugging) in `ctx.LinkUnderstandingDecisions`:
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
outcome: "success" | "skipped" | "disabled" | "scope-deny" | "no-links",
|
||||||
|
urls: [
|
||||||
|
{
|
||||||
|
url: "https://example.com",
|
||||||
|
attempts: [
|
||||||
|
{ type: "cli", command: "link-understand", outcome: "success" }
|
||||||
|
],
|
||||||
|
chosen: { type: "cli", command: "link-understand", outcome: "success" }
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Custom usage patterns
|
||||||
|
|
||||||
|
### Ticket summarizer (single link)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
maxLinks: 1,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
command: "ticket-summary",
|
||||||
|
args: ["--url", "{{LinkUrl}}", "--format", "brief"]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-channel allowlist (only Discord + Slack)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
scope: {
|
||||||
|
default: "deny",
|
||||||
|
rules: [
|
||||||
|
{ action: "allow", match: { channel: "discord" } },
|
||||||
|
{ action: "allow", match: { channel: "slack" } }
|
||||||
|
]
|
||||||
|
},
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-agent override (support agent only)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
agents: {
|
||||||
|
list: [
|
||||||
|
{
|
||||||
|
id: "support",
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
maxLinks: 2,
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Markdown link behavior (ignored)
|
||||||
|
|
||||||
|
Message:
|
||||||
|
```
|
||||||
|
Please check [our docs](https://docs.example.com) and https://status.example.com
|
||||||
|
```
|
||||||
|
|
||||||
|
Only `https://status.example.com` is processed (markdown links are stripped).
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
- **Nothing happens:** ensure `tools.links.models` is set and `enabled` is not `false`.
|
||||||
|
- **No links detected:** only bare URLs are extracted; markdown links are ignored.
|
||||||
|
- **Output missing:** your CLI returned empty stdout; try logging or return a default line.
|
||||||
|
- **Wrong channel scope:** check `tools.links.scope` rules and `match.chatType`.
|
||||||
|
- **SSRF blocked:** local/private URLs are skipped by design.
|
||||||
|
|
||||||
|
Enable verbose logs to see CLI execution and scope decisions:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
clawdbot gateway run --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Minimal CLI runner
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Limit to 1 link, custom timeout
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
maxLinks: 1,
|
||||||
|
timeoutSeconds: 10,
|
||||||
|
models: [{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] }]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Fallback chain (first non-empty output wins)
|
||||||
|
|
||||||
|
```json5
|
||||||
|
{
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
models: [
|
||||||
|
{ command: "link-understand", args: ["--url", "{{LinkUrl}}"] },
|
||||||
|
{ command: "link-backup", args: ["{{LinkUrl}}"] }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
@ -5,6 +5,7 @@ import type {
|
|||||||
MediaUnderstandingDecision,
|
MediaUnderstandingDecision,
|
||||||
MediaUnderstandingOutput,
|
MediaUnderstandingOutput,
|
||||||
} from "../media-understanding/types.js";
|
} from "../media-understanding/types.js";
|
||||||
|
import type { LinkUnderstandingDecision } from "../link-understanding/types.js";
|
||||||
|
|
||||||
/** Valid message channels for routing. */
|
/** Valid message channels for routing. */
|
||||||
export type OriginatingChannelType = ChannelId | InternalMessageChannel;
|
export type OriginatingChannelType = ChannelId | InternalMessageChannel;
|
||||||
@ -72,6 +73,7 @@ export type MsgContext = {
|
|||||||
MediaUnderstanding?: MediaUnderstandingOutput[];
|
MediaUnderstanding?: MediaUnderstandingOutput[];
|
||||||
MediaUnderstandingDecisions?: MediaUnderstandingDecision[];
|
MediaUnderstandingDecisions?: MediaUnderstandingDecision[];
|
||||||
LinkUnderstanding?: string[];
|
LinkUnderstanding?: string[];
|
||||||
|
LinkUnderstandingDecisions?: LinkUnderstandingDecision[];
|
||||||
Prompt?: string;
|
Prompt?: string;
|
||||||
MaxChars?: number;
|
MaxChars?: number;
|
||||||
ChatType?: string;
|
ChatType?: string;
|
||||||
|
|||||||
@ -240,6 +240,17 @@ export function resolveIMessageGroupToolPolicy(
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export function resolveGoogleChatGroupToolPolicy(
|
||||||
|
params: GroupMentionParams,
|
||||||
|
): GroupToolPolicyConfig | undefined {
|
||||||
|
return resolveChannelGroupToolsPolicy({
|
||||||
|
cfg: params.cfg,
|
||||||
|
channel: "googlechat",
|
||||||
|
groupId: params.groupId,
|
||||||
|
accountId: params.accountId,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
export function resolveDiscordGroupToolPolicy(
|
export function resolveDiscordGroupToolPolicy(
|
||||||
params: GroupMentionParams,
|
params: GroupMentionParams,
|
||||||
): GroupToolPolicyConfig | undefined {
|
): GroupToolPolicyConfig | undefined {
|
||||||
|
|||||||
@ -14,8 +14,6 @@ import {
|
|||||||
type ResponsePrefixContext,
|
type ResponsePrefixContext,
|
||||||
} from "../../auto-reply/reply/response-prefix-template.js";
|
} from "../../auto-reply/reply/response-prefix-template.js";
|
||||||
import type { MsgContext } from "../../auto-reply/templating.js";
|
import type { MsgContext } from "../../auto-reply/templating.js";
|
||||||
import { formatLinkUnderstandingBody } from "../../link-understanding/format.js";
|
|
||||||
import { runLinkUnderstanding } from "../../link-understanding/runner.js";
|
|
||||||
import { resolveSendPolicy } from "../../sessions/send-policy.js";
|
import { resolveSendPolicy } from "../../sessions/send-policy.js";
|
||||||
import { INTERNAL_MESSAGE_CHANNEL } from "../../utils/message-channel.js";
|
import { INTERNAL_MESSAGE_CHANNEL } from "../../utils/message-channel.js";
|
||||||
import {
|
import {
|
||||||
@ -414,31 +412,6 @@ export const chatHandlers: GatewayRequestHandlers = {
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
|
||||||
const linkCtx: MsgContext = {
|
|
||||||
Body: parsedMessage,
|
|
||||||
RawBody: parsedMessage,
|
|
||||||
CommandBody: parsedMessage,
|
|
||||||
SessionKey: p.sessionKey,
|
|
||||||
ChatType: entry?.chatType,
|
|
||||||
Surface: entry?.channel ?? "webchat",
|
|
||||||
};
|
|
||||||
const linkResult = await runLinkUnderstanding({
|
|
||||||
cfg,
|
|
||||||
ctx: linkCtx,
|
|
||||||
message: parsedMessage,
|
|
||||||
});
|
|
||||||
if (linkResult.outputs.length > 0) {
|
|
||||||
parsedMessage = formatLinkUnderstandingBody({
|
|
||||||
body: parsedMessage,
|
|
||||||
outputs: linkResult.outputs,
|
|
||||||
});
|
|
||||||
}
|
|
||||||
} catch (err) {
|
|
||||||
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, String(err)));
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const abortController = new AbortController();
|
const abortController = new AbortController();
|
||||||
context.chatAbortControllers.set(clientRunId, {
|
context.chatAbortControllers.set(clientRunId, {
|
||||||
|
|||||||
61
src/link-understanding/apply.test.ts
Normal file
61
src/link-understanding/apply.test.ts
Normal file
@ -0,0 +1,61 @@
|
|||||||
|
import { describe, expect, it, vi } from "vitest";
|
||||||
|
|
||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import { runLinkUnderstanding } from "./runner.js";
|
||||||
|
|
||||||
|
vi.mock("./runner.js", () => ({
|
||||||
|
runLinkUnderstanding: vi.fn(),
|
||||||
|
}));
|
||||||
|
|
||||||
|
describe("applyLinkUnderstanding", () => {
|
||||||
|
it("keeps command parsing bodies unchanged when link output is applied", async () => {
|
||||||
|
const mockedRunLinkUnderstanding = vi.mocked(runLinkUnderstanding);
|
||||||
|
mockedRunLinkUnderstanding.mockResolvedValue({
|
||||||
|
urls: ["https://example.com"],
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary", source: "link-cli" }],
|
||||||
|
decisions: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const { applyLinkUnderstanding } = await import("./apply.js");
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "check https://example.com",
|
||||||
|
RawBody: "raw override",
|
||||||
|
CommandBody: "/think low check https://example.com",
|
||||||
|
};
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: { links: { models: [{ command: "link-cli" }] } },
|
||||||
|
};
|
||||||
|
|
||||||
|
await applyLinkUnderstanding({ ctx, cfg });
|
||||||
|
|
||||||
|
expect(ctx.Body).toContain("[Link]");
|
||||||
|
expect(ctx.CommandBody).toBe("/think low check https://example.com");
|
||||||
|
expect(ctx.RawBody).toBe("raw override");
|
||||||
|
expect(ctx.BodyForCommands).toBe("/think low check https://example.com");
|
||||||
|
expect(ctx.BodyForCommands).not.toContain("[Link]");
|
||||||
|
});
|
||||||
|
|
||||||
|
it("preserves original body for command parsing when no overrides exist", async () => {
|
||||||
|
const mockedRunLinkUnderstanding = vi.mocked(runLinkUnderstanding);
|
||||||
|
mockedRunLinkUnderstanding.mockResolvedValue({
|
||||||
|
urls: ["https://example.com"],
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary", source: "link-cli" }],
|
||||||
|
decisions: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const { applyLinkUnderstanding } = await import("./apply.js");
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "check https://example.com",
|
||||||
|
};
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: { links: { models: [{ command: "link-cli" }] } },
|
||||||
|
};
|
||||||
|
|
||||||
|
await applyLinkUnderstanding({ ctx, cfg });
|
||||||
|
|
||||||
|
expect(ctx.Body).toContain("[Link]");
|
||||||
|
expect(ctx.RawBody).toBe("check https://example.com");
|
||||||
|
expect(ctx.BodyForCommands).toBe("check https://example.com");
|
||||||
|
});
|
||||||
|
});
|
||||||
@ -1,12 +1,14 @@
|
|||||||
import type { ClawdbotConfig } from "../config/config.js";
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
import type { MsgContext } from "../auto-reply/templating.js";
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
import { finalizeInboundContext } from "../auto-reply/reply/inbound-context.js";
|
import { finalizeInboundContext } from "../auto-reply/reply/inbound-context.js";
|
||||||
import { formatLinkUnderstandingBody } from "./format.js";
|
import { formatLinkUnderstandingBody, formatLinkUnderstandingSections } from "./format.js";
|
||||||
import { runLinkUnderstanding } from "./runner.js";
|
import { runLinkUnderstanding } from "./runner.js";
|
||||||
|
import type { LinkUnderstandingDecision, LinkUnderstandingOutput } from "./types.js";
|
||||||
|
|
||||||
export type ApplyLinkUnderstandingResult = {
|
export type ApplyLinkUnderstandingResult = {
|
||||||
outputs: string[];
|
outputs: LinkUnderstandingOutput[];
|
||||||
urls: string[];
|
urls: string[];
|
||||||
|
decisions: LinkUnderstandingDecision[];
|
||||||
};
|
};
|
||||||
|
|
||||||
export async function applyLinkUnderstanding(params: {
|
export async function applyLinkUnderstanding(params: {
|
||||||
@ -18,16 +20,29 @@ export async function applyLinkUnderstanding(params: {
|
|||||||
ctx: params.ctx,
|
ctx: params.ctx,
|
||||||
});
|
});
|
||||||
|
|
||||||
|
if (result.decisions.length > 0) {
|
||||||
|
params.ctx.LinkUnderstandingDecisions = [
|
||||||
|
...(params.ctx.LinkUnderstandingDecisions ?? []),
|
||||||
|
...result.decisions,
|
||||||
|
];
|
||||||
|
}
|
||||||
|
|
||||||
if (result.outputs.length === 0) {
|
if (result.outputs.length === 0) {
|
||||||
return result;
|
return result;
|
||||||
}
|
}
|
||||||
|
|
||||||
params.ctx.LinkUnderstanding = [...(params.ctx.LinkUnderstanding ?? []), ...result.outputs];
|
const originalBody = params.ctx.Body;
|
||||||
|
const sections = formatLinkUnderstandingSections(result.outputs);
|
||||||
|
if (sections.length > 0) {
|
||||||
|
params.ctx.LinkUnderstanding = [...(params.ctx.LinkUnderstanding ?? []), ...sections];
|
||||||
|
}
|
||||||
params.ctx.Body = formatLinkUnderstandingBody({
|
params.ctx.Body = formatLinkUnderstandingBody({
|
||||||
body: params.ctx.Body,
|
body: params.ctx.Body,
|
||||||
outputs: result.outputs,
|
outputs: result.outputs,
|
||||||
});
|
});
|
||||||
|
if (!params.ctx.CommandBody && !params.ctx.RawBody && typeof originalBody === "string") {
|
||||||
|
params.ctx.RawBody = originalBody;
|
||||||
|
}
|
||||||
finalizeInboundContext(params.ctx, {
|
finalizeInboundContext(params.ctx, {
|
||||||
forceBodyForAgent: true,
|
forceBodyForAgent: true,
|
||||||
forceBodyForCommands: true,
|
forceBodyForCommands: true,
|
||||||
|
|||||||
@ -20,8 +20,10 @@ describe("extractLinksFromMessage", () => {
|
|||||||
expect(links).toEqual(["https://bare.example"]);
|
expect(links).toEqual(["https://bare.example"]);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("blocks localhost", () => {
|
it("blocks loopback and private hosts", () => {
|
||||||
const links = extractLinksFromMessage("http://127.0.0.1/test https://ok.test");
|
const links = extractLinksFromMessage(
|
||||||
|
"http://localhost/test http://127.0.0.1/test http://[::1]/test http://10.0.0.1/test https://ok.test",
|
||||||
|
);
|
||||||
expect(links).toEqual(["https://ok.test"]);
|
expect(links).toEqual(["https://ok.test"]);
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|||||||
@ -1,3 +1,4 @@
|
|||||||
|
import { isBlockedHostname, isPrivateIpAddress } from "../infra/net/ssrf.js";
|
||||||
import { DEFAULT_MAX_LINKS } from "./defaults.js";
|
import { DEFAULT_MAX_LINKS } from "./defaults.js";
|
||||||
|
|
||||||
// Remove markdown link syntax so only bare URLs are considered.
|
// Remove markdown link syntax so only bare URLs are considered.
|
||||||
@ -19,7 +20,10 @@ function isAllowedUrl(raw: string): boolean {
|
|||||||
try {
|
try {
|
||||||
const parsed = new URL(raw);
|
const parsed = new URL(raw);
|
||||||
if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
|
if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
|
||||||
if (parsed.hostname === "127.0.0.1") return false;
|
const hostname = parsed.hostname;
|
||||||
|
if (!hostname) return false;
|
||||||
|
if (isBlockedHostname(hostname)) return false;
|
||||||
|
if (isPrivateIpAddress(hostname)) return false;
|
||||||
return true;
|
return true;
|
||||||
} catch {
|
} catch {
|
||||||
return false;
|
return false;
|
||||||
|
|||||||
27
src/link-understanding/format.test.ts
Normal file
27
src/link-understanding/format.test.ts
Normal file
@ -0,0 +1,27 @@
|
|||||||
|
import { describe, expect, it } from "vitest";
|
||||||
|
|
||||||
|
import { formatLinkUnderstandingBody } from "./format.js";
|
||||||
|
|
||||||
|
describe("formatLinkUnderstandingBody", () => {
|
||||||
|
it("appends a structured link summary with source", () => {
|
||||||
|
const body = formatLinkUnderstandingBody({
|
||||||
|
body: "Check this",
|
||||||
|
outputs: [{ url: "https://example.com", text: "Summary here", source: "link-cli" }],
|
||||||
|
});
|
||||||
|
expect(body).toBe(
|
||||||
|
"Check this\n\n[Link]\nURL: https://example.com\nSource: link-cli\nSummary:\nSummary here",
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("numbers multiple links and omits empty body", () => {
|
||||||
|
const body = formatLinkUnderstandingBody({
|
||||||
|
outputs: [
|
||||||
|
{ url: "https://a.example", text: "First", source: "cli-a" },
|
||||||
|
{ url: "https://b.example", text: "Second", source: "cli-b" },
|
||||||
|
],
|
||||||
|
});
|
||||||
|
expect(body).toBe(
|
||||||
|
"[Link 1/2]\nURL: https://a.example\nSource: cli-a\nSummary:\nFirst\n\n[Link 2/2]\nURL: https://b.example\nSource: cli-b\nSummary:\nSecond",
|
||||||
|
);
|
||||||
|
});
|
||||||
|
});
|
||||||
@ -1,10 +1,31 @@
|
|||||||
export function formatLinkUnderstandingBody(params: { body?: string; outputs: string[] }): string {
|
import type { LinkUnderstandingOutput } from "./types.js";
|
||||||
const outputs = params.outputs.map((output) => output.trim()).filter(Boolean);
|
|
||||||
if (outputs.length === 0) {
|
function formatSection(output: LinkUnderstandingOutput, index: number, total: number): string {
|
||||||
|
const label = total > 1 ? `Link ${index + 1}/${total}` : "Link";
|
||||||
|
const source = output.source?.trim() || "unknown";
|
||||||
|
const lines = [`[${label}]`, `URL: ${output.url}`, `Source: ${source}`];
|
||||||
|
lines.push(`Summary:\n${output.text.trim()}`);
|
||||||
|
return lines.join("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
export function formatLinkUnderstandingSections(outputs: LinkUnderstandingOutput[]): string[] {
|
||||||
|
const trimmed = outputs
|
||||||
|
.map((output) => ({ ...output, text: output.text.trim(), url: output.url.trim() }))
|
||||||
|
.filter((output) => output.text && output.url);
|
||||||
|
if (trimmed.length === 0) return [];
|
||||||
|
return trimmed.map((output, index) => formatSection(output, index, trimmed.length));
|
||||||
|
}
|
||||||
|
|
||||||
|
export function formatLinkUnderstandingBody(params: {
|
||||||
|
body?: string;
|
||||||
|
outputs: LinkUnderstandingOutput[];
|
||||||
|
}): string {
|
||||||
|
const sections = formatLinkUnderstandingSections(params.outputs);
|
||||||
|
if (sections.length === 0) {
|
||||||
return params.body ?? "";
|
return params.body ?? "";
|
||||||
}
|
}
|
||||||
|
|
||||||
const base = (params.body ?? "").trim();
|
const base = (params.body ?? "").trim();
|
||||||
if (!base) return outputs.join("\n");
|
if (!base) return sections.join("\n\n");
|
||||||
return `${base}\n\n${outputs.join("\n")}`;
|
return `${base}\n\n${sections.join("\n\n")}`;
|
||||||
}
|
}
|
||||||
|
|||||||
@ -2,3 +2,10 @@ export { applyLinkUnderstanding } from "./apply.js";
|
|||||||
export { extractLinksFromMessage } from "./detect.js";
|
export { extractLinksFromMessage } from "./detect.js";
|
||||||
export { formatLinkUnderstandingBody } from "./format.js";
|
export { formatLinkUnderstandingBody } from "./format.js";
|
||||||
export { runLinkUnderstanding } from "./runner.js";
|
export { runLinkUnderstanding } from "./runner.js";
|
||||||
|
export type {
|
||||||
|
LinkUnderstandingDecision,
|
||||||
|
LinkUnderstandingDecisionOutcome,
|
||||||
|
LinkUnderstandingModelDecision,
|
||||||
|
LinkUnderstandingOutput,
|
||||||
|
LinkUnderstandingUrlDecision,
|
||||||
|
} from "./types.js";
|
||||||
|
|||||||
84
src/link-understanding/runner.test.ts
Normal file
84
src/link-understanding/runner.test.ts
Normal file
@ -0,0 +1,84 @@
|
|||||||
|
import { beforeEach, describe, expect, it, vi } from "vitest";
|
||||||
|
|
||||||
|
import type { MsgContext } from "../auto-reply/templating.js";
|
||||||
|
import type { ClawdbotConfig } from "../config/config.js";
|
||||||
|
import { assertPublicHostname } from "../infra/net/ssrf.js";
|
||||||
|
import { runExec } from "../process/exec.js";
|
||||||
|
|
||||||
|
vi.mock("../process/exec.js", () => ({
|
||||||
|
runExec: vi.fn(),
|
||||||
|
}));
|
||||||
|
vi.mock("../infra/net/ssrf.js", async () => {
|
||||||
|
const actual =
|
||||||
|
await vi.importActual<typeof import("../infra/net/ssrf.js")>("../infra/net/ssrf.js");
|
||||||
|
return {
|
||||||
|
...actual,
|
||||||
|
assertPublicHostname: vi.fn(),
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
async function loadRunner() {
|
||||||
|
return await import("./runner.js");
|
||||||
|
}
|
||||||
|
|
||||||
|
describe("runLinkUnderstanding", () => {
|
||||||
|
const mockedRunExec = vi.mocked(runExec);
|
||||||
|
const mockedAssertPublicHostname = vi.mocked(assertPublicHostname);
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
mockedRunExec.mockReset();
|
||||||
|
mockedAssertPublicHostname.mockReset();
|
||||||
|
mockedAssertPublicHostname.mockResolvedValue(undefined);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("falls back to the next model when a CLI entry fails", async () => {
|
||||||
|
const { runLinkUnderstanding } = await loadRunner();
|
||||||
|
mockedRunExec.mockImplementation(async (command) => {
|
||||||
|
if (command === "fail") throw new Error("boom");
|
||||||
|
return { stdout: "summary", stderr: "" };
|
||||||
|
});
|
||||||
|
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
models: [{ command: "fail" }, { command: "ok" }],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "see https://example.com",
|
||||||
|
SessionKey: "session-1",
|
||||||
|
Surface: "discord",
|
||||||
|
ChatType: "direct",
|
||||||
|
};
|
||||||
|
|
||||||
|
const result = await runLinkUnderstanding({ cfg, ctx });
|
||||||
|
expect(result.urls).toEqual(["https://example.com"]);
|
||||||
|
expect(result.outputs).toEqual([{ url: "https://example.com", text: "summary", source: "ok" }]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it("skips links that fail the public hostname check", async () => {
|
||||||
|
const { runLinkUnderstanding } = await loadRunner();
|
||||||
|
mockedAssertPublicHostname.mockRejectedValueOnce(new Error("Blocked: private/internal IP"));
|
||||||
|
|
||||||
|
const cfg: ClawdbotConfig = {
|
||||||
|
tools: {
|
||||||
|
links: {
|
||||||
|
enabled: true,
|
||||||
|
models: [{ command: "ok" }],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
const ctx: MsgContext = {
|
||||||
|
Body: "see https://intranet.example.com",
|
||||||
|
SessionKey: "session-1",
|
||||||
|
Surface: "discord",
|
||||||
|
ChatType: "direct",
|
||||||
|
};
|
||||||
|
|
||||||
|
const result = await runLinkUnderstanding({ cfg, ctx });
|
||||||
|
expect(result.outputs).toEqual([]);
|
||||||
|
expect(result.decisions[0]?.urls[0]?.attempts[0]?.reason).toContain("Blocked");
|
||||||
|
});
|
||||||
|
});
|
||||||
@ -10,12 +10,20 @@ import {
|
|||||||
normalizeMediaUnderstandingChatType,
|
normalizeMediaUnderstandingChatType,
|
||||||
resolveMediaUnderstandingScope,
|
resolveMediaUnderstandingScope,
|
||||||
} from "../media-understanding/scope.js";
|
} from "../media-understanding/scope.js";
|
||||||
|
import { assertPublicHostname } from "../infra/net/ssrf.js";
|
||||||
import { DEFAULT_LINK_TIMEOUT_SECONDS } from "./defaults.js";
|
import { DEFAULT_LINK_TIMEOUT_SECONDS } from "./defaults.js";
|
||||||
import { extractLinksFromMessage } from "./detect.js";
|
import { extractLinksFromMessage } from "./detect.js";
|
||||||
|
import type {
|
||||||
|
LinkUnderstandingDecision,
|
||||||
|
LinkUnderstandingModelDecision,
|
||||||
|
LinkUnderstandingOutput,
|
||||||
|
LinkUnderstandingUrlDecision,
|
||||||
|
} from "./types.js";
|
||||||
|
|
||||||
export type LinkUnderstandingResult = {
|
export type LinkUnderstandingResult = {
|
||||||
urls: string[];
|
urls: string[];
|
||||||
outputs: string[];
|
outputs: LinkUnderstandingOutput[];
|
||||||
|
decisions: LinkUnderstandingDecision[];
|
||||||
};
|
};
|
||||||
|
|
||||||
function resolveScopeDecision(params: {
|
function resolveScopeDecision(params: {
|
||||||
@ -69,12 +77,42 @@ async function runCliEntry(params: {
|
|||||||
return trimmed || null;
|
return trimmed || null;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function buildModelDecision(params: {
|
||||||
|
entry: LinkModelConfig;
|
||||||
|
outcome: LinkUnderstandingModelDecision["outcome"];
|
||||||
|
reason?: string;
|
||||||
|
}): LinkUnderstandingModelDecision {
|
||||||
|
const command = params.entry.command?.trim();
|
||||||
|
return {
|
||||||
|
type: "cli",
|
||||||
|
command: command || undefined,
|
||||||
|
outcome: params.outcome,
|
||||||
|
reason: params.reason,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
async function assertUrlIsPublic(
|
||||||
|
url: string,
|
||||||
|
): Promise<{ ok: true } | { ok: false; reason: string }> {
|
||||||
|
try {
|
||||||
|
const parsed = new URL(url);
|
||||||
|
await assertPublicHostname(parsed.hostname);
|
||||||
|
return { ok: true };
|
||||||
|
} catch (err) {
|
||||||
|
return { ok: false, reason: String(err) };
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
async function runLinkEntries(params: {
|
async function runLinkEntries(params: {
|
||||||
entries: LinkModelConfig[];
|
entries: LinkModelConfig[];
|
||||||
ctx: MsgContext;
|
ctx: MsgContext;
|
||||||
url: string;
|
url: string;
|
||||||
config?: LinkToolsConfig;
|
config?: LinkToolsConfig;
|
||||||
}): Promise<string | null> {
|
}): Promise<{
|
||||||
|
output: LinkUnderstandingOutput | null;
|
||||||
|
attempts: LinkUnderstandingModelDecision[];
|
||||||
|
}> {
|
||||||
|
const attempts: LinkUnderstandingModelDecision[] = [];
|
||||||
for (const entry of params.entries) {
|
for (const entry of params.entries) {
|
||||||
try {
|
try {
|
||||||
const output = await runCliEntry({
|
const output = await runCliEntry({
|
||||||
@ -83,12 +121,33 @@ async function runLinkEntries(params: {
|
|||||||
url: params.url,
|
url: params.url,
|
||||||
config: params.config,
|
config: params.config,
|
||||||
});
|
});
|
||||||
if (output) return output;
|
if (output) {
|
||||||
|
const decision = buildModelDecision({ entry, outcome: "success" });
|
||||||
|
attempts.push(decision);
|
||||||
|
return {
|
||||||
|
output: {
|
||||||
|
url: params.url,
|
||||||
|
text: output,
|
||||||
|
source: entry.command?.trim() || undefined,
|
||||||
|
},
|
||||||
|
attempts,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
attempts.push(buildModelDecision({ entry, outcome: "skipped", reason: "empty output" }));
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
throw new Error(`link processor failed for ${params.url}: ${String(err)}`);
|
if (shouldLogVerbose()) {
|
||||||
|
logVerbose(`Link understanding failed for ${params.url}: ${String(err)}`);
|
||||||
|
}
|
||||||
|
attempts.push(
|
||||||
|
buildModelDecision({
|
||||||
|
entry,
|
||||||
|
outcome: "failed",
|
||||||
|
reason: String(err),
|
||||||
|
}),
|
||||||
|
);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return null;
|
return { output: null, attempts };
|
||||||
}
|
}
|
||||||
|
|
||||||
export async function runLinkUnderstanding(params: {
|
export async function runLinkUnderstanding(params: {
|
||||||
@ -97,33 +156,66 @@ export async function runLinkUnderstanding(params: {
|
|||||||
message?: string;
|
message?: string;
|
||||||
}): Promise<LinkUnderstandingResult> {
|
}): Promise<LinkUnderstandingResult> {
|
||||||
const config = params.cfg.tools?.links;
|
const config = params.cfg.tools?.links;
|
||||||
if (!config || config.enabled === false) return { urls: [], outputs: [] };
|
if (!config || config.enabled === false) {
|
||||||
|
return { urls: [], outputs: [], decisions: [{ outcome: "disabled", urls: [] }] };
|
||||||
|
}
|
||||||
|
|
||||||
const scopeDecision = resolveScopeDecision({ config, ctx: params.ctx });
|
const scopeDecision = resolveScopeDecision({ config, ctx: params.ctx });
|
||||||
if (scopeDecision === "deny") {
|
if (scopeDecision === "deny") {
|
||||||
if (shouldLogVerbose()) {
|
if (shouldLogVerbose()) {
|
||||||
logVerbose("Link understanding disabled by scope policy.");
|
logVerbose("Link understanding disabled by scope policy.");
|
||||||
}
|
}
|
||||||
return { urls: [], outputs: [] };
|
return { urls: [], outputs: [], decisions: [{ outcome: "scope-deny", urls: [] }] };
|
||||||
}
|
}
|
||||||
|
|
||||||
const message = params.message ?? params.ctx.CommandBody ?? params.ctx.RawBody ?? params.ctx.Body;
|
const message = params.message ?? params.ctx.CommandBody ?? params.ctx.RawBody ?? params.ctx.Body;
|
||||||
const links = extractLinksFromMessage(message ?? "", { maxLinks: config?.maxLinks });
|
const links = extractLinksFromMessage(message ?? "", { maxLinks: config?.maxLinks });
|
||||||
if (links.length === 0) return { urls: [], outputs: [] };
|
if (links.length === 0) {
|
||||||
|
return { urls: [], outputs: [], decisions: [{ outcome: "no-links", urls: [] }] };
|
||||||
|
}
|
||||||
|
|
||||||
const entries = config?.models ?? [];
|
const entries = config?.models ?? [];
|
||||||
if (entries.length === 0) return { urls: links, outputs: [] };
|
if (entries.length === 0) {
|
||||||
|
const urlDecisions: LinkUnderstandingUrlDecision[] = links.map((url) => ({
|
||||||
|
url,
|
||||||
|
attempts: [],
|
||||||
|
}));
|
||||||
|
return {
|
||||||
|
urls: links,
|
||||||
|
outputs: [],
|
||||||
|
decisions: [{ outcome: "skipped", urls: urlDecisions }],
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
const outputs: string[] = [];
|
const outputs: LinkUnderstandingOutput[] = [];
|
||||||
|
const urlDecisions: LinkUnderstandingUrlDecision[] = [];
|
||||||
for (const url of links) {
|
for (const url of links) {
|
||||||
const output = await runLinkEntries({
|
const ssrfCheck = await assertUrlIsPublic(url);
|
||||||
|
if (!ssrfCheck.ok) {
|
||||||
|
urlDecisions.push({
|
||||||
|
url,
|
||||||
|
attempts: [
|
||||||
|
{
|
||||||
|
type: "cli",
|
||||||
|
command: "ssrf",
|
||||||
|
outcome: "skipped",
|
||||||
|
reason: ssrfCheck.reason,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
});
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
const { output, attempts } = await runLinkEntries({
|
||||||
entries,
|
entries,
|
||||||
ctx: params.ctx,
|
ctx: params.ctx,
|
||||||
url,
|
url,
|
||||||
config,
|
config,
|
||||||
});
|
});
|
||||||
|
const chosen = attempts.find((attempt) => attempt.outcome === "success");
|
||||||
|
urlDecisions.push({ url, attempts, chosen });
|
||||||
if (output) outputs.push(output);
|
if (output) outputs.push(output);
|
||||||
}
|
}
|
||||||
|
|
||||||
return { urls: links, outputs };
|
const outcome = outputs.length > 0 ? "success" : "skipped";
|
||||||
|
return { urls: links, outputs, decisions: [{ outcome, urls: urlDecisions }] };
|
||||||
}
|
}
|
||||||
|
|||||||
30
src/link-understanding/types.ts
Normal file
30
src/link-understanding/types.ts
Normal file
@ -0,0 +1,30 @@
|
|||||||
|
export type LinkUnderstandingOutput = {
|
||||||
|
url: string;
|
||||||
|
text: string;
|
||||||
|
source?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingModelDecision = {
|
||||||
|
type: "cli";
|
||||||
|
command?: string;
|
||||||
|
outcome: "success" | "skipped" | "failed";
|
||||||
|
reason?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingUrlDecision = {
|
||||||
|
url: string;
|
||||||
|
attempts: LinkUnderstandingModelDecision[];
|
||||||
|
chosen?: LinkUnderstandingModelDecision;
|
||||||
|
};
|
||||||
|
|
||||||
|
export type LinkUnderstandingDecisionOutcome =
|
||||||
|
| "success"
|
||||||
|
| "skipped"
|
||||||
|
| "disabled"
|
||||||
|
| "scope-deny"
|
||||||
|
| "no-links";
|
||||||
|
|
||||||
|
export type LinkUnderstandingDecision = {
|
||||||
|
outcome: LinkUnderstandingDecisionOutcome;
|
||||||
|
urls: LinkUnderstandingUrlDecision[];
|
||||||
|
};
|
||||||
Loading…
Reference in New Issue
Block a user