feat(tools): add Jina Reader as web_fetch fallback provider

Add Jina Reader (https://jina.ai/reader/) as a native fallback provider for
web_fetch, similar to how Firecrawl is integrated.

Jina provides high-quality content extraction with:
- PDF support (native text extraction)
- Image captioning (via vision language models)
- JavaScript rendering (browser engine option)
- Token-based pricing (10M free tokens, more affordable than Firecrawl)
- Markdown-optimised output for LLM consumption

Changes:
- Add ToolsWebFetchJinaSchema to zod-schema.agent-runtime.ts
- Add fetchJinaContent() and tryJinaFallback() to web-fetch.ts
- Update fallback chain: Readability -> Jina -> Firecrawl -> error
- Add UI hints for Jina config options in schema.ts
- Add docs/tools/jina.md documentation
- Update docs/tools/web.md to reference Jina

Configuration example:
  tools.web.fetch.jina.apiKey or JINA_API_KEY env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 699784dbee
commit 2f22e1a88b
108
docs/tools/jina.md
Normal file
@@ -0,0 +1,108 @@
---
summary: "Jina Reader fallback for web_fetch (PDF support + browser rendering)"
read_when:
  - You want Jina-backed web extraction
  - You need a Jina API key
  - You want PDF extraction for web_fetch
  - You want browser-rendered extraction for JS-heavy sites
---

# Jina Reader

Moltbot can use **Jina Reader** as a fallback extractor for `web_fetch`. It is a
content extraction service optimised for LLM consumption, with excellent PDF support
and optional browser rendering for JavaScript-heavy sites.

## Get an API key

1) Sign up at https://jina.ai/?sui=apikey (10M free tokens included)
2) Store it in config or set `JINA_API_KEY` in the gateway environment.

## Configure Jina

```json5
{
  tools: {
    web: {
      fetch: {
        jina: {
          apiKey: "JINA_API_KEY_HERE",
          baseUrl: "https://r.jina.ai",
          engine: "browser",
          timeoutSeconds: 30
        }
      }
    }
  }
}
```

Notes:
- `jina.enabled` defaults to true when an API key is present.
- `engine` can be "browser" (quality), "direct" (speed), or "cf-browser-rendering" (JS-heavy).
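
The key lookup and the `enabled` default described in the notes can be sketched roughly as follows. This is a simplified illustration modelled on the `resolveJinaApiKey` and `resolveJinaEnabled` helpers added in this commit; the `JinaConfigSketch` type and the explicit `env` parameter are assumptions made so the snippet stands alone.

```typescript
// Hypothetical, trimmed-down config shape used only for this sketch.
interface JinaConfigSketch {
  enabled?: boolean;
  apiKey?: string;
}

// The config value wins; the JINA_API_KEY environment value is the fallback.
// `env` is passed in explicitly (an assumption) so the function is easy to test.
function resolveApiKey(
  jina: JinaConfigSketch | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromConfig = (jina?.apiKey ?? "").trim();
  const fromEnv = (env.JINA_API_KEY ?? "").trim();
  return fromConfig || fromEnv || undefined;
}

// An explicit boolean wins; otherwise Jina is enabled only when a key exists.
function resolveEnabled(jina: JinaConfigSketch | undefined, apiKey?: string): boolean {
  if (typeof jina?.enabled === "boolean") return jina.enabled;
  return Boolean(apiKey);
}
```

Note that in the commit the fallback is only attempted when Jina is enabled and a key is present, so `enabled: true` without a key still skips Jina.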

## Engine Options

| Engine | Best For | Speed |
|--------|----------|-------|
| `direct` | Simple HTML pages | Fastest |
| `browser` | Most pages (default) | Medium |
| `cf-browser-rendering` | JS-heavy SPAs | Slowest |

## Additional Options

| Option | Description |
|--------|-------------|
| `noCache` | Bypass Jina's cache for fresh content |
| `withLinksSummary` | Include a summary of all links at end of content |
| `withImagesSummary` | Include a summary of all images at end of content |
| `returnFormat` | Output format: "markdown", "text", or "html" |
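
As a rough illustration of how these options reach the service, the sketch below maps them to request headers. The header names mirror those set by `fetchJinaContent` in this commit; the `JinaRequestOptions` type, the `buildJinaHeaders` name, and the two-value `extractMode` union are assumptions for this example. In the commit's code, `X-Return-Format` is derived from the tool's `extractMode` rather than read from `returnFormat`.

```typescript
type JinaEngine = "browser" | "direct" | "cf-browser-rendering";

// Hypothetical options bag for this sketch only.
interface JinaRequestOptions {
  apiKey: string;
  engine?: JinaEngine;
  noCache?: boolean;
  withLinksSummary?: boolean;
  withImagesSummary?: boolean;
  timeoutSeconds?: number;
  extractMode: "markdown" | "text"; // assumed shape of ExtractMode
}

// Build the headers for a Jina Reader request; optional headers are only
// added when the corresponding option is set.
function buildJinaHeaders(opts: JinaRequestOptions): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.apiKey}`,
    Accept: "application/json",
    "Content-Type": "application/json",
  };
  if (opts.engine) headers["X-Engine"] = opts.engine;
  if (opts.noCache) headers["X-No-Cache"] = "true";
  if (opts.withLinksSummary) headers["X-With-Links-Summary"] = "true";
  if (opts.withImagesSummary) headers["X-With-Images-Summary"] = "true";
  if (opts.timeoutSeconds) headers["X-Timeout"] = String(opts.timeoutSeconds);
  headers["X-Return-Format"] = opts.extractMode === "text" ? "text" : "markdown";
  return headers;
}
```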

## How `web_fetch` uses Jina

`web_fetch` extraction order:
1) Readability (local)
2) **Jina** (if configured)
3) Firecrawl (if configured)
4) Basic HTML cleanup (last fallback)

Jina is tried before Firecrawl because:
- Token-based pricing is more affordable for high-volume use
- Better PDF extraction support
- No anti-bot circumvention overhead
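
The ordering above amounts to walking a list of extractors and taking the first one that yields content. The following is a minimal sketch of that idea, not the actual `runWebFetch` control flow: the `Extractor` type and `extractWithFallback` function are hypothetical, and unlike the commit's code this sketch swallows errors from every step.

```typescript
type Extracted = { text: string; title?: string };

// Hypothetical extractor abstraction: each step returns content or null.
type Extractor = {
  name: string;
  run: (url: string) => Promise<Extracted | null>;
};

// Try each extractor in order and return the first non-empty result,
// tagged with the name of the extractor that produced it.
async function extractWithFallback(
  url: string,
  chain: Extractor[],
): Promise<Extracted & { extractor: string }> {
  for (const step of chain) {
    try {
      const result = await step.run(url);
      if (result && result.text) return { ...result, extractor: step.name };
    } catch {
      // A failing extractor is skipped so the next one can try.
    }
  }
  const names = chain.map((step) => step.name).join(", ");
  throw new Error(`Web fetch extraction failed: ${names} returned no content.`);
}
```

A caller would pass the chain in the documented order, for example Readability, then Jina, then Firecrawl, leaving out any provider that is not configured.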

See [Web tools](/tools/web) for the full web tool setup.

## Comparison with Firecrawl

| Feature | Jina | Firecrawl |
|---------|------|-----------|
| **Pricing** | Token-based (10M free) | Credit-based |
| **PDF Support** | Excellent | Basic |
| **Image Captioning** | Yes (via VLM) | No |
| **Anti-bot Bypass** | No | Yes (stealth proxy) |
| **JS Rendering** | Yes (browser engine) | Yes |

**Recommendation:** Use Jina for most use cases; add Firecrawl if you frequently
hit bot detection on specific sites.

## Environment Variables

- `JINA_API_KEY` - API key for Jina Reader (fallback when not set in config)

## Rate Limits

| Tier | RPM |
|------|-----|
| No API Key | 20 |
| Free API Key | 500 |
| Premium | 5,000 |

## Pricing

Jina uses token-based pricing:
- **Free tier**: 10 million tokens included
- **Paid**: ~$0.02 per 1M tokens (varies by plan)

This is typically more cost-effective than Firecrawl's credit-based model for
high-volume extraction.
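
For a back-of-the-envelope estimate, the figures above can be turned into a small calculation. This sketch assumes the 10M free tokens are a one-time allowance and uses the approximate $0.02 per 1M tokens rate quoted above; the function name is hypothetical and actual billing depends on the plan.

```typescript
// Figures taken from the pricing notes above; both are approximate.
const FREE_TOKENS = 10_000_000;
const USD_PER_MILLION_TOKENS = 0.02;

// Estimate the cost of `totalTokens` of extraction, assuming the free
// allowance is consumed first and is not renewed (an assumption).
function estimateJinaCostUsd(totalTokens: number): number {
  const billableTokens = Math.max(0, totalTokens - FREE_TOKENS);
  return (billableTokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}
```

Under these assumptions, usage that stays within the free allowance costs nothing, and each additional million tokens adds about two cents.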
@@ -209,6 +209,7 @@ Fetch a URL and extract readable content.
 ### Requirements
 
 - `tools.web.fetch.enabled` must not be `false` (default: enabled)
+- Optional Jina fallback: set `tools.web.fetch.jina.apiKey` or `JINA_API_KEY`.
 - Optional Firecrawl fallback: set `tools.web.fetch.firecrawl.apiKey` or `FIRECRAWL_API_KEY`.
 
 ### Config
@@ -225,6 +226,13 @@ Fetch a URL and extract readable content.
         maxRedirects: 3,
         userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
         readability: true,
+        jina: {
+          enabled: true,
+          apiKey: "JINA_API_KEY_HERE", // optional if JINA_API_KEY is set
+          baseUrl: "https://r.jina.ai",
+          engine: "browser", // "browser", "direct", or "cf-browser-rendering"
+          timeoutSeconds: 30
+        },
         firecrawl: {
           enabled: true,
           apiKey: "FIRECRAWL_API_KEY_HERE", // optional if FIRECRAWL_API_KEY is set
@@ -246,12 +254,14 @@ Fetch a URL and extract readable content.
 - `maxChars` (truncate long pages)
 
 Notes:
-- `web_fetch` uses Readability (main-content extraction) first, then Firecrawl (if configured). If both fail, the tool returns an error.
+- `web_fetch` uses Readability (main-content extraction) first, then Jina (if configured), then Firecrawl (if configured). If all fail, the tool returns an error.
+- Jina is tried before Firecrawl because it has better PDF support and more affordable token-based pricing.
 - Firecrawl requests use bot-circumvention mode and cache results by default.
 - `web_fetch` sends a Chrome-like User-Agent and `Accept-Language` by default; override `userAgent` if needed.
 - `web_fetch` blocks private/internal hostnames and re-checks redirects (limit with `maxRedirects`).
 - `web_fetch` is best-effort extraction; some sites will need the browser tool.
-- See [Firecrawl](/tools/firecrawl) for key setup and service details.
+- See [Jina](/tools/jina) for Jina Reader setup and service details.
+- See [Firecrawl](/tools/firecrawl) for Firecrawl key setup and service details.
 - Responses are cached (default 15 minutes) to reduce repeated fetches.
 - If you use tool profiles/allowlists, add `web_search`/`web_fetch` or `group:web`.
 - If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
@@ -40,6 +40,7 @@ const DEFAULT_FETCH_MAX_REDIRECTS = 3;
 const DEFAULT_ERROR_MAX_CHARS = 4_000;
 const DEFAULT_FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";
 const DEFAULT_FIRECRAWL_MAX_AGE_MS = 172_800_000;
+const DEFAULT_JINA_BASE_URL = "https://r.jina.ai";
 const DEFAULT_FETCH_USER_AGENT =
   "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36";
@@ -78,6 +79,20 @@ type FirecrawlFetchConfig =
     }
   | undefined;
 
+type JinaFetchConfig =
+  | {
+      enabled?: boolean;
+      apiKey?: string;
+      baseUrl?: string;
+      engine?: "browser" | "direct" | "cf-browser-rendering";
+      returnFormat?: "markdown" | "text" | "html";
+      timeoutSeconds?: number;
+      noCache?: boolean;
+      withLinksSummary?: boolean;
+      withImagesSummary?: boolean;
+    }
+  | undefined;
+
 function resolveFetchConfig(cfg?: MoltbotConfig): WebFetchConfig {
   const fetch = cfg?.tools?.web?.fetch;
   if (!fetch || typeof fetch !== "object") return undefined;
@@ -147,6 +162,33 @@ function resolveFirecrawlMaxAgeMsOrDefault(firecrawl?: FirecrawlFetchConfig): nu
   return DEFAULT_FIRECRAWL_MAX_AGE_MS;
 }
 
+// ===== Jina Configuration Resolvers =====
+
+function resolveJinaConfig(fetch?: WebFetchConfig): JinaFetchConfig {
+  if (!fetch || typeof fetch !== "object") return undefined;
+  const jina = "jina" in fetch ? fetch.jina : undefined;
+  if (!jina || typeof jina !== "object") return undefined;
+  return jina as JinaFetchConfig;
+}
+
+function resolveJinaApiKey(jina?: JinaFetchConfig): string | undefined {
+  const fromConfig =
+    jina && "apiKey" in jina && typeof jina.apiKey === "string" ? jina.apiKey.trim() : "";
+  const fromEnv = (process.env.JINA_API_KEY ?? "").trim();
+  return fromConfig || fromEnv || undefined;
+}
+
+function resolveJinaEnabled(params: { jina?: JinaFetchConfig; apiKey?: string }): boolean {
+  if (typeof params.jina?.enabled === "boolean") return params.jina.enabled;
+  return Boolean(params.apiKey);
+}
+
+function resolveJinaBaseUrl(jina?: JinaFetchConfig): string {
+  const raw =
+    jina && "baseUrl" in jina && typeof jina.baseUrl === "string" ? jina.baseUrl.trim() : "";
+  return raw || DEFAULT_JINA_BASE_URL;
+}
+
 function resolveMaxChars(value: unknown, fallback: number): number {
   const parsed = typeof value === "number" && Number.isFinite(value) ? value : fallback;
   return Math.max(100, Math.floor(parsed));
@@ -329,6 +371,83 @@ export async function fetchFirecrawlContent(params: {
   };
 }
 
+export async function fetchJinaContent(params: {
+  url: string;
+  extractMode: ExtractMode;
+  apiKey: string;
+  baseUrl: string;
+  engine?: "browser" | "direct" | "cf-browser-rendering";
+  noCache?: boolean;
+  withLinksSummary?: boolean;
+  withImagesSummary?: boolean;
+  timeoutSeconds: number;
+}): Promise<{
+  text: string;
+  title?: string;
+  finalUrl?: string;
+  status?: number;
+}> {
+  const headers: Record<string, string> = {
+    Authorization: `Bearer ${params.apiKey}`,
+    Accept: "application/json",
+    "Content-Type": "application/json",
+  };
+
+  // Optional Jina headers
+  if (params.engine) {
+    headers["X-Engine"] = params.engine;
+  }
+  if (params.noCache) {
+    headers["X-No-Cache"] = "true";
+  }
+  if (params.withLinksSummary) {
+    headers["X-With-Links-Summary"] = "true";
+  }
+  if (params.withImagesSummary) {
+    headers["X-With-Images-Summary"] = "true";
+  }
+  if (params.timeoutSeconds) {
+    headers["X-Timeout"] = String(params.timeoutSeconds);
+  }
+
+  // Determine return format based on extractMode
+  const returnFormat = params.extractMode === "text" ? "text" : "markdown";
+  headers["X-Return-Format"] = returnFormat;
+
+  const res = await fetch(params.baseUrl, {
+    method: "POST",
+    headers,
+    body: JSON.stringify({ url: params.url }),
+    signal: withTimeout(undefined, params.timeoutSeconds * 1000),
+  });
+
+  const payload = (await res.json()) as {
+    code?: number;
+    status?: number;
+    data?: {
+      title?: string;
+      content?: string;
+      url?: string;
+    };
+    error?: string;
+  };
+
+  if (!res.ok || (payload?.code && payload.code !== 200)) {
+    const detail = payload?.error || res.statusText;
+    throw new Error(`Jina fetch failed (${res.status}): ${detail}`.trim());
+  }
+
+  const data = payload?.data ?? {};
+  const text = typeof data.content === "string" ? data.content : "";
+
+  return {
+    text,
+    title: data.title,
+    finalUrl: data.url,
+    status: payload.code ?? res.status,
+  };
+}
+
 async function runWebFetch(params: {
   url: string;
   extractMode: ExtractMode;
@@ -338,6 +457,14 @@ async function runWebFetch(params: {
   cacheTtlMs: number;
   userAgent: string;
   readabilityEnabled: boolean;
+  jinaEnabled: boolean;
+  jinaApiKey?: string;
+  jinaBaseUrl: string;
+  jinaEngine?: "browser" | "direct" | "cf-browser-rendering";
+  jinaNoCache?: boolean;
+  jinaWithLinksSummary?: boolean;
+  jinaWithImagesSummary?: boolean;
+  jinaTimeoutSeconds: number;
   firecrawlEnabled: boolean;
   firecrawlApiKey?: string;
   firecrawlBaseUrl: string;
@@ -381,6 +508,42 @@ async function runWebFetch(params: {
     if (error instanceof SsrFBlockedError) {
       throw error;
     }
+    // Try Jina first (cheaper, better PDF support)
+    if (params.jinaEnabled && params.jinaApiKey) {
+      try {
+        const jina = await fetchJinaContent({
+          url: finalUrl,
+          extractMode: params.extractMode,
+          apiKey: params.jinaApiKey,
+          baseUrl: params.jinaBaseUrl,
+          engine: params.jinaEngine,
+          noCache: params.jinaNoCache,
+          withLinksSummary: params.jinaWithLinksSummary,
+          withImagesSummary: params.jinaWithImagesSummary,
+          timeoutSeconds: params.jinaTimeoutSeconds,
+        });
+        const truncated = truncateText(jina.text, params.maxChars);
+        const payload = {
+          url: params.url,
+          finalUrl: jina.finalUrl || finalUrl,
+          status: jina.status ?? 200,
+          contentType: "text/markdown",
+          title: jina.title,
+          extractMode: params.extractMode,
+          extractor: "jina",
+          truncated: truncated.truncated,
+          length: truncated.text.length,
+          fetchedAt: new Date().toISOString(),
+          tookMs: Date.now() - start,
+          text: truncated.text,
+        };
+        writeCache(FETCH_CACHE, cacheKey, payload, params.cacheTtlMs);
+        return payload;
+      } catch {
+        // Fall through to Firecrawl
+      }
+    }
+    // Then try Firecrawl (bot circumvention)
     if (params.firecrawlEnabled && params.firecrawlApiKey) {
       const firecrawl = await fetchFirecrawlContent({
         url: finalUrl,
@@ -417,6 +580,42 @@ async function runWebFetch(params: {
 
   try {
     if (!res.ok) {
+      // Try Jina first (cheaper, better PDF support)
+      if (params.jinaEnabled && params.jinaApiKey) {
+        try {
+          const jina = await fetchJinaContent({
+            url: params.url,
+            extractMode: params.extractMode,
+            apiKey: params.jinaApiKey,
+            baseUrl: params.jinaBaseUrl,
+            engine: params.jinaEngine,
+            noCache: params.jinaNoCache,
+            withLinksSummary: params.jinaWithLinksSummary,
+            withImagesSummary: params.jinaWithImagesSummary,
+            timeoutSeconds: params.jinaTimeoutSeconds,
+          });
+          const truncated = truncateText(jina.text, params.maxChars);
+          const payload = {
+            url: params.url,
+            finalUrl: jina.finalUrl || finalUrl,
+            status: jina.status ?? res.status,
+            contentType: "text/markdown",
+            title: jina.title,
+            extractMode: params.extractMode,
+            extractor: "jina",
+            truncated: truncated.truncated,
+            length: truncated.text.length,
+            fetchedAt: new Date().toISOString(),
+            tookMs: Date.now() - start,
+            text: truncated.text,
+          };
+          writeCache(FETCH_CACHE, cacheKey, payload, params.cacheTtlMs);
+          return payload;
+        } catch {
+          // Fall through to Firecrawl
+        }
+      }
+      // Then try Firecrawl (bot circumvention)
       if (params.firecrawlEnabled && params.firecrawlApiKey) {
         const firecrawl = await fetchFirecrawlContent({
           url: params.url,
@@ -475,20 +674,29 @@ async function runWebFetch(params: {
           title = readable.title;
           extractor = "readability";
         } else {
-          const firecrawl = await tryFirecrawlFallback({ ...params, url: finalUrl });
-          if (firecrawl) {
-            text = firecrawl.text;
-            title = firecrawl.title;
-            extractor = "firecrawl";
+          // Try Jina first (cheaper, better PDF support)
+          const jina = await tryJinaFallback({ ...params, url: finalUrl });
+          if (jina) {
+            text = jina.text;
+            title = jina.title;
+            extractor = "jina";
           } else {
-            throw new Error(
-              "Web fetch extraction failed: Readability and Firecrawl returned no content.",
-            );
+            // Then try Firecrawl (bot circumvention)
+            const firecrawl = await tryFirecrawlFallback({ ...params, url: finalUrl });
+            if (firecrawl) {
+              text = firecrawl.text;
+              title = firecrawl.title;
+              extractor = "firecrawl";
+            } else {
+              throw new Error(
+                "Web fetch extraction failed: Readability, Jina, and Firecrawl returned no content.",
+              );
+            }
           }
         }
       } else {
         throw new Error(
-          "Web fetch extraction failed: Readability disabled and Firecrawl unavailable.",
+          "Web fetch extraction failed: Readability disabled and Jina/Firecrawl unavailable.",
         );
       }
     } else if (contentType.includes("application/json")) {
@@ -554,6 +762,37 @@ async function tryFirecrawlFallback(params: {
   }
 }
 
+async function tryJinaFallback(params: {
+  url: string;
+  extractMode: ExtractMode;
+  jinaEnabled: boolean;
+  jinaApiKey?: string;
+  jinaBaseUrl: string;
+  jinaEngine?: "browser" | "direct" | "cf-browser-rendering";
+  jinaNoCache?: boolean;
+  jinaWithLinksSummary?: boolean;
+  jinaWithImagesSummary?: boolean;
+  jinaTimeoutSeconds: number;
+}): Promise<{ text: string; title?: string } | null> {
+  if (!params.jinaEnabled || !params.jinaApiKey) return null;
+  try {
+    const jina = await fetchJinaContent({
+      url: params.url,
+      extractMode: params.extractMode,
+      apiKey: params.jinaApiKey,
+      baseUrl: params.jinaBaseUrl,
+      engine: params.jinaEngine,
+      noCache: params.jinaNoCache,
+      withLinksSummary: params.jinaWithLinksSummary,
+      withImagesSummary: params.jinaWithImagesSummary,
+      timeoutSeconds: params.jinaTimeoutSeconds,
+    });
+    return { text: jina.text, title: jina.title };
+  } catch {
+    return null;
+  }
+}
+
 function resolveFirecrawlEndpoint(baseUrl: string): string {
   const trimmed = baseUrl.trim();
   if (!trimmed) return `${DEFAULT_FIRECRAWL_BASE_URL}/v2/scrape`;
@@ -576,6 +815,22 @@ export function createWebFetchTool(options?: {
   const fetch = resolveFetchConfig(options?.config);
   if (!resolveFetchEnabled({ fetch, sandboxed: options?.sandboxed })) return null;
   const readabilityEnabled = resolveFetchReadabilityEnabled(fetch);
+
+  // Jina config
+  const jina = resolveJinaConfig(fetch);
+  const jinaApiKey = resolveJinaApiKey(jina);
+  const jinaEnabled = resolveJinaEnabled({ jina, apiKey: jinaApiKey });
+  const jinaBaseUrl = resolveJinaBaseUrl(jina);
+  const jinaEngine = jina?.engine;
+  const jinaNoCache = jina?.noCache;
+  const jinaWithLinksSummary = jina?.withLinksSummary;
+  const jinaWithImagesSummary = jina?.withImagesSummary;
+  const jinaTimeoutSeconds = resolveTimeoutSeconds(
+    jina?.timeoutSeconds ?? fetch?.timeoutSeconds,
+    DEFAULT_TIMEOUT_SECONDS,
+  );
+
+  // Firecrawl config
   const firecrawl = resolveFirecrawlConfig(fetch);
   const firecrawlApiKey = resolveFirecrawlApiKey(firecrawl);
   const firecrawlEnabled = resolveFirecrawlEnabled({ firecrawl, apiKey: firecrawlApiKey });
@@ -586,6 +841,7 @@ export function createWebFetchTool(options?: {
     firecrawl?.timeoutSeconds ?? fetch?.timeoutSeconds,
     DEFAULT_TIMEOUT_SECONDS,
   );
+
   const userAgent =
     (fetch && "userAgent" in fetch && typeof fetch.userAgent === "string" && fetch.userAgent) ||
     DEFAULT_FETCH_USER_AGENT;
@@ -609,6 +865,14 @@ export function createWebFetchTool(options?: {
         cacheTtlMs: resolveCacheTtlMs(fetch?.cacheTtlMinutes, DEFAULT_CACHE_TTL_MINUTES),
         userAgent,
         readabilityEnabled,
+        jinaEnabled,
+        jinaApiKey,
+        jinaBaseUrl,
+        jinaEngine,
+        jinaNoCache,
+        jinaWithLinksSummary,
+        jinaWithImagesSummary,
+        jinaTimeoutSeconds,
         firecrawlEnabled,
         firecrawlApiKey,
         firecrawlBaseUrl,
@@ -199,6 +199,15 @@ const FIELD_LABELS: Record<string, string> = {
   "tools.web.fetch.cacheTtlMinutes": "Web Fetch Cache TTL (min)",
   "tools.web.fetch.maxRedirects": "Web Fetch Max Redirects",
   "tools.web.fetch.userAgent": "Web Fetch User-Agent",
+  "tools.web.fetch.jina.enabled": "Enable Jina Reader",
+  "tools.web.fetch.jina.apiKey": "Jina API Key",
+  "tools.web.fetch.jina.baseUrl": "Jina Base URL",
+  "tools.web.fetch.jina.engine": "Jina Engine",
+  "tools.web.fetch.jina.returnFormat": "Jina Return Format",
+  "tools.web.fetch.jina.timeoutSeconds": "Jina Timeout (sec)",
+  "tools.web.fetch.jina.noCache": "Jina Disable Cache",
+  "tools.web.fetch.jina.withLinksSummary": "Jina Include Links Summary",
+  "tools.web.fetch.jina.withImagesSummary": "Jina Include Images Summary",
   "gateway.controlUi.basePath": "Control UI Base Path",
   "gateway.controlUi.allowInsecureAuth": "Allow Insecure Control UI Auth",
   "gateway.controlUi.dangerouslyDisableDeviceAuth": "Dangerously Disable Control UI Device Auth",
@@ -463,6 +472,17 @@ const FIELD_HELP: Record<string, string> = {
   "tools.web.fetch.firecrawl.maxAgeMs":
     "Firecrawl maxAge (ms) for cached results when supported by the API.",
   "tools.web.fetch.firecrawl.timeoutSeconds": "Timeout in seconds for Firecrawl requests.",
+  "tools.web.fetch.jina.enabled": "Enable Jina Reader fallback for web_fetch (if configured).",
+  "tools.web.fetch.jina.apiKey": "Jina API key (fallback: JINA_API_KEY env var).",
+  "tools.web.fetch.jina.baseUrl": "Jina base URL (default: https://r.jina.ai).",
+  "tools.web.fetch.jina.engine":
+    'Jina engine ("browser" for quality, "direct" for speed, "cf-browser-rendering" for JS-heavy sites).',
+  "tools.web.fetch.jina.returnFormat": 'Jina return format ("markdown", "text", or "html").',
+  "tools.web.fetch.jina.timeoutSeconds": "Timeout in seconds for Jina requests.",
+  "tools.web.fetch.jina.noCache": "Bypass Jina cache for fresh content (default: false).",
+  "tools.web.fetch.jina.withLinksSummary": "Include a summary of all links at the end of content.",
+  "tools.web.fetch.jina.withImagesSummary":
+    "Include a summary of all images at the end of content.",
   "channels.slack.allowBots":
     "Allow bot-authored messages to trigger Slack replies (default: false).",
   "channels.slack.thread.historyScope":
@@ -182,6 +182,33 @@ export const ToolsWebSearchSchema = z
   .strict()
   .optional();
 
+export const ToolsWebFetchFirecrawlSchema = z
+  .object({
+    enabled: z.boolean().optional(),
+    apiKey: z.string().optional(),
+    baseUrl: z.string().optional(),
+    onlyMainContent: z.boolean().optional(),
+    maxAgeMs: z.number().int().nonnegative().optional(),
+    timeoutSeconds: z.number().int().positive().optional(),
+  })
+  .strict()
+  .optional();
+
+export const ToolsWebFetchJinaSchema = z
+  .object({
+    enabled: z.boolean().optional(),
+    apiKey: z.string().optional(),
+    baseUrl: z.string().optional(),
+    engine: z.enum(["browser", "direct", "cf-browser-rendering"]).optional(),
+    returnFormat: z.enum(["markdown", "text", "html"]).optional(),
+    timeoutSeconds: z.number().int().positive().optional(),
+    noCache: z.boolean().optional(),
+    withLinksSummary: z.boolean().optional(),
+    withImagesSummary: z.boolean().optional(),
+  })
+  .strict()
+  .optional();
+
 export const ToolsWebFetchSchema = z
   .object({
     enabled: z.boolean().optional(),
@@ -190,6 +217,9 @@ export const ToolsWebFetchSchema = z
     cacheTtlMinutes: z.number().nonnegative().optional(),
     maxRedirects: z.number().int().nonnegative().optional(),
     userAgent: z.string().optional(),
+    readability: z.boolean().optional(),
+    firecrawl: ToolsWebFetchFirecrawlSchema,
+    jina: ToolsWebFetchJinaSchema,
   })
   .strict()
   .optional();