Add Jina Reader (https://jina.ai/reader/) as a native fallback provider for web_fetch, similar to how Firecrawl is integrated.

Jina provides high-quality content extraction with:
- PDF support (native text extraction)
- Image captioning (via vision language models)
- JavaScript rendering (browser engine option)
- Token-based pricing (10M free tokens, more affordable than Firecrawl)
- Markdown-optimised output for LLM consumption

Changes:
- Add ToolsWebFetchJinaSchema to zod-schema.agent-runtime.ts
- Add fetchJinaContent() and tryJinaFallback() to web-fetch.ts
- Update fallback chain: Readability -> Jina -> Firecrawl -> error
- Add UI hints for Jina config options in schema.ts
- Add docs/tools/jina.md documentation
- Update docs/tools/web.md to reference Jina

Configuration example: tools.web.fetch.jina.apiKey or JINA_API_KEY env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| summary | read_when |
|---|---|
| Jina Reader fallback for web_fetch (PDF support + browser rendering) | |
Jina Reader
Moltbot can use Jina Reader as a fallback extractor for web_fetch. It is a
content extraction service optimised for LLM consumption, with excellent PDF support
and optional browser rendering for JavaScript-heavy sites.
Get an API key
- Sign up at https://jina.ai/?sui=apikey (10M free tokens included)
- Store it in config or set JINA_API_KEY in the gateway environment.
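The lookup order described above (config first, then the environment) can be sketched as follows. This is a minimal illustration, not the code in web-fetch.ts: the `JinaConfig` interface and the `resolveJinaApiKey` helper are hypothetical names, and only the `apiKey` field and the `JINA_API_KEY` variable come from this page.

```typescript
// Hypothetical config shape; only `apiKey` is taken from this page.
interface JinaConfig {
  apiKey?: string;
}

// Prefer the key stored in config; otherwise fall back to JINA_API_KEY.
// Blank or whitespace-only values are treated as "not set" (an assumption).
function resolveJinaApiKey(
  config: JinaConfig | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromConfig = config?.apiKey?.trim();
  if (fromConfig) return fromConfig;
  const fromEnv = env["JINA_API_KEY"]?.trim();
  return fromEnv || undefined;
}
```

In a Node-based gateway the second argument would typically be `process.env`.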
Configure Jina
{
  tools: {
    web: {
      fetch: {
        jina: {
          apiKey: "JINA_API_KEY_HERE",
          baseUrl: "https://r.jina.ai",
          engine: "browser",
          timeoutSeconds: 30
        }
      }
    }
  }
}
Notes:
- jina.enabled defaults to true when an API key is present.
- engine can be "browser" (quality), "direct" (speed), or "cf-browser-rendering" (JS-heavy).
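The two notes above amount to a pair of small defaulting rules, sketched here under stated assumptions. `JinaSettings`, `isJinaEnabled`, and `resolveEngine` are hypothetical names. Letting an explicit `enabled` value override the key check, and falling back to "browser" for unrecognised engine strings, are assumptions; this page only states that `enabled` defaults to true when a key is present and that "browser" is the default engine.

```typescript
type JinaEngine = "browser" | "direct" | "cf-browser-rendering";

// Hypothetical settings shape built from the option names on this page.
interface JinaSettings {
  enabled?: boolean;
  apiKey?: string;
  engine?: string;
}

const ENGINES: readonly JinaEngine[] = ["browser", "direct", "cf-browser-rendering"];

// An explicit `enabled` wins (assumption); otherwise Jina is on only
// when an API key is present.
function isJinaEnabled(settings: JinaSettings): boolean {
  if (typeof settings.enabled === "boolean") return settings.enabled;
  return Boolean(settings.apiKey);
}

// Missing or unrecognised engine values fall back to "browser" (assumption).
function resolveEngine(settings: JinaSettings): JinaEngine {
  const match = ENGINES.find((engine) => engine === settings.engine);
  return match ?? "browser";
}
```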
Engine Options
| Engine | Best For | Speed |
|---|---|---|
| direct | Simple HTML pages | Fastest |
| browser | Most pages (default) | Medium |
| cf-browser-rendering | JS-heavy SPAs | Slowest |
Additional Options
| Option | Description |
|---|---|
| noCache | Bypass Jina's cache for fresh content |
| withLinksSummary | Include a summary of all links at end of content |
| withImagesSummary | Include a summary of all images at end of content |
| returnFormat | Output format: "markdown", "text", or "html" |
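As a rough illustration of how these options might translate into a Reader request, the sketch below builds a URL and header set. The `buildJinaRequest` helper and `JinaRequestOptions` type are hypothetical, and the header names (`X-Engine`, `X-Timeout`, `X-No-Cache`, `X-With-Links-Summary`, `X-With-Images-Summary`, `X-Return-Format`) reflect my understanding of Jina Reader's HTTP interface rather than anything confirmed on this page; check them against Jina's own documentation before relying on them.

```typescript
// Hypothetical option bag using the config names from this page.
interface JinaRequestOptions {
  apiKey?: string;
  baseUrl?: string;
  engine?: string;
  timeoutSeconds?: number;
  noCache?: boolean;
  withLinksSummary?: boolean;
  withImagesSummary?: boolean;
  returnFormat?: "markdown" | "text" | "html";
}

// Build the request URL and headers; only options that are set produce headers.
// Header names are assumptions based on Jina Reader's public API.
function buildJinaRequest(
  targetUrl: string,
  opts: JinaRequestOptions,
): { url: string; headers: Record<string, string> } {
  // Strip trailing slashes so the target URL is appended exactly once.
  const base = (opts.baseUrl ?? "https://r.jina.ai").replace(/\/+$/, "");
  const headers: Record<string, string> = {};
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;
  if (opts.engine) headers["X-Engine"] = opts.engine;
  if (opts.timeoutSeconds) headers["X-Timeout"] = String(opts.timeoutSeconds);
  if (opts.noCache) headers["X-No-Cache"] = "true";
  if (opts.withLinksSummary) headers["X-With-Links-Summary"] = "true";
  if (opts.withImagesSummary) headers["X-With-Images-Summary"] = "true";
  if (opts.returnFormat) headers["X-Return-Format"] = opts.returnFormat;
  return { url: `${base}/${targetUrl}`, headers };
}
```

The result can be passed straight to `fetch(url, { headers })`.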
How web_fetch uses Jina
web_fetch extraction order:
- Readability (local)
- Jina (if configured)
- Firecrawl (if configured)
- Basic HTML cleanup (last fallback)
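The extraction order above is a first-success-wins chain. A minimal sketch of that pattern, assuming each provider is wrapped in a common interface, might look like this. The `Extractor` type and `extractWithFallback` function are hypothetical and are not the `tryJinaFallback()` implementation in web-fetch.ts; treating a thrown error the same as an empty result is also an assumption.

```typescript
// Each extractor resolves to text on success, or null when it is
// not configured or cannot handle the page.
type Extractor = {
  name: string;
  run: (url: string) => Promise<string | null>;
};

// Try extractors in order and return the first non-empty result.
async function extractWithFallback(
  url: string,
  extractors: Extractor[],
): Promise<{ provider: string; content: string }> {
  for (const extractor of extractors) {
    try {
      const content = await extractor.run(url);
      if (content && content.trim().length > 0) {
        return { provider: extractor.name, content };
      }
    } catch {
      // Treat a thrown error like an empty result and try the next provider.
    }
  }
  throw new Error(`No extractor produced content for ${url}`);
}
```

Passing the providers as an ordered array keeps the priority visible in one place, so reordering Jina and Firecrawl would be a one-line change.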
Jina is tried before Firecrawl because:
- Token-based pricing is more affordable for high-volume use
- Better PDF extraction support
- No anti-bot circumvention overhead
See Web tools for the full web tool setup.
Comparison with Firecrawl
| Feature | Jina | Firecrawl |
|---|---|---|
| Pricing | Token-based (10M free) | Credit-based |
| PDF Support | Excellent | Basic |
| Image Captioning | Yes (via VLM) | No |
| Anti-bot Bypass | No | Yes (stealth proxy) |
| JS Rendering | Yes (browser engine) | Yes |
Recommendation: Use Jina for most use cases; add Firecrawl if you frequently hit bot detection on specific sites.
Environment Variables
- JINA_API_KEY - API key for Jina Reader (fallback when not set in config)
Rate Limits
| Tier | RPM |
|---|---|
| No API Key | 20 |
| Free API Key | 500 |
| Premium | 5,000 |
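To stay under these limits, a caller can space requests evenly. The sketch below converts each tier's RPM into a minimum interval between requests; the tier keys and the `minIntervalMs` helper are hypothetical, and the RPM values are copied from the table above.

```typescript
type JinaTier = "none" | "free" | "premium";

// Requests per minute, copied from the rate limit table.
const RPM_BY_TIER: Record<JinaTier, number> = {
  none: 20,
  free: 500,
  premium: 5000,
};

// Minimum spacing between requests, in milliseconds, to stay under the RPM.
function minIntervalMs(tier: JinaTier): number {
  return 60_000 / RPM_BY_TIER[tier];
}
```

Without an API key this works out to one request every 3 seconds (60,000 ms / 20), versus every 120 ms on the free tier.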
Pricing
Jina uses token-based pricing:
- Free tier: 10 million tokens included
- Paid: ~$0.02 per 1M tokens (varies by plan)
This is typically more cost-effective than Firecrawl's credit-based model for high-volume extraction.
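For a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate. This sketch assumes the 10M free tokens are a one-time allowance subtracted from total usage and uses the approximate $0.02 per 1M rate quoted above; both are simplifications, and `estimateJinaCostUsd` is a hypothetical helper.

```typescript
const FREE_TOKENS = 10_000_000; // free allowance quoted on this page
const USD_PER_MILLION_TOKENS = 0.02; // approximate; varies by plan

// Estimate spend for a given total token count, after the free allowance.
function estimateJinaCostUsd(totalTokens: number): number {
  const billableTokens = Math.max(0, totalTokens - FREE_TOKENS);
  return (billableTokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}
```

Under these assumptions, 60M tokens would leave 50M billable, or roughly $1.00.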