Nathan Schram 2f22e1a88b feat(tools): add Jina Reader as web_fetch fallback provider

Add Jina Reader (https://jina.ai/reader/) as a native fallback provider
for web_fetch, similar to how Firecrawl is integrated.

Jina provides high-quality content extraction with:
- PDF support (native text extraction)
- Image captioning (via vision language models)
- JavaScript rendering (browser engine option)
- Token-based pricing (10M free tokens, more affordable than Firecrawl)
- Markdown-optimised output for LLM consumption

Changes:
- Add ToolsWebFetchJinaSchema to zod-schema.agent-runtime.ts
- Add fetchJinaContent() and tryJinaFallback() to web-fetch.ts
- Update fallback chain: Readability -> Jina -> Firecrawl -> error
- Add UI hints for Jina config options in schema.ts
- Add docs/tools/jina.md documentation
- Update docs/tools/web.md to reference Jina

Configuration example:
  tools.web.fetch.jina.apiKey or JINA_API_KEY env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-29 14:54:15 +11:00

2.9 KiB


summary: Jina Reader fallback for web_fetch (PDF support + browser rendering)

read_when:
- You want Jina-backed web extraction
- You need a Jina API key
- You want PDF extraction for web_fetch
- You want browser-rendered extraction for JS-heavy sites

Jina Reader

Moltbot can use Jina Reader as a fallback extractor for web_fetch. Jina Reader is a content extraction service that returns Markdown optimised for LLM consumption, with native PDF text extraction and optional browser rendering for JavaScript-heavy sites.

Get an API key

Sign up at https://jina.ai/?sui=apikey (10M free tokens included).
Store the key in config as tools.web.fetch.jina.apiKey, or set JINA_API_KEY in the gateway environment.
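
The lookup order described above (config first, then the environment variable) can be sketched as follows. This is an illustrative sketch, not the code in web-fetch.ts: the `JinaKeyConfig` type and `resolveJinaApiKey` function are hypothetical names, and treating a blank string as "not set" is an assumption.

```typescript
// Hypothetical shape: only the `apiKey` field comes from the config example in this page.
interface JinaKeyConfig {
  apiKey?: string;
}

/**
 * Resolve the Jina API key: the config value wins, JINA_API_KEY is the fallback.
 * Assumption: blank or whitespace-only strings count as "not set".
 */
function resolveJinaApiKey(
  config: JinaKeyConfig | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromConfig = config?.apiKey?.trim();
  if (fromConfig) return fromConfig;
  const fromEnv = env["JINA_API_KEY"]?.trim();
  return fromEnv || undefined;
}
```

In a Node.js gateway, the second argument would typically be `process.env`.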

Configure Jina

{
  tools: {
    web: {
      fetch: {
        jina: {
          apiKey: "JINA_API_KEY_HERE",
          baseUrl: "https://r.jina.ai",
          engine: "browser",
          timeoutSeconds: 30
        }
      }
    }
  }
}

Notes:

jina.enabled defaults to true when an API key is present.
engine can be "browser" (quality), "direct" (speed), or "cf-browser-rendering" (JS-heavy).
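
To make the two notes concrete, here is a minimal sketch of how the config block might be normalised. It is not the actual ToolsWebFetchJinaSchema: the type and function names are hypothetical, and using the example's `baseUrl` and `timeoutSeconds` values as defaults is an assumption. Only the `enabled` rule and the `browser` default engine come from this page.

```typescript
type JinaEngine = "direct" | "browser" | "cf-browser-rendering";

// Raw, user-supplied config (all fields optional). Hypothetical type name.
interface JinaFetchConfig {
  enabled?: boolean;
  apiKey?: string;
  baseUrl?: string;
  engine?: string;
  timeoutSeconds?: number;
}

interface ResolvedJinaConfig {
  enabled: boolean;
  apiKey: string | undefined;
  baseUrl: string;
  engine: JinaEngine;
  timeoutSeconds: number;
}

const JINA_ENGINES: readonly JinaEngine[] = ["direct", "browser", "cf-browser-rendering"];

function isJinaEngine(value: string): value is JinaEngine {
  return JINA_ENGINES.some((engine) => engine === value);
}

function resolveJinaConfig(raw: JinaFetchConfig = {}): ResolvedJinaConfig {
  const engine = raw.engine ?? "browser"; // "browser" is the documented default
  if (!isJinaEngine(engine)) {
    throw new Error(`Unknown Jina engine: ${engine}`);
  }
  return {
    // An explicit `enabled` wins; otherwise enabled only when an API key is present.
    enabled: raw.enabled ?? Boolean(raw.apiKey),
    apiKey: raw.apiKey,
    baseUrl: raw.baseUrl ?? "https://r.jina.ai", // assumed default, taken from the example
    engine,
    timeoutSeconds: raw.timeoutSeconds ?? 30, // assumed default, taken from the example
  };
}
```

Rejecting unknown engine names at load time is a design choice in this sketch; it surfaces a typo in the config before the first fetch rather than at request time.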

Engine Options

| Engine | Best for | Speed |
| --- | --- | --- |
| `direct` | Simple HTML pages | Fastest |
| `browser` | Most pages (default) | Medium |
| `cf-browser-rendering` | JS-heavy SPAs | Slowest |

Additional Options

| Option | Description |
| --- | --- |
| `noCache` | Bypass Jina's cache for fresh content |
| `withLinksSummary` | Include a summary of all links at the end of the content |
| `withImagesSummary` | Include a summary of all images at the end of the content |
| `returnFormat` | Output format: "markdown", "text", or "html" |
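
As a rough illustration of how these options could translate into a Jina Reader request, the sketch below builds the URL and headers without sending anything. The `buildJinaRequest` helper is hypothetical, and the header names (`X-Engine`, `X-Timeout`, `X-No-Cache`, `X-With-Links-Summary`, `X-With-Images-Summary`, `X-Return-Format`) reflect my understanding of Jina's public Reader API; verify them against Jina's documentation, since the real fetchJinaContent() may map options differently.

```typescript
// Option names mirror the config keys in this page; the type name is hypothetical.
interface JinaRequestOptions {
  apiKey?: string;
  baseUrl?: string;
  engine?: "direct" | "browser" | "cf-browser-rendering";
  timeoutSeconds?: number;
  noCache?: boolean;
  withLinksSummary?: boolean;
  withImagesSummary?: boolean;
  returnFormat?: "markdown" | "text" | "html";
}

function buildJinaRequest(
  targetUrl: string,
  opts: JinaRequestOptions = {},
): { url: string; headers: Record<string, string> } {
  // Strip trailing slashes so the target URL is appended after exactly one "/".
  const base = (opts.baseUrl ?? "https://r.jina.ai").replace(/\/+$/, "");
  const headers: Record<string, string> = {};
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;
  // Header names below are assumptions about the Reader API; check before relying on them.
  if (opts.engine) headers["X-Engine"] = opts.engine;
  if (opts.timeoutSeconds !== undefined) headers["X-Timeout"] = String(opts.timeoutSeconds);
  if (opts.noCache) headers["X-No-Cache"] = "true";
  if (opts.withLinksSummary) headers["X-With-Links-Summary"] = "true";
  if (opts.withImagesSummary) headers["X-With-Images-Summary"] = "true";
  if (opts.returnFormat) headers["X-Return-Format"] = opts.returnFormat;
  return { url: `${base}/${targetUrl}`, headers };
}
```

The returned pair can be passed to any HTTP client, for example `fetch(req.url, { headers: req.headers })`.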

How `web_fetch` uses Jina

web_fetch extraction order:

1. Readability (local)
2. Jina (if configured)
3. Firecrawl (if configured)
4. Basic HTML cleanup (last fallback)
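
The extraction order above amounts to a first-success fallback chain. The sketch below shows the general pattern only; `Extractor` and `extractWithFallback` are hypothetical names rather than the functions in web-fetch.ts, and treating an empty result as a failure is an assumption.

```typescript
interface ExtractResult {
  extractor: string;
  text: string;
}

interface Extractor {
  name: string;
  enabled: boolean; // e.g. false for Jina or Firecrawl when not configured
  run: (url: string) => Promise<string | null>;
}

/** Try each enabled extractor in order and return the first non-empty result. */
async function extractWithFallback(url: string, extractors: Extractor[]): Promise<ExtractResult> {
  const failures: string[] = [];
  for (const extractor of extractors) {
    if (!extractor.enabled) continue;
    try {
      const text = await extractor.run(url);
      if (text && text.trim().length > 0) {
        return { extractor: extractor.name, text };
      }
      failures.push(`${extractor.name}: empty result`);
    } catch (err) {
      // A thrown error moves on to the next extractor instead of aborting the chain.
      failures.push(`${extractor.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All extractors failed for ${url}: ${failures.join("; ")}`);
}
```

Passing the extractors in the order Readability, Jina, Firecrawl, basic HTML cleanup reproduces the list above; collecting the per-extractor failure reasons keeps the final error message useful for debugging.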

Jina is tried before Firecrawl because:

Token-based pricing is more affordable for high-volume use
Better PDF extraction support
No anti-bot circumvention overhead

See Web tools for the full web tool setup.

Comparison with Firecrawl

| Feature | Jina | Firecrawl |
| --- | --- | --- |
| Pricing | Token-based (10M free) | Credit-based |
| PDF support | Excellent | Basic |
| Image captioning | Yes (via VLM) | No |
| Anti-bot bypass | No | Yes (stealth proxy) |
| JS rendering | Yes (browser engine) | Yes |

Recommendation: Use Jina for most use cases; add Firecrawl if you frequently hit bot detection on specific sites.

Environment Variables

JINA_API_KEY - API key for Jina Reader (fallback when not set in config)

Rate Limits

| Tier | Requests per minute (RPM) |
| --- | --- |
| No API key | 20 |
| Free API key | 500 |
| Premium | 5,000 |
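
For client-side throttling, the RPM figures above can be converted into a minimum spacing between requests. This is a sketch under the assumption that spreading requests evenly is acceptable; how Jina actually enforces its limits (fixed window, sliding window, bursts) is not stated here, and the tier keys are hypothetical labels.

```typescript
// RPM values copied from the rate limit table; the keys are hypothetical labels.
const JINA_RPM = { none: 20, free: 500, premium: 5000 } as const;
type JinaTier = keyof typeof JINA_RPM;

/** Minimum delay between requests, in milliseconds, to stay at or under the tier's RPM. */
function minIntervalMs(tier: JinaTier): number {
  return 60_000 / JINA_RPM[tier];
}
```

Under this assumption the spacing works out to 3000 ms without a key, 120 ms on the free tier, and 12 ms on premium.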

Pricing

Jina uses token-based pricing:

Free tier: 10 million tokens included
Paid: ~$0.02 per 1M tokens (varies by plan)

This is typically more cost-effective than Firecrawl's credit-based model for high-volume extraction.
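
A back-of-the-envelope estimate using the figures above might look like this. It assumes the 10M free tokens are a one-off allowance consumed first and uses the approximate $0.02 per 1M rate, which varies by plan; `estimateJinaCostUsd` is a hypothetical helper.

```typescript
const FREE_TOKENS = 10_000_000; // free allowance quoted above
const USD_PER_MILLION_TOKENS = 0.02; // approximate paid rate; varies by plan

/** Estimated spend in USD for a total token count, after the free allowance is used up. */
function estimateJinaCostUsd(totalTokens: number): number {
  const billableTokens = Math.max(0, totalTokens - FREE_TOKENS);
  return (billableTokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}
```

For example, 60M tokens leaves 50M billable tokens, or about $1.00 at this rate, while anything up to 10M tokens costs nothing.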
