openclaw/docs/tools/jina.md
Nathan Schram 2f22e1a88b feat(tools): add Jina Reader as web_fetch fallback provider
Add Jina Reader (https://jina.ai/reader/) as a native fallback provider
for web_fetch, similar to how Firecrawl is integrated.

Jina provides high-quality content extraction with:
- PDF support (native text extraction)
- Image captioning (via vision language models)
- JavaScript rendering (browser engine option)
- Token-based pricing (10M free tokens, more affordable than Firecrawl)
- Markdown-optimised output for LLM consumption

Changes:
- Add ToolsWebFetchJinaSchema to zod-schema.agent-runtime.ts
- Add fetchJinaContent() and tryJinaFallback() to web-fetch.ts
- Update fallback chain: Readability -> Jina -> Firecrawl -> error
- Add UI hints for Jina config options in schema.ts
- Add docs/tools/jina.md documentation
- Update docs/tools/web.md to reference Jina

Configuration example:
  tools.web.fetch.jina.apiKey or JINA_API_KEY env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:54:15 +11:00


summary: Jina Reader fallback for web_fetch (PDF support + browser rendering)
read_when:
  - You want Jina-backed web extraction
  - You need a Jina API key
  - You want PDF extraction for web_fetch
  - You want browser-rendered extraction for JS-heavy sites

Jina Reader

Moltbot can use Jina Reader as a fallback extractor for web_fetch. Jina Reader is a content extraction service that returns Markdown optimised for LLM consumption, with native PDF text extraction and optional browser rendering for JavaScript-heavy sites.

Get an API key

  1. Sign up at https://jina.ai/?sui=apikey to get an API key (10M free tokens included).
  2. Store the key in config as tools.web.fetch.jina.apiKey, or set JINA_API_KEY in the gateway environment.

Configure Jina

{
  tools: {
    web: {
      fetch: {
        jina: {
          apiKey: "JINA_API_KEY_HERE",
          baseUrl: "https://r.jina.ai",
          engine: "browser",
          timeoutSeconds: 30
        }
      }
    }
  }
}

Notes:

  • jina.enabled defaults to true when an API key is present.
  • engine can be "browser" (quality), "direct" (speed), or "cf-browser-rendering" (JS-heavy).
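
The notes above, together with the JINA_API_KEY fallback, can be sketched as a small resolver. This is a minimal sketch under stated assumptions, not Moltbot's actual implementation: `JinaConfig`, `ResolvedJina`, and `resolveJinaConfig` are hypothetical names, and the default values for `baseUrl` and `timeoutSeconds` simply mirror the example config above.

```typescript
// Hypothetical shape of the tools.web.fetch.jina block shown above.
interface JinaConfig {
  enabled?: boolean;
  apiKey?: string;
  baseUrl?: string;
  engine?: "browser" | "direct" | "cf-browser-rendering";
  timeoutSeconds?: number;
}

interface ResolvedJina {
  enabled: boolean;
  apiKey: string | undefined;
  baseUrl: string;
  engine: "browser" | "direct" | "cf-browser-rendering";
  timeoutSeconds: number;
}

function resolveJinaConfig(
  config: JinaConfig,
  env: Record<string, string | undefined>,
): ResolvedJina {
  // The config value wins; JINA_API_KEY is only consulted when config has no key.
  const apiKey = config.apiKey || env.JINA_API_KEY || undefined;
  return {
    // An explicit `enabled` is respected; otherwise the presence of a key decides.
    enabled: config.enabled ?? Boolean(apiKey),
    apiKey,
    // Assumed defaults, copied from the example config above.
    baseUrl: config.baseUrl ?? "https://r.jina.ai",
    engine: config.engine ?? "browser",
    timeoutSeconds: config.timeoutSeconds ?? 30,
  };
}
```

In a Node-based gateway this would presumably be called as `resolveJinaConfig(userConfig, process.env)`, so that an operator can supply the key either way.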

Engine Options

| Engine | Best For | Speed |
| --- | --- | --- |
| direct | Simple HTML pages | Fastest |
| browser | Most pages (default) | Medium |
| cf-browser-rendering | JS-heavy SPAs | Slowest |

Additional Options

| Option | Description |
| --- | --- |
| noCache | Bypass Jina's cache for fresh content |
| withLinksSummary | Include a summary of all links at end of content |
| withImagesSummary | Include a summary of all images at end of content |
| returnFormat | Output format: "markdown", "text", or "html" |
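
To illustrate how the engine and the options above might translate into a Reader request, here is a minimal sketch. `JinaRequestOptions` and `buildJinaRequest` are hypothetical names, not the functions in web-fetch.ts. The header names (`X-Engine`, `X-No-Cache`, `X-With-Links-Summary`, `X-With-Images-Summary`, `X-Return-Format`) reflect Jina Reader's public API as best understood here; treat them as assumptions and check them against Jina's documentation.

```typescript
interface JinaRequestOptions {
  apiKey?: string;
  baseUrl?: string;
  engine?: "browser" | "direct" | "cf-browser-rendering";
  noCache?: boolean;
  withLinksSummary?: boolean;
  withImagesSummary?: boolean;
  returnFormat?: "markdown" | "text" | "html";
}

function buildJinaRequest(
  targetUrl: string,
  options: JinaRequestOptions = {},
): { url: string; headers: Record<string, string> } {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = (options.baseUrl ?? "https://r.jina.ai").replace(/\/+$/, "");
  const headers: Record<string, string> = {};

  // Only set headers for options that were actually provided.
  if (options.apiKey) headers["Authorization"] = `Bearer ${options.apiKey}`;
  if (options.engine) headers["X-Engine"] = options.engine;
  if (options.noCache) headers["X-No-Cache"] = "true";
  if (options.withLinksSummary) headers["X-With-Links-Summary"] = "true";
  if (options.withImagesSummary) headers["X-With-Images-Summary"] = "true";
  if (options.returnFormat) headers["X-Return-Format"] = options.returnFormat;

  // The Reader endpoint takes the target URL appended to the base URL.
  return { url: `${base}/${targetUrl}`, headers };
}
```

The result can be passed to `fetch(url, { headers })`. Keeping request construction separate from the network call makes the option mapping easy to unit test.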

How web_fetch uses Jina

web_fetch extraction order:

  1. Readability (local)
  2. Jina (if configured)
  3. Firecrawl (if configured)
  4. Basic HTML cleanup (last fallback)
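
The extraction order above can be sketched as follows. This is an illustration only: `Extractor`, `FetchProviders`, `extractionOrder`, and `extractWithFallback` are hypothetical names, and the sketch is synchronous for brevity, whereas real extractors would be asynchronous network calls.

```typescript
type Extractor = "readability" | "jina" | "firecrawl" | "html-cleanup";

interface FetchProviders {
  jinaConfigured: boolean;
  firecrawlConfigured: boolean;
}

// Builds the list of steps to try, skipping providers that are not configured.
function extractionOrder(providers: FetchProviders): Extractor[] {
  const order: Extractor[] = ["readability"];
  if (providers.jinaConfigured) order.push("jina");
  if (providers.firecrawlConfigured) order.push("firecrawl");
  order.push("html-cleanup");
  return order;
}

// Runs each step in order and returns the first non-empty result, or null.
function extractWithFallback(
  order: Extractor[],
  run: (step: Extractor) => string | null,
): { step: Extractor; content: string } | null {
  for (const step of order) {
    const content = run(step);
    if (content !== null && content.trim().length > 0) {
      return { step, content };
    }
  }
  return null;
}
```

For example, with only Jina configured, a page on which Readability returns nothing would fall through to Jina and stop there if Jina returns content.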

Jina is tried before Firecrawl because:

  • Token-based pricing is more affordable for high-volume use
  • Better PDF extraction support
  • No anti-bot circumvention overhead

See Web tools (docs/tools/web.md) for the full web tool setup.

Comparison with Firecrawl

| Feature | Jina | Firecrawl |
| --- | --- | --- |
| Pricing | Token-based (10M free) | Credit-based |
| PDF Support | Excellent | Basic |
| Image Captioning | Yes (via VLM) | No |
| Anti-bot Bypass | No | Yes (stealth proxy) |
| JS Rendering | Yes (browser engine) | Yes |

Recommendation: Use Jina for most use cases; add Firecrawl if you frequently hit bot detection on specific sites.

Environment Variables

  • JINA_API_KEY - API key for Jina Reader (used when tools.web.fetch.jina.apiKey is not set in config)

Rate Limits

| Tier | Requests per minute (RPM) |
| --- | --- |
| No API Key | 20 |
| Free API Key | 500 |
| Premium | 5,000 |
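
As a rough illustration of what these limits mean for a client, the sketch below converts each tier's RPM into a minimum spacing between requests. `JINA_RPM`, `minIntervalMs`, and `nextAllowedAt` are hypothetical names, and even spacing is just one simple way to stay under a per-minute limit; it says nothing about how Jina enforces the limit.

```typescript
// RPM values copied from the table above.
const JINA_RPM = { none: 20, free: 500, premium: 5000 } as const;
type JinaTier = keyof typeof JINA_RPM;

// Minimum gap between requests, in milliseconds, for evenly spaced traffic.
function minIntervalMs(tier: JinaTier): number {
  return Math.ceil(60000 / JINA_RPM[tier]);
}

// Earliest timestamp (ms) at which the next request may be sent.
function nextAllowedAt(lastRequestAtMs: number, tier: JinaTier): number {
  return lastRequestAtMs + minIntervalMs(tier);
}
```

Without an API key this works out to one request every 3 seconds; with a free key, one every 120 ms.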

Pricing

Jina uses token-based pricing:

  • Free tier: 10 million tokens included
  • Paid: ~$0.02 per 1M tokens (varies by plan)

This is typically more cost-effective than Firecrawl's credit-based model for high-volume extraction.
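
A back-of-the-envelope estimate based on the figures above can be sketched as follows. `estimateJinaCostUsd` is a hypothetical helper; it assumes the 10M free tokens are consumed first and that the approximate rate of $0.02 per 1M tokens applies to everything after that, which may not match a given plan.

```typescript
const FREE_TOKENS = 10000000; // free tier allotment, from the list above
const USD_PER_MILLION_TOKENS = 0.02; // approximate paid rate; varies by plan

// Estimated spend in USD for a cumulative token count.
function estimateJinaCostUsd(totalTokens: number): number {
  const billableTokens = Math.max(0, totalTokens - FREE_TOKENS);
  return (billableTokens / 1000000) * USD_PER_MILLION_TOKENS;
}
```

Under these assumptions, 60M tokens in total would cost about $1.00, since 50M tokens are billable at $0.02 per million.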