Jina Reader
Prepend r.jina.ai/ to any URL and get LLM-ready markdown — JS rendering, PDFs and Office docs, image captioning, and s.jina.ai for read-the-results search.
Jina Reader is the zero-integration web-content tool: prefix any URL with https://r.jina.ai/ and get clean, LLM-ready markdown — headless-Chrome rendering, PDFs and Office files, images auto-captioned for text-only models. s.jina.ai searches and returns the full content of the top results. Apache-2.0 open-source branch, generous keyed free tier.
Jina Reader won its niche with the lowest possible integration cost: it's a URL prefix. No SDK, no schema, no session — r.jina.ai/ in front of any link returns the page as clean markdown, which makes it the tool agents and scripts reach for when "just read this page" is the whole requirement.
Highlights
- URL-prefix simplicity: https://r.jina.ai/<url> from curl, a browser, or any HTTP client; the API is the URL bar.
- Real-web handling: headless Chrome for JS-heavy pages with an auto-selected curl fast path; PDFs via PDF.js and Word/Excel/PowerPoint via LibreOffice, including direct binary upload.
- Vision built in: images are auto-captioned into alt text, so text-only models don't lose the figures.
- Output control via headers: markdown/html/text/screenshot, CSS-scoped extraction (x-target-selector), engine and cache controls.
- s.jina.ai: query in, full content of the top five results out; search and read collapsed into one call.
- Open core: the Apache-2.0 repo is the working stateless engine behind the endpoints (SaaS storage layer excluded), self-hostable via Docker.
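The header-based output control above can be wrapped in a small shell function. This is a minimal sketch, not an official client: the name `jina_read` and its argument order are hypothetical, and it uses only the `x-respond-with` and `x-target-selector` headers named on this page (engine and cache headers are left out because their accepted values aren't listed here). The `Authorization` header is sent only when `JINA_KEY` is set, since Reader also works keyless at a lower rate limit.

```bash
#!/usr/bin/env bash
# Minimal sketch: read a page through Jina Reader with header-based output control.
# jina_read is a hypothetical helper name; the header names come from this page.

# Usage: jina_read URL [FORMAT] [CSS_SELECTOR]
#   FORMAT: markdown (default) | html | text | screenshot
jina_read() {
  local url="$1"
  local format="${2:-markdown}"
  local selector="${3:-}"
  local -a args=(-sS -H "x-respond-with: $format")

  # Scope extraction to matching elements only when a selector is given.
  if [[ -n "$selector" ]]; then
    args+=(-H "x-target-selector: $selector")
  fi

  # Reader works without a key (throttled); send one when it is available.
  if [[ -n "${JINA_KEY:-}" ]]; then
    args+=(-H "Authorization: Bearer $JINA_KEY")
  fi

  curl "${args[@]}" "https://r.jina.ai/$url"
}
```

For example, `jina_read https://example.com/docs/page text article` should return only the content matched by the `article` selector, as plain text.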
In an AI-assisted workflow
curl "https://r.jina.ai/https://example.com/docs/page" # page → markdown
curl -H "Authorization: Bearer $JINA_KEY" "https://s.jina.ai/your+query" # search → full contents
Its agent role is the lightweight fetcher: the "read this URL" tool in a research loop, the one-off ingester feeding RAG, or anywhere Firecrawl's crawl-scale machinery would be overkill.
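In an agent loop, the two curl lines reduce to a pair of tool functions. A minimal bash sketch under stated assumptions: `read_page`, `search_and_read`, `reader_url`, `search_url`, and `urlencode` are hypothetical names, and the query is percent-encoded into the path (spaces become `%20` rather than the `+` used in the example), which assumes s.jina.ai decodes its path in the usual way. Search fails fast without `JINA_KEY`, because s.jina.ai requires a key.

```bash
#!/usr/bin/env bash
# Minimal sketch of two agent tools built on the r.jina.ai / s.jina.ai calls.
# All function names here are hypothetical.

# Percent-encode a string byte by byte so it is safe as one URL path segment.
urlencode() {
  local LC_ALL=C                               # iterate over bytes, ASCII ranges
  local s="$1" out="" c o i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) o="$c" ;;               # unreserved characters pass through
      *) printf -v o '%%%02X' "'$c" ;;          # everything else becomes %XX
    esac
    out+="$o"
  done
  printf '%s' "$out"
}

reader_url() { printf 'https://r.jina.ai/%s' "$1"; }
search_url() { printf 'https://s.jina.ai/%s' "$(urlencode "$1")"; }

# "Read this URL" tool: prints the page as markdown on stdout.
read_page() {
  curl -sS "$(reader_url "$1")"
}

# Search-and-read tool: s.jina.ai requires a key, so fail early without one.
search_and_read() {
  if [[ -z "${JINA_KEY:-}" ]]; then
    echo "search_and_read: set JINA_KEY (s.jina.ai requires an API key)" >&2
    return 1
  fi
  curl -sS -H "Authorization: Bearer $JINA_KEY" "$(search_url "$1")"
}
```

Keeping URL construction in separate functions makes it easy to log or test exactly what an agent is about to fetch before any request goes out.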
TIP
The free tier is structured so that a key pays off even for hobby use: keyless access is throttled to ~20 RPM with slower service, while a free key gives 500 RPM plus a ten-million-token grant and unlocks search.
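Those limits translate into a minimum gap between requests: 60,000 ms / 20 = 3,000 ms keyless, and 60,000 ms / 500 = 120 ms with a key. Below is a hedged bash sketch of client-side pacing for a batch of URLs, assuming the figures in the tip above; the function names are hypothetical, and the server may enforce its limits differently from this simple fixed gap.

```bash
#!/usr/bin/env bash
# Minimal client-side pacing sketch for Reader's rate limits
# (~20 RPM keyless, 500 RPM with a free key). Function names are hypothetical.

# Smallest gap in milliseconds between requests that stays at or under RPM.
min_gap_ms() {
  local rpm="$1"
  echo $(( (60000 + rpm - 1) / rpm ))   # ceiling division: never exceed the limit
}

# Read URLs (one per line on stdin) through Reader, pausing between requests.
read_all_paced() {
  local -a auth=()
  local rpm=20
  if [[ -n "${JINA_KEY:-}" ]]; then
    auth=(-H "Authorization: Bearer $JINA_KEY")
    rpm=500
  fi
  local gap url
  gap="$(min_gap_ms "$rpm")"
  while IFS= read -r url; do
    [[ -z "$url" ]] && continue                 # skip blank lines
    curl -sS "${auth[@]}" "https://r.jina.ai/$url"
    # Fractional seconds work with GNU and BSD sleep, though not strictly POSIX.
    sleep "$(printf '%d.%03d' $(( gap / 1000 )) $(( gap % 1000 )))"
  done
}
```

Because the pause starts only after each request finishes, the effective rate lands somewhat below the limit; that is a conservative choice that avoids tracking timestamps.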
Good to know
Jina AI was acquired by Elastic (completed October 2025; founder Han Xiao became Elastic's VP of AI), and its products continue; Reader's repo stayed active through 2026. Token-based billing scales with output length, so giant pages cost more. Versus the field: Firecrawl for crawl/extract at scale, Tavily for the all-in-one agent layer, Exa for semantic search, all mapped in Getting Web Data into AI Agents.
Frequently asked questions
- How do I use Jina Reader?
- Prepend the prefix — curl https://r.jina.ai/https://example.com/article returns the page as markdown. No SDK, no account for light use (keyless is throttled to ~20 requests/minute); a free API key raises Reader to 500 RPM, unlocks s.jina.ai search, and comes with a large token grant. Output and engine are tunable via headers (x-respond-with, x-engine, x-target-selector).
- What's s.jina.ai?
- Search that reads: it runs your query, then fetches and fully extracts the top five results through the same Reader stack — so instead of titles and snippets, your agent gets the actual content of the best pages in one call. It requires an API key.
- Is Jina Reader still maintained after the Elastic acquisition?
- Yes. Elastic completed its acquisition of Jina AI in October 2025 and committed to continuing its products; the Reader repo was re-platformed in 2025, and its open-source branch was re-synced with the SaaS code in April 2026. Note that the repo is only the stateless core: the SaaS storage layer isn't included, so a self-hosted instance won't have full feature parity with the hosted service.
Related
- Getting Web Data into AI Agents: Search & Scraping APIs Compared
  The agent web-data layer (Exa for semantic search, Firecrawl for extraction at scale, Tavily for all-in-one, Jina Reader for zero-setup) and how they compose.
- Firecrawl
  The API to search, scrape, and crawl the web for AI: clean Markdown out of any site, LLM-powered extraction, and a first-class MCP server.
- Tavily
  The web-access layer for agents: Search, Extract, Crawl, Map, and Research APIs purpose-built for LLMs, behind one key, with a hosted MCP server.
- Exa
  The search engine built for AIs: semantic web search, page contents, Websets, and research APIs, plus the ecosystem's most-used search MCP server.
- RAG (Retrieval-Augmented Generation)
  RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Web Research Pipeline
  Run a structured web-research pass on a question: plan the searches, find sources via search APIs, fetch and read the best ones, cross-check claims, and synthesize a cited answer, with source quality and disagreements surfaced honestly. Use for 'research X and tell me what's actually true' tasks that need more than one search and less than a day.