Firecrawl
The API to search, scrape, and crawl the web for AI — clean Markdown out of any site, LLM-powered extraction, and a first-class MCP server.
Firecrawl (~131k GitHub stars) turns the messy web into agent-ready data: /scrape renders any page to clean Markdown, /crawl walks whole sites, /map discovers URLs, /search queries the web, and /extract pulls structured data with an LLM. Open-source core (AGPL-3.0) with a hosted API, and an MIT MCP server installable into Claude Code as a hosted remote or local npx server.
Firecrawl is the ingestion workhorse of the agent stack: give it a URL and get back clean Markdown; give it a domain and get back the whole site, crawled and converted. At ~131k GitHub stars it has become the default answer to "how do I get web content into my LLM pipeline without writing a scraper per site."
Highlights
/scrape— any page to clean Markdown or JSON, JavaScript rendering included./crawl+/map— walk entire sites with depth/limit controls, or just discover the URL tree fast./search— web search with optional content scraping of the results in one call./extract— LLM-powered structured extraction: define a schema, get validated objects from messy pages.- Agent-grade MCP server — 14 tools including scrape/map/search/crawl, extraction, and newer agent/browser-session tools; hosted or local.
- Open core — AGPL-3.0, self-hostable; the hosted cloud adds managed scale and the proprietary Fire-Engine.
In an AI-assisted workflow
claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp
# then:
# > Crawl docs.example.com, extract every API endpoint and its auth requirements
# > into a table, and flag the ones our client doesn't implement yetFor RAG ingestion, Firecrawl is the step before chunking: site → clean Markdown → chunks → embeddings, without the per-site parser zoo.
WARNING
Two operational cautions: the hosted MCP URL embeds your API key in the path — treat the URL itself as a secret — and scraped content is untrusted input to your model (the classic indirect prompt-injection vector). Respect target sites' policies; Firecrawl's own terms put that responsibility on you.
Good to know
Freemium: a monthly free credit allowance (no card), then plans metered in page credits; credits don't roll over. The company raised a $14.5M Series A (Nexus, with Y Combinator) alongside the v2 API in August 2025, and the GitHub org renamed from mendableai to firecrawl. Pair with Exa — search to find pages, Firecrawl to extract them — for the full web-data layer under an agent.
Frequently asked questions
- What does Firecrawl do that plain fetching doesn't?
- It handles the web's hostile parts — JavaScript rendering, anti-bot friction, pagination, layout noise — and returns clean Markdown or structured JSON ready for an LLM. One endpoint scrapes a page; /crawl does entire sites with depth and limit controls; /extract turns 'get every product's name and price' into a schema-validated result.
- How do I add Firecrawl to Claude Code?
- Two documented options. Local: claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-key -- npx -y firecrawl-mcp. Hosted remote: claude mcp add --transport http firecrawl https://mcp.firecrawl.dev/your-api-key/v2/mcp — note the key is embedded in that URL, so treat the whole URL as a secret.
- Is Firecrawl open source?
- The core is AGPL-3.0 and self-hostable (SDKs and some components are MIT, as is the MCP server). The hosted cloud adds proprietary niceties like Fire-Engine. AGPL matters if you modify and operate it as a service — most teams just use the hosted API with its free monthly credits.
Related
- The Best MCP Servers in 2026The MCP servers actually worth connecting in 2026 — Context7, GitHub, Chrome DevTools, Playwright, Serena, Exa, Firecrawl, and the best official vendor servers, by use case.
- Adding MCP Servers to Claude Code: Local, Remote, and Project-ScopedThe complete claude mcp add reference — stdio vs HTTP transports, local/project/user scopes, .mcp.json with env expansion, OAuth via /mcp, and the gotchas.
- ExaThe search engine built for AIs — semantic web search, page contents, Websets, and research APIs, plus the ecosystem's most-used search MCP server.
- Data EngineerUse this agent to build and maintain data pipelines — ingestion, ELT/ETL, warehouse modeling, orchestration, and data-quality tests. Examples — building an idempotent ingestion job, modeling a fact/dimension table in dbt, writing a safe backfill for a changed schema.
- How RAG Actually Works: Ingestion, Chunking, Retrieval & RerankingA clear, practical walkthrough of the retrieval-augmented generation pipeline — what each stage does, where it fails, and how the pieces fit together.
- Chunking Strategy OptimizerFind the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.
- Web Research PipelineRun a structured web-research pass on a question: plan the searches, find sources via search APIs, fetch and read the best ones, cross-check claims, and synthesize a cited answer — with source quality and disagreements surfaced honestly. Use for 'research X and tell me what's actually true' tasks that need more than one search and less than a day.
- Getting Web Data into AI Agents: Search & Scraping APIs ComparedThe agent web-data layer — Exa for semantic search, Firecrawl for extraction at scale, Tavily for all-in-one, Jina Reader for zero-setup — and how they compose.
- Jina ReaderPrepend r.jina.ai/ to any URL and get LLM-ready markdown — JS rendering, PDFs and Office docs, image captioning, and s.jina.ai for read-the-results search.
- TavilyThe web-access layer for agents — Search, Extract, Crawl, Map, and Research APIs purpose-built for LLMs, behind one key, with a hosted MCP server.