Stagehand
Browserbase's open-source SDK for browser agents — act, extract, observe, and agent primitives that mix natural language with code-level control.
Stagehand (MIT, ~23k stars, by Browserbase) is the engineer's browser-agent SDK: four primitives — act() for natural-language actions that survive redesigns, extract() with Zod-validated schemas, observe() to preview actionable elements, agent() for full autonomy — composable with ordinary code. TypeScript-first, Python too; Browserbase cloud optional.
Stagehand is the browser-agent framework built the way engineers wish AI tools were built: deterministic code by default, intelligence exactly where brittleness lives. Instead of handing the whole task to an agent, you compose four primitives — and each one earns its place.
Highlights
- act(): natural-language actions ("click the submit button") resolved against the live page, surviving the redesigns that break selectors.
- extract(): structured data out, validated against a Zod schema; structured-output discipline applied to scraping.
- observe(): preview what's actionable on a page before committing, the look-before-you-leap primitive.
- agent(): full multi-step autonomy when you want it, model-agnostic (pairs with any LLM or computer-use model).
- v3 architecture: native CDP driver layer (Playwright removed), self-healing execution, and action caching that avoids repeat inference on known pages.
- Two languages: TypeScript flagship, Python SDK alongside; npx create-browser-app scaffolds a running project.
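Because extract() checks the model's output against a Zod schema, malformed data fails before your code consumes it. The dependency-free sketch below mimics that contract with a hand-written type guard standing in for Zod; `Product`, `isProduct`, and `parseExtraction` are hypothetical names, the JSON string stands in for a model response, and none of this is Stagehand's own API.

```typescript
// Hypothetical shape we want out of a product page (stand-in for a Zod schema).
interface Product {
  name: string;
  price: number;
  inStock: boolean;
}

// Hand-rolled type guard playing the role Zod plays in extract():
// reject anything whose fields are missing or have the wrong type.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price) &&
    typeof v.inStock === "boolean"
  );
}

// Parse a raw model response and fail loudly instead of passing bad data on.
function parseExtraction(raw: string): Product {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("extraction was not valid JSON");
  }
  if (!isProduct(parsed)) {
    throw new Error("extraction did not match the Product schema");
  }
  return parsed;
}

const product = parseExtraction('{"name":"Desk lamp","price":24.5,"inStock":true}');
console.log(product.price); // prints 24.5
```

A response with `"price": "24.50"` (a string) throws here rather than flowing downstream, which is the "enforced, not hoped for" property the schema buys you.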
In an AI-assisted workflow
npx create-browser-app  # scaffold + run locally; OPENAI_API_KEY (or similar) required
The sweet spot is reliable automations with AI at the joints: a checkout flow that's 90% ordinary code and 10% act() where the DOM churns, or an extraction pipeline whose schema is enforced, not hoped for. For one-shot autonomous errands, Browser Use's task-in/result-out model involves less ceremony; the trade-off is exactly control versus convenience.
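The "mostly code, act() where the DOM churns" shape can be sketched as a selector-first click that falls back to a natural-language instruction only when the known selector is gone. `PageLike`, `ActFn`, and `clickOrAct` are hypothetical stand-ins backed by a fake page; in a real Stagehand script the fallback would be its act() call, whose exact signature this sketch does not reproduce.

```typescript
// Hypothetical minimal page interface; a real one would wrap a browser driver.
interface PageLike {
  hasSelector(selector: string): boolean;
  click(selector: string): void;
}

// Stand-in for a natural-language action such as Stagehand's act().
type ActFn = (instruction: string) => Promise<void>;

// Deterministic path first; spend a model call only when the selector is stale.
async function clickOrAct(
  page: PageLike,
  selector: string,
  instruction: string,
  act: ActFn,
): Promise<"selector" | "act"> {
  if (page.hasSelector(selector)) {
    page.click(selector);
    return "selector";
  }
  await act(instruction);
  return "act";
}

async function demo(): Promise<void> {
  const clicked: string[] = [];
  const instructions: string[] = [];
  // Fake page: after a "redesign", only #checkout still exists.
  const page: PageLike = {
    hasSelector: (s) => s === "#checkout",
    click: (s) => { clicked.push(s); },
  };
  const act: ActFn = async (instruction) => { instructions.push(instruction); };
  console.log(await clickOrAct(page, "#checkout", "click the checkout button", act)); // prints selector
  console.log(await clickOrAct(page, "#submit-old", "click the submit button", act)); // prints act
}

void demo();
```

Keeping the deterministic path first means a stable page costs no inference at all; the model only pays for the steps whose markup actually moved.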
TIP
The caching matters in production: actions Stagehand has resolved once replay without LLM calls until the page changes — turning per-step model costs from a constant into an amortized one.
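A minimal sketch of the amortization idea, assuming a cache keyed by page URL plus instruction: a hit replays the stored selector with no model call, and a failed replay (the page changed) evicts the entry and pays for one fresh resolution, a crude form of self-healing. `ActionCache`, `Resolver`, `Replayer`, and the key format are hypothetical; this illustrates the concept, not how Stagehand implements its cache internally.

```typescript
// Resolves an instruction to a selector; in practice this is the expensive LLM call.
type Resolver = (url: string, instruction: string) => Promise<string>;
// Attempts to perform a stored selector; returns false if it no longer matches.
type Replayer = (selector: string) => boolean;

class ActionCache {
  private readonly entries = new Map<string, string>();
  private readonly resolve: Resolver;
  private readonly tryReplay: Replayer;
  modelCalls = 0;

  constructor(resolve: Resolver, tryReplay: Replayer) {
    this.resolve = resolve;
    this.tryReplay = tryReplay;
  }

  async act(url: string, instruction: string): Promise<string> {
    const key = `${url}::${instruction}`;
    const cached = this.entries.get(key);
    // Cache hit that still works on the live page: no inference needed.
    if (cached !== undefined && this.tryReplay(cached)) return cached;
    // Miss or stale entry: drop it and resolve once with the model.
    this.entries.delete(key);
    this.modelCalls++;
    const selector = await this.resolve(url, instruction);
    if (!this.tryReplay(selector)) throw new Error(`could not perform: ${instruction}`);
    this.entries.set(key, selector);
    return selector;
  }
}

async function demo(): Promise<void> {
  let liveSelector = "#buy"; // what the fake page currently exposes
  const cache = new ActionCache(
    async () => liveSelector,     // fake "model" simply reads the live page
    (sel) => sel === liveSelector, // replay succeeds only if the element still exists
  );
  const url = "https://shop.example/item";
  await cache.act(url, "click buy"); // first run: one model call
  await cache.act(url, "click buy"); // replayed from cache: no model call
  liveSelector = "#purchase";        // simulated redesign
  await cache.act(url, "click buy"); // stale entry: re-resolved once
  console.log(cache.modelCalls);     // prints 2
}

void demo();
```

Three actions cost two model calls here; on a page that never changes, every run after the first costs zero.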
Good to know
MIT-licensed, from Browserbase, whose $40M Series B (June 2025) funds the cloud layer: hosted browsers, recordings, stealth, and proxies, all optional, paid, and where scale lives. Tutorials from the v2 era predate the Playwright removal, so check versions before following them. For positioning against Browser Use, Skyvern, and the MCP-based options, see Browser Agents in 2026.
Frequently asked questions
- How is Stagehand different from Browser Use?
- Control philosophy. Browser Use leads with full autonomy — one task string, the agent figures it out. Stagehand leads with composable primitives: you write real code and drop to natural language exactly where selectors would be brittle (act('click the login button')), with extract() returning schema-validated data. Determinism where you want it, AI where you need it.
- Do I need Browserbase to use Stagehand?
- No — the SDK is MIT and runs against local Chromium with just a model API key. Browserbase's cloud browsers are the optional production layer (scale, session recording, stealth, proxies). The npx create-browser-app scaffold runs locally out of the box.
- What changed in Stagehand v3?
- A re-architecture: Playwright was removed entirely in favor of running natively on the Chrome DevTools Protocol (CDP) with a modular driver system, plus self-healing execution and caching of discovered actions to cut repeat LLM calls. Browserbase claims ~44% faster interactions across iframes and shadow roots. v2 configs and tutorials don't carry over; there's a migration guide.
Related
- Browser Agents in 2026: Browser Use vs Stagehand vs Skyvern vs Playwright MCP. The four ways to give AI a browser (autonomous framework, code-first SDK, workflow platform, or MCP server), compared honestly by control, cost, and reliability.
- Browser Use: The most-adopted open-source browser-agent framework. Point an LLM at a task and it drives a real browser: navigating, clicking, typing, extracting.
- Skyvern: Open-source vision + LLM browser automation aimed at replacing brittle RPA, with a workflow builder, CAPTCHA/2FA handling, and self-host or cloud deployment.
- How Computer-Use Agents Work: Inside the perception-action loop that lets AI operate real software (screenshots in, clicks out), plus grounding, reliability, and when to use APIs instead.
- Playwright MCP: Microsoft's open-source MCP server that gives AI agents structured browser automation via Playwright's accessibility tree.
- Structured Output: Structured output makes an LLM return data in a guaranteed shape (JSON matching your schema), so code can consume model responses without parsing prose.
- Browser Agent Engineer: Use this agent to build, harden, or debug browser-automation agents for web tasks via Browser Use, Stagehand, Skyvern, or Playwright-based stacks. Examples: automate a portal workflow, make a flaky browser agent reliable, add verification and guardrails to web automation, choose between vision and DOM grounding.