Guide · Advanced

LLM API Pricing in 2026: Every Major Model Compared

Per-million-token prices for Claude, GPT, Gemini, DeepSeek, Mistral, and Grok — plus caching and batch discounts — verified against vendor pricing pages.

4 min read · AgentsCamp
Updated Jun 12, 2026

Verified June 12, 2026, from vendor pricing pages only. Flagships: Claude Fable 5 $10/$50 per million tokens (in/out), Claude Opus 4.8 $5/$25, GPT-5.5 $5/$30, Gemini 3.1 Pro Preview $2/$12. Workhorses: Claude Sonnet 4.6 $3/$15, GPT-5.4 $2.50/$15, Gemini 3.5 Flash $1.50/$9. Caching cuts input ~90%, batch APIs cut everything 50% — the discounts stack.

Key takeaways

  • Three durable price tiers exist across every vendor: frontier ($5–10 in / $25–50 out), workhorse ($1.50–3 in / $9–15 out), and budget ($0.10–1 in / $0.30–5 out) — match the tier to the task before comparing vendors.
  • Output costs 3–6x input at most vendors (2x at DeepSeek and xAI), and reasoning/thinking tokens bill as output, so long-output and heavy-reasoning workloads dominate most bills.
  • The discount stack is the real price list: prompt-cache reads run ~0.1x input across vendors, batch APIs take 50% off, and they compose — cached batch input can cost ~5% of list.
  • Open-weights models via hosts (DeepSeek V4, Kimi, Qwen, GLM on Together/Fireworks) undercut proprietary mid-tiers — at the cost of operating the integration yourself.
  • Prices churn: this page is refreshed on a cadence (see the Updated date above), and every number here was read from a vendor pricing page, never from an aggregator.

All prices are USD per million tokens, standard tier, read directly from vendor pricing pages on June 12, 2026. Prices change; this page is maintained on a refresh cadence (the Updated date above is the source of truth), and numbers we couldn't verify on a vendor page are omitted, not estimated.
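
Because every figure on this page is quoted per million tokens, turning a table row into a per-request cost is simple arithmetic. The sketch below is illustrative only: `Price` and `request_cost` are hypothetical helper names, the token counts are made up, and the price is the Claude Sonnet 4.6 list rate from the Anthropic table on this page. It also shows how reasoning tokens, billed at the output rate, shift the bill toward output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per million tokens (standard tier)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """USD cost of one request; reasoning tokens bill at the output rate."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * price.input_per_mtok
            + billed_output * price.output_per_mtok) / 1_000_000


# Claude Sonnet 4.6 list price: $3 in / $15 out.
sonnet = Price(input_per_mtok=3.00, output_per_mtok=15.00)

# Hypothetical request: 4,000 prompt tokens, 800 visible output tokens,
# and 2,000 reasoning tokens.
cost = request_cost(sonnet, 4_000, 800, 2_000)
output_cost = (800 + 2_000) * sonnet.output_per_mtok / 1_000_000
output_share = output_cost / cost

print(f"${cost:.4f} per request, {output_share:.0%} of it output")
```

In this made-up case the prompt is larger than the billed output, yet output still accounts for roughly three quarters of the cost.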

Anthropic (Claude)

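| Model | Input | Output | Cache read | Batch (in/out) | Context |
|---|---|---|---|---|---|
| Claude Fable 5 | $10.00 | $50.00 | $1.00 | $5.00 / $25.00 | 1M |
| Claude Opus 4.8 | $5.00 | $25.00 | $0.50 | $2.50 / $12.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 / $7.50 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 / $2.50 | 200K |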

Cache writes cost 1.25x input (5-minute TTL) or 2x (1-hour); cache reads are 0.1x input. Batch is 50% off and stacks with caching. Notably, the 1M-context models include the full window at standard per-token pricing — no long-context surcharge.
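
The multipliers above can be combined into a rough cost model for a shared prompt prefix. This is a sketch under stated assumptions, not Anthropic's billing logic: `prefix_cost` is a hypothetical helper, the first call is assumed to write the cache and every later call to hit it, and the batch discount is assumed to apply uniformly to cache writes and reads. The prefix size and call count are invented for illustration.

```python
# Multipliers on the list input price, as stated above for Anthropic.
CACHE_WRITE_5M = 1.25   # cache write, 5-minute TTL
CACHE_WRITE_1H = 2.00   # cache write, 1-hour TTL
CACHE_READ = 0.10       # cache read
BATCH = 0.50            # batch discount (assumed to stack with caching)


def prefix_cost(input_per_mtok: float, prefix_tokens: int, calls: int,
                use_cache: bool, batch: bool = False,
                write_mult: float = CACHE_WRITE_5M) -> float:
    """USD spent on a shared prompt prefix sent `calls` times.

    Simplifying assumption: the first call writes the cache and every
    later call reads it (the cache never expires between calls).
    """
    base = prefix_tokens * input_per_mtok / 1_000_000  # one uncached send
    if use_cache:
        total = base * (write_mult + CACHE_READ * (calls - 1))
    else:
        total = base * calls
    return total * (BATCH if batch else 1.0)


# Claude Sonnet 4.6 ($3.00 input), hypothetical 50,000-token prefix, 100 calls.
plain = prefix_cost(3.00, 50_000, 100, use_cache=False)
with_cache = prefix_cost(3.00, 50_000, 100, use_cache=True)
stacked = prefix_cost(3.00, 50_000, 100, use_cache=True, batch=True)

print(f"uncached ${plain:.2f}, cached ${with_cache:.2f}, "
      f"cached+batch ${stacked:.2f}")
```

Under these assumptions a 5-minute cache pays for itself on the second call (1.25 + 0.1 = 1.35x versus 2x uncached), while the 1-hour write needs a third call (2 + 0.2 = 2.2x versus 3x).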

OpenAI (GPT)

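| Model | Input | Cached input | Output | Batch | Context |
|---|---|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 | 50% off | 1M |
| GPT-5.5-pro | $30.00 | | $180.00 | | ~1M |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 50% off | 1M |
| GPT-5.4-mini | $0.75 | $0.075 | $4.50 | 50% off | 400K |
| GPT-5.4-nano | $0.20 | $0.02 | $1.25 | 50% off | 400K |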

Cached input is 0.1x wherever it is listed; the Flex tier matches batch pricing, and the Priority tier runs 2.5x. GPT-5.5-pro ($30/$180) is the premium-reasoning outlier of the whole market.
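
The tier note above reduces to a small lookup of multipliers on the standard price. The sketch below is only that arithmetic: `TIER_MULTIPLIER` and `tier_prices` are hypothetical names, and it assumes each multiplier applies equally to input and output, which the note does not spell out.

```python
# Multipliers on the standard list price, per the tier note above.
# Assumption: each multiplier applies equally to input and output.
TIER_MULTIPLIER = {
    "standard": 1.0,
    "batch": 0.5,     # 50% off
    "flex": 0.5,      # matches batch pricing
    "priority": 2.5,
}


def tier_prices(input_per_mtok: float, output_per_mtok: float,
                tier: str) -> tuple[float, float]:
    """Return (input, output) in USD per million tokens for a service tier."""
    multiplier = TIER_MULTIPLIER[tier]
    return input_per_mtok * multiplier, output_per_mtok * multiplier


# GPT-5.4 list price from the table above: $2.50 in / $15.00 out.
for tier in TIER_MULTIPLIER:
    price_in, price_out = tier_prices(2.50, 15.00, tier)
    print(f"{tier:<9} ${price_in:.2f} / ${price_out:.2f}")
```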

Google (Gemini API)

| Model | Input | Output | Cache read | Batch | Context |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 (≤200K) / $4.00 (>200K) | $12.00 / $18.00 | $0.20 / $0.40 | 50% off | ~1M |
| Gemini 3.5 Flash | $1.50 | $9.00 (incl. thinking) | $0.15 | 50% off | ~1M |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | 50% off | ~1M |

Google is the one major vendor with long-context tiering: beyond 200K tokens of context, the Pro model's input and cache-read prices double and its output price rises 1.5x. Cache storage bills separately per MTok-hour.
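
A tiered price list behaves like a step function, which is easy to miss when budgeting. The sketch below uses the Gemini 3.1 Pro Preview rates from the table above; `gemini_pro_cost` is a hypothetical helper, and it assumes the prompt size selects one tier for the whole request (input and output alike). That billing rule is an assumption, so check the vendor page before relying on it. Caching and cache storage are ignored.

```python
# Gemini 3.1 Pro Preview rates from the table above, USD per million tokens.
THRESHOLD_TOKENS = 200_000
RATES = {
    "short": {"input": 2.00, "output": 12.00},  # prompt <= 200K tokens
    "long": {"input": 4.00, "output": 18.00},   # prompt > 200K tokens
}


def gemini_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request.

    Assumption: the prompt size picks the tier for the whole request.
    """
    tier = "long" if prompt_tokens > THRESHOLD_TOKENS else "short"
    rate = RATES[tier]
    return (prompt_tokens * rate["input"]
            + output_tokens * rate["output"]) / 1_000_000


# Two hypothetical requests that differ by a single prompt token.
just_under = gemini_pro_cost(200_000, 2_000)
just_over = gemini_pro_cost(200_001, 2_000)

print(f"at threshold ${just_under:.3f}, one token over ${just_over:.3f}")
```

If the tier really does apply to the whole request, trimming a prompt to stay at or under 200K tokens is worth far more than the handful of tokens removed.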

DeepSeek, Mistral, xAI

| Model | Input | Output | Notes |
|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 | Cache hits ~$0.003; 1M context |
| DeepSeek V4 Pro | $0.435 | $0.87 | Cache hits ~$0.004; 1M context |
| Mistral Medium 3.5 | $1.50 | $7.50 | Premium tier; batch 50% |
| Mistral Large 3 | $0.50 | $1.50 | Open-weights flagship (not a typo: priced below Medium 3.5) |
| Mistral Small 4 | $0.10 | $0.30 | Budget tier |
| Grok 4.3 (xAI) | $1.25 | $2.50 | 1M context |

DeepSeek remains the price disruptor for the proprietary vendors: V4 Flash undercuts every Western budget tier on output price (only Mistral Small 4 is cheaper on input) while claiming a 1M window. (Its legacy deepseek-chat/deepseek-reasoner aliases retire July 24, 2026.)

Open-weights via hosts

Reference serverless prices (host pricing, not vendor MSRP), June 12, 2026 — Together AI / Fireworks:

| Model | Together (in/out) | Fireworks (in/out) |
|---|---|---|
| DeepSeek V4 Pro | $2.10 / $4.40 | $1.74 / $3.48 |
| Kimi K2.6 | $1.20 / $4.50 | $0.95 / $4.00 |
| Qwen 3.6 Plus | $0.50 / $3.00 | $0.50 / $3.00 |
| GLM-5.1 | $1.40 / $4.40 | $1.40 / $4.40 |
| GPT-OSS-120B | $0.15 / $0.60 | |

Hosted open-weights models now reach into the proprietary mid-tier band on input price ($0.15–2.10) while staying well below it on output ($0.60–4.50). That gap is the lever that keeps the whole market's pricing honest, and it is the baseline input to any self-host decision.
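
One way to use these numbers is to price a single workload against several offers and rank them. The sketch below takes its prices from the tables on this page; the workload (500M input and 100M output tokens per month) is hypothetical, `monthly_cost` is an illustrative helper, and caching, batch discounts, and quality differences are deliberately left out.

```python
# (input, output) in USD per million tokens, from the tables on this page.
OFFERS = {
    "DeepSeek V4 Pro (vendor API)": (0.435, 0.87),
    "DeepSeek V4 Pro (Fireworks)": (1.74, 3.48),
    "DeepSeek V4 Pro (Together)": (2.10, 4.40),
    "Kimi K2.6 (Fireworks)": (0.95, 4.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def monthly_cost(prices: tuple[float, float],
                 input_mtok: float, output_mtok: float) -> float:
    """USD per month for a workload measured in millions of tokens."""
    input_price, output_price = prices
    return input_price * input_mtok + output_price * output_mtok


# Hypothetical workload: 500M input tokens and 100M output tokens a month.
ranked = sorted(OFFERS.items(),
                key=lambda item: monthly_cost(item[1], 500, 100))

for name, prices in ranked:
    print(f"{name:<30} ${monthly_cost(prices, 500, 100):>9,.2f}")
```

The ranking shifts with the input/output mix, so it is worth rerunning with your own token counts rather than reading anything general into this one.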

Reading the table like an engineer

Three structural facts matter more than any single cell:

  • Output dominates. At 3–6x input at most vendors, and with reasoning (thinking) tokens billed as output, verbose responses and deep deliberation drive bills more than prompt size.
  • The discount stack is enormous. Prompt-cache reads at ~0.1x input and batch at 50% off compose to ~95% off for cacheable offline input, so engineering for the stack beats switching vendors.
  • Tiers beat brands. Every vendor offers frontier, workhorse, and budget rungs; matching the tier to the task and measuring cost per completed task (not per token) is where the money actually is. The full playbook is in the guide "LLM Cost and Latency Engineering".
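
Cost per completed task can be estimated from cost per attempt and success rate. The sketch below assumes failed attempts are retried until one succeeds and that attempts are independent, so the expected number of attempts is 1 / success rate. `cost_per_completed_task` is a hypothetical helper, and both the per-attempt costs and the success rates are invented for illustration, not measurements of any model.

```python
def cost_per_completed_task(cost_per_attempt: float,
                            success_rate: float) -> float:
    """Expected USD per successful task.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Invented numbers: a budget model that is cheap per attempt but often
# fails, versus a workhorse model that costs 3x per attempt.
budget = cost_per_completed_task(cost_per_attempt=0.004, success_rate=0.25)
workhorse = cost_per_completed_task(cost_per_attempt=0.012, success_rate=0.96)

print(f"budget ${budget:.4f} per task, workhorse ${workhorse:.4f} per task")
```

With these made-up inputs the model that is three times more expensive per attempt comes out cheaper per finished task, which is why the comparison should be run on your own measured success rates.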

Frequently asked questions

What's the cheapest good LLM API in 2026?
Depends on the floor you need. For mechanical work, GPT-5.4-nano ($0.20/$1.25), Mistral Small 4 ($0.10/$0.30), and Gemini 3.1 Flash-Lite ($0.25/$1.50) define the budget tier; Claude Haiku 4.5 ($1/$5) is the premium-budget pick. For workhorse coding/agent use, Sonnet 4.6, GPT-5.4, and Gemini 3.5 Flash cluster at $1.50–3 in / $9–15 out — capability per dollar there is task-dependent: benchmark on your work.
How do prompt caching and batch discounts actually combine?
Multiplicatively, on most platforms. Example on Claude: batch halves list price, and cache reads are 0.1x input — Anthropic's batch+cache pricing means a cached input token in a batch job costs ~5% of the standard input rate. Structuring for cacheable prefixes and routing offline work to batch APIs are the two highest-ROI cost moves before touching model choice.
Do 1M-token context windows cost extra?
On Anthropic, no: Fable 5, Opus 4.8, and Sonnet 4.6 include the full 1M window at standard per-token pricing (a 900K-token request bills at the same rate as a 9K one). Google tiers instead: beyond 200K tokens of context, Gemini 3.1 Pro Preview doubles its input price and raises its output price 1.5x. Either way you pay for every token you send; big windows make big bills possible, they do not make tokens free.
Why do you only cite vendor pricing pages?
Because third-party price aggregators drift stale within weeks and propagate each other's errors. Every figure on this page was fetched from the provider's own pricing or docs page on the date stamped above; anything we couldn't verify that way is omitted rather than guessed.
