Token (LLM)

A token is the basic unit a language model reads and writes — typically a word fragment averaging 3–4 characters of English text. Everything about LLMs is denominated in tokens: pricing, context limits, and speed.

Models don't see letters or words. A tokenizer splits text into pieces drawn from a fixed vocabulary, and the model predicts one token at a time. A common word such as "understanding" may be a single token, while a rarer one such as "unfathomable" might split into three; the exact split depends on the tokenizer. The practical conversion: about 100 tokens per 75 English words. Code and non-English text usually need more tokens for the same number of characters.
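
The 100-tokens-per-75-words rule of thumb can be turned into a quick estimator. The sketch below is only an approximation for English prose: the function name estimate_tokens is made up for illustration, it counts whitespace-separated words rather than running a real tokenizer, and it will undercount for code or non-English text.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose (hypothetical helper).

    Applies the rule of thumb of ~100 tokens per 75 words to a
    whitespace word count. A real tokenizer will give different numbers.
    """
    words = len(text.split())
    # 100 tokens per 75 words, rounded up to a whole token.
    return math.ceil(words * 100 / 75)


if __name__ == "__main__":
    sample = "Tokens are the basic unit a language model reads and writes."
    print(estimate_tokens(sample))
```

For anything that affects billing or context limits, the count should come from the model's own tokenizer rather than an estimate like this one.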

Tokens matter because they're the meter on everything. API pricing is quoted per million tokens, with separate input and output rates; output typically costs several times more, because generation is sequential (one token at a time) while the prompt is read in parallel. The context window is a token budget. Throughput is tokens per second. So the everyday engineering moves (trimming prompts, caching repeated prefixes, summarizing history) are all token economics; the full playbook is in LLM Cost and Latency Engineering.
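
Per-million-token pricing reduces to simple arithmetic. The sketch below assumes hypothetical rates (3.00 per million input tokens and 15.00 per million output tokens, in whatever currency the provider bills); these numbers and the function name request_cost are placeholders for illustration, not any provider's actual prices.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request given per-million-token rates (hypothetical helper)."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed, made-up rates: output priced at 5x input.
    cost = request_cost(2_000, 500, 3.00, 15.00)
    print(f"{cost:.4f}")
```

With these assumed rates, the 500 output tokens cost more than the 2,000 input tokens, which is why limiting response length can matter as much as trimming the prompt.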

Frequently asked questions

How many tokens is a word?

In English, a word averages about 1.3 tokens: roughly 0.75 words per token, or about 100 tokens per 75 words. Common words are single tokens; rare words split into pieces; code, non-English text, and unusual formatting often cost more tokens per character. Exact counts come from the model's own tokenizer.

Why are tokens what I pay for?

Because tokens are what the model actually processes: each one costs compute on the way in (reading your prompt) and on the way out (generating the answer). That's why API pricing is per million input and output tokens, why output tokens cost more, and why trimming context is the most direct cost lever.
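
One simple way to trim context is to keep only the most recent messages that fit a token budget. The sketch below is one possible approach, not a prescribed method: trim_history and rough_token_count are hypothetical names, and the default counter is the word-based approximation rather than a real tokenizer. A real tokenizer's counting function can be passed in as count.

```python
import math
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Approximate tokens as ~100 per 75 whitespace-separated words."""
    return math.ceil(len(text.split()) * 100 / 75)


def trim_history(
    messages: List[str],
    budget: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest, stopping at the first message that overflows.
    for message in reversed(messages):
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept


if __name__ == "__main__":
    history = ["first question here", "a long earlier answer", "latest user message"]
    print(trim_history(history, budget=8))
```

Dropping the oldest messages loses information; summarizing them first is the usual alternative when earlier context still matters.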
