Tree of Thoughts
Tree of Thoughts is a prompting and search method that generalizes chain-of-thought into a branching tree: the model generates multiple candidate reasoning steps, evaluates them, and explores or backtracks among branches to reach a solution.
Where chain-of-thought commits to one linear sequence, Tree of Thoughts treats reasoning as a search problem. At each step it proposes several possible "thoughts," scores how promising each is, and expands the best ones (depth-first or breadth-first), discarding dead ends. Because it can abandon a bad path and try another, it can outperform linear prompting on tasks that need lookahead and exploration, such as puzzles and multi-step planning, where the first idea is frequently wrong.
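A minimal breadth-first sketch of that loop follows. No real model is called: two hypothetical stub functions stand in for it. `propose` plays the role of the call that generates candidate thoughts, and `score` plays the call that rates them. The toy task (reach a target number by repeatedly adding 3 or doubling) and the names `tot_bfs`, `beam`, and `max_depth` are illustrative assumptions, not taken from any published implementation.

```python
from typing import List, Optional, Tuple

# A state is the partial "reasoning path": the numbers reached so far.
State = Tuple[int, ...]


def propose(state: State) -> List[State]:
    """Hypothetical stand-in for the LLM call that proposes next thoughts.
    Toy rule: from the current number, either add 3 or double it."""
    n = state[-1]
    return [state + (n + 3,), state + (n * 2,)]


def score(state: State, target: int) -> int:
    """Hypothetical stand-in for the LLM call that rates a partial path.
    Toy heuristic: the closer the last number is to the target, the better."""
    return -abs(target - state[-1])


def tot_bfs(start: int, target: int, beam: int, max_depth: int) -> Optional[State]:
    """Breadth-first Tree-of-Thoughts search with a fixed beam width."""
    frontier: List[State] = [(start,)]
    for _ in range(max_depth):
        # Expand every state on the frontier into its candidate next thoughts.
        candidates = [child for s in frontier for child in propose(s)]
        for c in candidates:
            if c[-1] == target:
                return c  # a complete solution path
        # Keep only the `beam` most promising candidates; the rest are pruned.
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam]
    return None  # no solution within max_depth steps


print(tot_bfs(1, 11, beam=2, max_depth=5))  # (1, 4, 8, 11)
```

With `beam=1` this collapses into a greedy single line of reasoning. A wider beam keeps more alternatives alive, at the price of more scoring calls per level.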
The cost is the catch. Exploring a tree means many more model calls and far more tokens than a single pass, a deliberate spend of test-time compute that trades inference budget for accuracy. A reasoning model makes a similar trade internally, spending extra reasoning tokens before it answers. Before orchestrating an explicit tree, it's worth checking whether a reasoning model already gives you enough exploration for far less plumbing.
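To put a number on "many more model calls," here is a rough count for the breadth-first variant under one assumed accounting. It charges one call per frontier state to propose `k` thoughts and one call per candidate to score it, and it keeps at most `beam` states per level. Real setups batch, vote, or sample differently. Token cost grows faster still, because each call re-sends the partial path. `tot_bfs_calls` is a hypothetical helper.

```python
def tot_bfs_calls(k: int, beam: int, depth: int) -> int:
    """Rough model-call count for breadth-first ToT (hypothetical accounting):
    one call per frontier state to propose k thoughts, plus one call per
    candidate thought to score it; at most `beam` states survive each level."""
    frontier = 1
    calls = 0
    for _ in range(depth):
        candidates = frontier * k
        calls += frontier    # proposal calls
        calls += candidates  # evaluation calls
        frontier = min(beam, candidates)
    return calls


# A single chain-of-thought pass is one call; this small tree needs 66.
print(tot_bfs_calls(k=5, beam=5, depth=3))  # 66
```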
Frequently asked questions
- How is Tree of Thoughts different from chain-of-thought?
- Chain-of-thought produces one linear sequence of reasoning steps. Tree of Thoughts generalizes that into a search: it generates several candidate next steps at each point, scores them, and explores or abandons branches — so it can recover from a bad step instead of committing to a single line. It trades far more compute for the ability to backtrack.
- When is it worth the extra cost?
- On problems that need exploration and lookahead — puzzles, planning, math with multiple viable paths — where the first idea is often wrong. For straightforward tasks the overhead (many more model calls and tokens) buys little, and a plain chain-of-thought or a reasoning model is cheaper and good enough.
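The backtracking difference can be sketched on a toy task. `greedy_chain` commits to the highest-scored thought at every step, like a single linear chain. `tot_dfs` instead explores depth-first in score order and backs up when a branch dead-ends. The task, the stub `propose`/`score` functions, and the rule that overshooting the target is a dead end are all hypothetical stand-ins for model calls, not part of the original method description.

```python
from typing import List, Optional, Tuple

State = Tuple[int, ...]


def propose(state: State, target: int) -> List[State]:
    """Hypothetical stand-in for the proposal call: add 3 or double.
    Overshooting the target counts as a dead end, so those thoughts are dropped."""
    n = state[-1]
    return [s for s in (state + (n + 3,), state + (n * 2,)) if s[-1] <= target]


def score(state: State, target: int) -> int:
    """Hypothetical stand-in for the evaluation call: closer is better."""
    return -abs(target - state[-1])


def greedy_chain(start: int, target: int, max_depth: int) -> Optional[State]:
    """Linear baseline: always commit to the best-scored next thought."""
    state: State = (start,)
    for _ in range(max_depth):
        if state[-1] == target:
            return state
        children = propose(state, target)
        if not children:
            return None  # stuck, with no way to revisit earlier choices
        state = max(children, key=lambda s: score(s, target))
    return state if state[-1] == target else None


def tot_dfs(start: int, target: int, max_depth: int) -> Optional[State]:
    """Depth-first Tree-of-Thoughts search that backtracks out of dead ends."""

    def search(state: State) -> Optional[State]:
        if state[-1] == target:
            return state
        if len(state) - 1 >= max_depth:
            return None  # depth budget exhausted on this branch
        children = propose(state, target)
        # Try the most promising thoughts first.
        children.sort(key=lambda s: score(s, target), reverse=True)
        for child in children:
            found = search(child)
            if found is not None:
                return found
        return None  # every child failed: backtrack to the parent

    return search((start,))


print(greedy_chain(1, 10, max_depth=5))  # None
print(tot_dfs(1, 10, max_depth=5))  # (1, 4, 7, 10)
```

From 1 with target 10, the greedy line goes 1 → 4 → 8 and then has no valid move. The depth-first search abandons 8, backs up to 4, and finds 1 → 4 → 7 → 10.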
Related
- Chain-of-Thought (CoT): Chain-of-thought prompting has a model work through intermediate reasoning steps before answering, improving accuracy on multi-step problems.
- Reasoning Model: A reasoning model is an LLM trained to think before answering, generating internal reasoning tokens it can spend adaptively on hard problems.
- Test-Time Compute: Test-time compute is spending more computation at inference (longer reasoning, sampling, or search) to improve answers without retraining the model.