# Fine-Tune vs RAG vs Prompt vs Distill: The 2026 Decision Tree

> When to reach for prompt engineering, RAG, fine-tuning, or distillation — what each actually changes, where each fails, and how to combine them.

Four techniques, different problems — so 'which is best' is the wrong question. Prompt engineering changes behavior through instructions (start here). RAG injects changing or private knowledge at query time. Fine-tuning bakes in consistent behavior or format, not fresh facts. Distillation shrinks a working pipeline for cost. They compose; the skill is matching the technique to the gap.

When a model isn't doing what you need, there are four levers — prompt engineering, RAG, fine-tuning, and distillation — and teams routinely grab the wrong one: fine-tuning to add facts (RAG's job), or building a RAG pipeline to fix a formatting problem (a prompt's job). They aren't competitors ranked by power; they solve **different problems.** Pick by naming the gap, not by reaching for the most sophisticated tool.

## What each one actually changes

- **Prompt engineering** changes *behavior through instructions* — system prompts, few-shot examples, output schemas. It's the cheapest and fastest lever, changes nothing about the model, and is bounded by what the model can already do and what fits in context.
- **RAG** changes *what the model knows at answer time* by retrieving relevant passages and grounding the response in them. It's how you make a model answer from private, changing, or factual data — and cite it. It does **not** change the model's behavior or style. (See [How RAG Actually Works](/guides/concepts/how-rag-works).)
- **Fine-tuning** changes *the model's weights* to internalize a behavior: a consistent format, a tone, a narrow task it otherwise does unreliably, or tool-use patterns. It *can* absorb facts, but it's an unreliable, data-hungry way to do it — fresh knowledge belongs in RAG.
- **Distillation** changes *the cost/latency profile* by transferring a big model's capability into a smaller one (usually by training the small model on the big one's outputs — its generated responses and/or output probability distributions). It's an optimization for a pipeline that already works.
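To make "output probability distributions" in the distillation bullet concrete, here is a minimal sketch of one common formulation: a KL-divergence loss between temperature-softened teacher and student distributions. The function names, the temperature of 2.0, and the example logits are illustrative assumptions, not something this guide prescribes. Real training code would compute this per token inside a deep learning framework, and some formulations also scale the loss by the squared temperature.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The temperature default is an assumed value chosen for illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
mismatched_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, matching_student))    # zero when the student matches
print(distillation_loss(teacher, mismatched_student))  # positive when it does not
```

The student is trained to drive this number toward zero, which is why distillation only helps once the teacher pipeline already produces the outputs you want.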

## The decision tree

1. **Always start with prompt engineering.** Better instructions, few-shot examples, and a structured output spec solve a surprising fraction of problems for near-zero cost. Exhaust this before anything else.
2. **Need external, changing, or private knowledge (cited)?** → **RAG.** If the failure is "the model doesn't *know* X" or "X changes," retrieval is the answer, not training.
3. **Need consistent behavior, format, or a narrow skill the model does poorly?** → **Fine-tune.** If, after good prompting, the model is *capable but inconsistent* — drifts from your format, won't hold a tone, fumbles a specialized task — bake it into the weights.
4. **Have a working pipeline that's too slow or expensive at scale?** → **Distill** (or right-size to a smaller model). Only once it works; you can't distill a capability you haven't yet achieved.
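The four steps above can be sketched as a short function that checks the gaps in order. The `Gap` dataclass and its field names are hypothetical, invented here to mirror the questions in the list; in practice each flag should come from eval results rather than intuition.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """Hypothetical description of what is going wrong (illustrative fields)."""
    tried_prompting: bool = False           # step 1: have prompts been exhausted?
    needs_external_knowledge: bool = False  # step 2: private, changing, or cited facts
    capable_but_inconsistent: bool = False  # step 3: drifts on format, tone, or task
    works_but_too_costly: bool = False      # step 4: correct but slow or expensive


def choose_technique(gap: Gap) -> str:
    """Walk the decision tree top to bottom and return the first rung that applies."""
    if not gap.tried_prompting:
        return "prompt engineering"
    if gap.needs_external_knowledge:
        return "RAG"
    if gap.capable_but_inconsistent:
        return "fine-tune"
    if gap.works_but_too_costly:
        return "distill"
    return "no gap identified"


print(choose_technique(Gap()))
print(choose_technique(Gap(tried_prompting=True, needs_external_knowledge=True)))
print(choose_technique(Gap(tried_prompting=True, capable_but_inconsistent=True)))
```

The early returns encode the ordering: a cheaper rung always wins when it applies, and a real system with several gaps would revisit the tree after fixing each one.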

> [!TIP]
> The order matters because cost and iteration time go *up* and reversibility goes *down* as you move down the tree. A prompt change ships in minutes; a fine-tune is a dataset, a training run, an eval, and a deploy. Don't pay for a lower rung until a cheaper one provably can't clear the bar, and measure with an [eval set](/guides/evaluation/write-llm-evals) so "provably" means a number.
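One way to turn "provably" into a number is a pass rate over a fixed eval set, compared against a bar you pick in advance. The sketch below is a toy under stated assumptions: the eval inputs, the `is_status_json` check, the 0.95 bar, and both stub candidates are hypothetical stand-ins for real model calls.

```python
import json
from typing import Callable


def is_status_json(output: str) -> bool:
    """Pass only if the output is a bare JSON object with a "status" key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "status" in parsed


# Hypothetical eval inputs; a real eval set would be larger and more varied.
EVAL_SET = [
    "Where is order 1001?",
    "Has order 1002 shipped?",
    "Status of order 1003, please.",
    "Track order 1004.",
]


def pass_rate(candidate: Callable[[str], str], eval_set: list) -> float:
    """Fraction of eval inputs whose output passes the check."""
    passed = sum(1 for question in eval_set if is_status_json(candidate(question)))
    return passed / len(eval_set)


def clears_bar(candidate: Callable[[str], str], eval_set: list, bar: float = 0.95) -> bool:
    """The bar of 0.95 is an assumed threshold, not a recommendation."""
    return pass_rate(candidate, eval_set) >= bar


# Stub candidates standing in for "model + loose prompt" and "model + schema prompt".
def loose_prompt_candidate(question: str) -> str:
    return 'Sure! Here is the status: {"status": "shipped"}'


def schema_prompt_candidate(question: str) -> str:
    return '{"status": "shipped"}'


for name, candidate in [("loose", loose_prompt_candidate), ("schema", schema_prompt_candidate)]:
    print(name, pass_rate(candidate, EVAL_SET), clears_bar(candidate, EVAL_SET))
```

If the cheaper candidate clears the bar, stop there; only move to the next rung when it measurably cannot.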

## They compose

Framing this as a *choice* is a simplification: the strongest systems combine them. A canonical production stack: a **fine-tuned** model that reliably follows your format and tool-use behavior, fed by **RAG** for current knowledge, orchestrated with **prompt engineering**, and later **distilled** to a smaller model once the behavior is locked in. Fine-tuning handles *how*, RAG handles *what*, prompting glues them, distillation makes it cheap.
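A rough sketch of how the pieces fit in a single request path: retrieval supplies the *what*, the prompt template glues it to the question, and the `model` argument is whichever model currently fills the slot (base, fine-tuned, or distilled). Everything here is a hypothetical illustration; the word-overlap retriever stands in for a real vector search, and `stub_model` stands in for a real model call.

```python
import re
from typing import Callable


def words(text: str) -> set:
    """Lowercased word set, used by the toy retriever."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Toy retrieval: rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list) -> str:
    """Prompt engineering layer: instructions plus numbered, citable passages."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered passages and cite them.\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}"
    )


def answer(query: str, documents: list, model: Callable[[str], str]) -> str:
    """Compose the stack: retrieve, build the prompt, then call the model."""
    return model(build_prompt(query, retrieve(query, documents)))


# Hypothetical knowledge base and a stub in place of a real model.
DOCS = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is available on weekdays.",
]


def stub_model(prompt: str) -> str:
    return f"(model received {len(prompt)} characters)"


print(build_prompt("How many days do refunds take?", retrieve("How many days do refunds take?", DOCS)))
print(answer("How many days do refunds take?", DOCS, stub_model))
```

Because `model` is just a callable, swapping a fine-tuned model for a distilled one later changes one argument and leaves retrieval and prompting untouched.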

## Putting it together

Name the gap before you pick the tool: a latent capability the model isn't using → **prompt**; missing knowledge → **RAG**; behavior or format that stays inconsistent after good prompting → **fine-tune**; too slow or costly → **distill**. Climb the tree only as far as the problem forces you, prove each step with evals, and remember that the techniques stack.

When the answer is fine-tuning, [Preparing a Fine-Tuning Dataset](/guides/mlops/finetune-dataset-prep) is where the real work is, and the [finetuning-engineer](/agents/data-ai/finetuning-engineer) runs it end to end. When the answer involves running your own model, [Self-Host vs API](/guides/mlops/self-host-vs-api-llm) decides whether that pays off.

---

_Source: https://agentscamp.com/guides/mlops/finetune-vs-rag-vs-prompt — Guide on AgentsCamp._
