Fine-Tuning
Fine-tuning is continuing a pretrained model's training on your own dataset, updating its weights so desired behavior becomes part of the model itself rather than something you re-explain in every prompt.
A base model knows language and the world; fine-tuning specializes it to your output format, your tone, your domain's conventions, or a narrow task done exactly your way. The modern default is parameter-efficient tuning (LoRA/QLoRA), which freezes the original weights and trains small adapter matrices instead, putting real fine-tunes within reach of a single GPU.
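To make "small adapter matrices" concrete, here is a minimal sketch in plain Python, assuming the usual LoRA formulation in which a frozen weight matrix W is augmented by a low-rank product B·A scaled by alpha / rank. The layer size (4096 × 4096), the rank (8), and all function names are illustrative assumptions, not values from this page.

```python
# Minimal LoRA sketch in plain Python (no ML framework), for illustration only.
# The layer size and rank used below are hypothetical.

def lora_param_counts(d_in, d_out, rank):
    """Return (full_params, adapter_params) for a single linear layer."""
    full = d_in * d_out               # full fine-tuning updates every weight in W
    adapter = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, adapter

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x); W is frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]

full, adapter = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(full, adapter, f"{adapter / full:.2%}")  # 16777216 65536 0.39%
```

With these assumed sizes the adapter holds well under 1% of the layer's parameters, which is where the memory savings come from. Initializing B to zeros means the adapted layer starts out identical to the base layer.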
The decision that matters comes before any training: is your problem behavior or knowledge? Behavior gaps fine-tune well; knowledge gaps belong in RAG, and one-off instructions belong in the prompt. That decision tree, including when distillation beats both, is mapped in "Fine-Tune vs RAG vs Prompt vs Distill". And the unglamorous truth of the craft: the dataset is the model. Curation, cleaning, and eval splits (covered in the dataset-preparation playbook) determine more of the outcome than any hyperparameter.
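As one illustration of the cleaning and eval-split step, the sketch below deduplicates examples on a normalized prompt and then assigns each example to train or eval by hashing that prompt, so the same prompt cannot land on both sides. The record shape (a dict with a "prompt" key), the normalization rule, and the 10% eval share are assumptions made for the example; a real pipeline would likely need stronger near-duplicate detection.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants count as duplicates."""
    return " ".join(text.lower().split())

def split_examples(examples, eval_percent=10):
    """Drop empty and duplicate prompts, then split deterministically by prompt hash.

    `examples` is a list of dicts with a "prompt" key (hypothetical schema).
    Returns (train, eval_set).
    """
    seen = set()
    train, eval_set = [], []
    for example in examples:
        key = normalize(example["prompt"])
        if not key or key in seen:
            continue  # skip empty prompts and exact (normalized) duplicates
        seen.add(key)
        # The hash depends only on the prompt, so the assignment is stable across runs.
        bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
        (eval_set if bucket < eval_percent else train).append(example)
    return train, eval_set

data = [
    {"prompt": "Summarize this ticket", "completion": "..."},
    {"prompt": "summarize   this ticket", "completion": "..."},  # duplicate after normalizing
    {"prompt": "Classify the intent", "completion": "..."},
]
train, eval_set = split_examples(data)
print(len(train) + len(eval_set))  # 2
```

Hashing rather than random shuffling keeps the split reproducible when new examples are added later: existing examples never migrate between train and eval.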
Frequently asked questions
- When should I fine-tune instead of using RAG or prompting?
  Fine-tune for behavior, retrieve for knowledge. If the gap is facts the model doesn't have (your docs, fresh data), RAG fixes it without training. If the gap is how the model behaves (a strict output format, a house style, a specialized task it keeps fumbling despite good prompts), fine-tuning encodes that durably. Exhaust prompting first; it's the cheapest experiment.
- Does fine-tuning teach the model new facts?
  Poorly. Weight updates from a modest dataset bias style and behavior effectively but store knowledge unreliably, and the knowledge goes stale the day after training. Facts belong in retrieval; that's why 'fine-tune vs RAG' is usually a false choice and production systems do both: tuned behavior over retrieved knowledge.
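The rule of thumb in these answers can be written down as a small function. This is a deliberately simplified sketch: the gap labels ("knowledge", "behavior", "one-off") and the `prompting_exhausted` flag are hypothetical names introduced here, and the sketch leaves out distillation and the cost considerations a fuller decision tree would weigh.

```python
def choose_approaches(gaps, prompting_exhausted=False):
    """Map the kinds of gap you observe to the techniques suggested above.

    `gaps` is an iterable of labels: "knowledge", "behavior", "one-off"
    (hypothetical labels for this sketch). Returns a list of techniques.
    """
    known = {"knowledge", "behavior", "one-off"}
    gaps = set(gaps)
    unknown = gaps - known
    if unknown:
        raise ValueError(f"unknown gap labels: {sorted(unknown)}")

    approaches = []
    if "knowledge" in gaps:
        approaches.append("rag")  # missing facts: retrieve them, no training needed
    if "behavior" in gaps:
        # Prompting is the cheapest experiment, so try it before training.
        approaches.append("fine-tune" if prompting_exhausted else "prompt")
    if "one-off" in gaps and "prompt" not in approaches:
        approaches.append("prompt")  # one-off instructions stay in the prompt
    return approaches

print(choose_approaches(["knowledge", "behavior"], prompting_exhausted=True))
# ['rag', 'fine-tune']
```

The last call shows the "do both" case: retrieval supplies the knowledge while the tuned model supplies the behavior.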
Related
- Fine-Tune vs RAG vs Prompt vs Distill: The 2026 Decision Tree
  When to reach for prompt engineering, RAG, fine-tuning, or distillation: what each actually changes, where each fails, and how to combine them.
- LoRA (Low-Rank Adaptation)
  LoRA fine-tunes a model by training small low-rank adapter matrices instead of all weights: a fraction of the memory and cost, nearly full-tune quality.
- Preparing a Fine-Tuning Dataset: Cleaning, Synthetic Data, and Eval Splits
  The dataset is the model. How to build a fine-tuning dataset that works: format, curation, cleaning, synthetic augmentation, and a leak-free eval split.
- Distillation
  Distillation trains a smaller model to imitate a larger one, using its outputs as training data to get most of the capability at a fraction of the cost.
- Finetuning Engineer
  Use this agent to fine-tune an open-weight model end to end: confirming fine-tuning is the right tool, preparing the dataset, choosing the method (LoRA/QLoRA vs. full), running training, and proving the result beats the prompted baseline on a held-out eval set. Examples: "fine-tune a small model to match our support tone and answer format", "we have 800 labeled examples; LoRA-tune and show it beats prompting", "our fine-tune overfits and forgot general ability; fix the data and run".
- RAG (Retrieval-Augmented Generation)
  RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- DPO (Direct Preference Optimization)
  DPO aligns a model to preferences directly from chosen-vs-rejected pairs (no reward model, no RL loop), simpler and more stable than classic RLHF.
- Open Weights
  An open-weights model publishes its parameters for anyone to download and run, unlike API-only models, with licenses from permissive to restricted.
- RLHF (Reinforcement Learning from Human Feedback)
  RLHF trains a model against human preferences: people rank outputs, a reward model learns the ranking, and the LLM is optimized to produce preferred responses.
- Synthetic Data
  Synthetic data is training or eval data generated by a model rather than collected from the world: filling gaps, balancing classes, bootstrapping fine-tunes.