Fine-Tuning

Fine-tuning continues a pretrained model's training on your own dataset, updating its weights so that the desired behavior becomes part of the model itself rather than something you re-explain in every prompt.

A base model knows language and the world; fine-tuning specializes it to your output format, your tone, your domain's conventions, or a narrow task done exactly your way. The modern default is parameter-efficient tuning (LoRA/QLoRA), which freezes the base weights and trains small adapter matrices instead, putting real fine-tunes within reach of a single GPU.
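To make the adapter idea concrete, here is a minimal pure-Python sketch of the LoRA update for a single weight matrix. It assumes the usual formulation, in which the frozen weight W is augmented by a low-rank product B·A scaled by alpha/rank. The 4096×4096 layer size and rank 8 are illustrative assumptions, not figures from this page.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, adapter) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out                # every entry of W is trainable
    adapter = rank * (d_in + d_out)    # A is rank x d_in, B is d_out x rank
    return full, adapter


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only (assumed): one 4096 x 4096 projection, rank 8.
full, adapter = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {adapter / full:.2%}")
# full: 16,777,216  adapter: 65,536  ratio: 0.39%
```

With B initialized to zeros, as is conventional, the adapted layer starts out identical to the base layer, so training begins from the pretrained behavior.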

The decision that matters comes before any training: is your problem behavior or knowledge? Behavior gaps fine-tune well; knowledge gaps belong in RAG, and one-off instructions belong in the prompt. That decision tree, including when distillation beats both, is mapped in "Fine-Tune vs RAG vs Prompt vs Distill". And the unglamorous truth of the craft: the dataset is the model. Curation, cleaning, and eval splits (the playbook) determine more of the outcome than any hyperparameter.
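As a rough illustration of the cleaning and eval-split step, the sketch below deduplicates examples and assigns each one to train or eval by hashing its prompt. The record shape (`prompt` and `completion` keys), the normalization rule, and the 10% eval fraction are assumptions made for the example, not a prescribed pipeline.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def split_dataset(examples, eval_fraction=0.1):
    """Drop duplicate examples, then split deterministically by prompt hash."""
    seen = set()
    train, held_out = [], []
    for ex in examples:
        prompt = normalize(ex["prompt"])
        pair_key = prompt + "\x1f" + normalize(ex["completion"])
        if pair_key in seen:
            continue  # exact duplicate after normalization
        seen.add(pair_key)
        # Bucket on the prompt alone, so a prompt never lands in both splits.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 2**32  # value in [0, 1)
        (held_out if bucket < eval_fraction else train).append(ex)
    return train, held_out
```

Hashing rather than random shuffling keeps the split stable as the dataset grows: an example that was in the eval set yesterday is still in it after new data is added.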

Frequently asked questions

When should I fine-tune instead of using RAG or prompting?

Fine-tune for behavior, retrieve for knowledge. If the gap is facts the model doesn't have (your docs, fresh data), RAG fixes it without training. If the gap is how the model behaves — a strict output format, a house style, a specialized task it keeps fumbling despite good prompts — fine-tuning encodes that durably. Exhaust prompting first; it's the cheapest experiment.
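The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the reasoning in this answer; the function name, the two gap labels, and the returned strings are hypothetical.

```python
def choose_approach(gap, prompting_exhausted):
    """Map the kind of gap to the approach suggested by the rule of thumb.

    gap: "knowledge" (facts the model lacks) or "behavior" (how it responds).
    prompting_exhausted: True once good prompts have been tried and still fail.
    """
    if gap == "knowledge":
        return "rag"  # missing facts are retrieved, not trained in
    if gap == "behavior":
        # Prompting is the cheapest experiment, so it comes first.
        return "fine-tune" if prompting_exhausted else "prompt"
    raise ValueError(f"unknown gap type: {gap!r}")


print(choose_approach("behavior", prompting_exhausted=False))  # prompt
print(choose_approach("behavior", prompting_exhausted=True))   # fine-tune
print(choose_approach("knowledge", prompting_exhausted=True))  # rag
```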

Does fine-tuning teach the model new facts?

Poorly. Weight updates from a modest dataset shift style and behavior effectively but store knowledge unreliably, and whatever they do store is frozen at training time, so it goes stale as the facts change. Facts belong in retrieval; that's why 'fine-tune vs RAG' is usually a false choice, and many production systems do both: tuned behavior over retrieved knowledge.
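A toy sketch of "tuned behavior over retrieved knowledge" might look like the following. The word-overlap retriever is a stand-in for a real vector search, and `tuned_model` is a hypothetical callable representing a fine-tuned model; neither is a specific library API.

```python
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query (a stand-in for real retrieval)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Put retrieved facts in the prompt; the tuned model supplies the behavior."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query, docs, tuned_model, k=2):
    """tuned_model is any callable taking a prompt string (hypothetical)."""
    return tuned_model(build_prompt(query, retrieve(query, docs, k)))
```

Keeping the facts in `docs` means they can be updated without retraining, while the format and tone live in the tuned weights.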

