# LoRA (Low-Rank Adaptation)

> LoRA fine-tunes a model by training small low-rank adapter matrices instead of all weights — a fraction of the memory and cost, nearly full-tune quality.

**LoRA (Low-Rank Adaptation) is the technique that made fine-tuning affordable: instead of updating a model's billions of weights, it freezes them and trains small low-rank matrices injected alongside — typically well under 1% of parameters — capturing most of the quality of a full fine-tune.**

The insight is that the *change* a fine-tune needs is low-rank: representable as the product of two thin matrices per adapted layer. Training only those slashes GPU memory (no optimizer state for frozen weights), produces megabyte-scale adapter artifacts instead of full model copies, and lets one base model serve many tasks by swapping adapters. **QLoRA** stacks [quantization](/glossary/quantization) underneath — a 4-bit frozen base with trainable adapters — bringing 7–70B-class [fine-tuning](/glossary/fine-tuning) onto single GPUs.

In practice LoRA/QLoRA is the default for open-weight model tuning, with libraries like [Unsloth](/tools/unsloth) optimizing the loop. The end-to-end procedure — dataset to adapter to eval — is packaged in the [qlora-finetune-runner](/skills/data/qlora-finetune-runner) skill; whether you should be fine-tuning at all is the [decision-tree guide](/guides/mlops/finetune-vs-rag-vs-prompt)'s question.

---

_Source: https://agentscamp.com/glossary/lora — Term on AgentsCamp._