Qlora Finetune Runner
Run a QLoRA (4-bit LoRA) fine-tune of an open-weight model from a prepared dataset — set up the config, train memory-efficiently (e.g. with Unsloth/PEFT), watch for overfitting, save the adapter, and run a quick eval against the prepared split. Use when you have a clean dataset and want to execute a parameter-efficient fine-tune on a single GPU.
Install to ~/.claude/skills/qlora-finetune-runner/SKILL.md
Executes a QLoRA fine-tune from a prepared dataset: it sets a sane 4-bit + LoRA config (rank/alpha/target modules, LR, epochs), trains memory-efficiently on a single GPU, watches validation loss for overfitting, saves the adapter, and runs a quick eval against the held-out split — the training run, done reproducibly.
QLoRA makes fine-tuning a real option on one modest GPU: it quantizes the base model to 4-bit and trains only small LoRA adapters on top, so a model that wouldn't fit in memory for full fine-tuning trains comfortably. This skill executes that run from a prepared dataset — it sets a sensible config, trains, watches for the failure modes, saves the adapter, and sanity-checks the result, reproducibly.
When to use this skill
- You have a cleaned, split dataset (see the Fine-Tune Dataset Builder) and want to run a parameter-efficient fine-tune.
- Fine-tuning on a single GPU where full fine-tuning won't fit — QLoRA's 4-bit base makes it possible.
- Iterating on a fine-tune: adjusting LoRA rank, learning rate, or epochs and re-running cleanly.
Instructions
- Verify the dataset is ready. Confirm it's in the trainer's format (typically JSONL chat/instruction records), deduped, and has a held-out eval split that does not overlap training. If it isn't prepared, stop and prepare it first — the run is only as good as the data. (See Preparing a Fine-Tuning Dataset.)
- Detect the environment. Check the GPU/VRAM, framework, and whether Unsloth, TRL/PEFT, or another trainer is in use; match the project's existing setup rather than introducing a new stack.
- Set a sane QLoRA config. 4-bit base (NF4), LoRA on the attention (and often MLP) projection modules, a modest rank (e.g. 8–32) and matching alpha, a low learning rate, and a small number of epochs (1–3 — more usually overfits). State each choice; they're the knobs you'll tune.
- Train with the eval split wired in. Run training with periodic evaluation on the held-out split so you can see validation loss, not just training loss. Keep it reproducible: fixed seed, logged config, recorded dataset version.
- Watch for the failure modes. Stop or adjust if validation loss climbs while training loss falls (overfitting), or if outputs lose general ability (catastrophic forgetting). The fix is usually fewer epochs or better data, not a bigger rank.
- Save the adapter and sanity-check. Save the LoRA adapter (and a merged model if you'll serve it merged), then run a quick eval on held-out examples to confirm the behavior changed in the intended direction before handing off to a full evaluation.
WARNING
Training loss going down means the model is memorizing, not that it's good. Always evaluate on the held-out split — and if it never saw a real eval set, this run can't be trusted regardless of how clean the loss curve looks.
NOTE
QLoRA's 4-bit quantization is for fitting the base model in memory during training; it's separate from quantizing the final model for serving. Note which you mean, and re-check quality after any serving-time quantization.
Output
A trained LoRA adapter plus the run's record: the QLoRA config (quantization, rank/alpha, target modules, LR, epochs, seed), the training/validation loss curves, the dataset version, and a quick held-out eval confirming the intended behavior change — reproducible enough to re-run or tune. Full evaluation and the ship/no-ship call belong to the finetuning-engineer.
Related
- UnslothAn open-source library that makes LoRA/QLoRA fine-tuning of LLMs roughly 2x faster and far more memory-efficient, so you can fine-tune on a single GPU.
- Finetuning EngineerUse this agent to fine-tune an open-weight model end to end — confirming fine-tuning is the right tool, preparing the dataset, choosing the method (LoRA/QLoRA vs. full), running training, and proving the result beats the prompted baseline on a held-out eval set. Examples — "fine-tune a small model to match our support tone and answer format", "we have 800 labeled examples — LoRA-tune and show it beats prompting", "our fine-tune overfits and forgot general ability — fix the data and run".
- Finetune Dataset BuilderTurn raw examples into a training-ready fine-tuning dataset — normalize to the trainer's chat/instruction format, deduplicate (including near-duplicates), strip PII, balance, validate the schema and token lengths, and carve a leak-free eval split. Use when you have raw examples and need a clean, formatted, split dataset before training.
- Preparing a Fine-Tuning Dataset: Cleaning, Synthetic Data, and Eval SplitsThe dataset is the model. How to build a fine-tuning dataset that works — format, curation, cleaning, synthetic augmentation, and a leak-free eval split.