Unsloth
An open-source library that makes LoRA/QLoRA fine-tuning of LLMs roughly 2x faster and far more memory-efficient, so you can fine-tune on a single GPU.
Unsloth is an open-source library that makes fine-tuning open-weight LLMs roughly 2x faster and substantially lighter on memory. Through hand-optimized kernels and a QLoRA-first design, it cuts training time and VRAM use enough that a fine-tune which would otherwise need a multi-GPU machine can run on a single consumer or cloud GPU, including free Colab notebooks. It is a common default for parameter-efficient fine-tuning when you don't have a cluster.
It is aimed at engineers and researchers doing LoRA/QLoRA fine-tuning who want speed and a small memory footprint without rewriting their training stack. Unsloth integrates with the Hugging Face ecosystem (TRL/PEFT), so it slots into familiar training code.
Highlights
- Faster, lighter fine-tuning — optimized kernels deliver roughly 2x faster training with substantially lower VRAM than a standard setup.
- QLoRA-first — 4-bit base + LoRA adapters so large models fit and train on a single GPU.
- Broad model support — Llama, Mistral, Qwen, Gemma, Phi, and other popular open architectures.
- Hugging Face-native — works with TRL/PEFT and standard datasets, so it drops into existing workflows.
- Ready-made notebooks — free Colab/Kaggle notebooks to fine-tune end to end without local setup.
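To make the QLoRA-first point concrete, here is a back-of-the-envelope sketch of why a 4-bit base plus LoRA adapters can fit on one GPU. The function names, the 8-billion-parameter figure, and the 4096 x 4096 layer shape are illustrative assumptions, not Unsloth APIs or measurements. The estimate covers weights only and ignores activations, optimizer state, quantization metadata, and framework overhead.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n = 8_000_000_000  # hypothetical 8B-parameter model
    print(f"16-bit weights: {weight_memory_gib(n, 16):.1f} GiB")
    print(f" 4-bit weights: {weight_memory_gib(n, 4):.1f} GiB")
    # One hypothetical 4096 x 4096 projection adapted at rank 16
    print(f"LoRA params per matrix: {lora_params(4096, 4096, 16):,}")
```

Under these assumptions the base weights shrink from about 14.9 GiB to about 3.7 GiB, and a rank-16 adapter trains 131,072 parameters for a matrix that holds 16,777,216, which is 1/128 of the full matrix.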
In an AI-assisted workflow
Load a model in 4-bit, attach LoRA adapters, and train on a prepared dataset:
from unsloth import FastLanguageModel
# Load a pre-quantized 4-bit base model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-bnb-4bit",
    load_in_4bit=True,
    max_seq_length=2048,
)
# Attach trainable LoRA adapters (r is the LoRA rank)
model = FastLanguageModel.get_peft_model(model, r=16)
# ...then train with TRL's SFTTrainer on your formatted dataset
TIP
Speed doesn't fix data. Unsloth makes the run cheap, but the result is still decided by the dataset — prepare it carefully first (see Preparing a Fine-Tuning Dataset) and drive the run with the QLoRA Fine-Tune Runner.
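As a minimal sketch of what a "formatted dataset" can look like, the snippet below turns instruction/response pairs into single training strings and holds out an eval split. The prompt template, the field names (`instruction`, `response`), the `eos_token` default, and both helper functions are hypothetical choices for illustration, not something Unsloth or TRL prescribes. In a real run you would normally take the chat template and EOS token from the model's tokenizer.

```python
import random

# Hypothetical prompt template, chosen only for illustration.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(example: dict, eos_token: str = "</s>") -> dict:
    """Render one record as a single training string that ends in EOS."""
    text = TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
    )
    return {"text": text + eos_token}


def split_dataset(examples: list, eval_fraction: float = 0.1, seed: int = 0):
    """Drop exact duplicates, shuffle deterministically, hold out an eval split."""
    # Deduplicating before the split keeps identical pairs out of both sides.
    unique = list({(e["instruction"], e["response"]): e for e in examples}.values())
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_eval = max(1, int(len(unique) * eval_fraction))
    return unique[n_eval:], unique[:n_eval]


if __name__ == "__main__":
    raw = [{"instruction": f"Question {i}", "response": f"Answer {i}"} for i in range(20)]
    train, held_out = split_dataset(raw)
    print(len(train), len(held_out))
    print(format_example(train[0])["text"])
```

Deduplicating before splitting is the design choice that matters here: an exact duplicate that lands in both splits makes the eval score look better than it is.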
Good to know
Unsloth's core package is free and open source under Apache-2.0; the optional Unsloth Studio UI is AGPL-3.0. It targets Linux and Windows with NVIDIA GPUs and also runs in hosted notebooks, and an Unsloth Pro/Enterprise option covers optimized multi-GPU and multi-node scaling. Unsloth handles only the training side: to serve the resulting model in production, pair it with vLLM, and for the end-to-end decision and evaluation, use the Finetuning Engineer agent.
Related
- Finetuning Engineer
  Use this agent to fine-tune an open-weight model end to end: confirming fine-tuning is the right tool, preparing the dataset, choosing the method (LoRA/QLoRA vs. full), running training, and proving the result beats the prompted baseline on a held-out eval set. Examples: "fine-tune a small model to match our support tone and answer format", "we have 800 labeled examples: LoRA-tune and show it beats prompting", "our fine-tune overfits and forgot general ability: fix the data and run".
- QLoRA Fine-Tune Runner
  Run a QLoRA (4-bit LoRA) fine-tune of an open-weight model from a prepared dataset: set up the config, train memory-efficiently (e.g. with Unsloth/PEFT), watch for overfitting, save the adapter, and run a quick eval against the prepared split. Use when you have a clean dataset and want to execute a parameter-efficient fine-tune on a single GPU.
- Preparing a Fine-Tuning Dataset: Cleaning, Synthetic Data, and Eval Splits
  The dataset is the model. How to build a fine-tuning dataset that works: format, curation, cleaning, synthetic augmentation, and a leak-free eval split.
- vLLM
  A high-throughput, memory-efficient inference and serving engine for LLMs, with PagedAttention, continuous batching, and an OpenAI-compatible API server.