Wiki: LLM fine-tuning

Fine-tuning starts where pretraining ends: given a model that already understands language, you update its weights on a narrower dataset to shift its behavior toward a specific task. The mechanics vary considerably by budget and goal.
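To make "updating weights on a narrower dataset" concrete, here is a toy sketch in plain Python. It is not how any real LLM toolkit is implemented: the one-parameter-pair "model", the starting weights, and the task data are all made-up stand-ins, chosen only to show that fine-tuning is ordinary gradient descent that starts from existing weights rather than from scratch.

```python
# Toy illustration only: a linear model y = w*x + b stands in for a pretrained network.
# All values below are hypothetical and chosen for readability.

def predict(w, b, x):
    return w * x + b

def mse(w, b, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, b, data, lr=0.05, steps=500):
    """Continue training from the given (w, b) with batch gradient descent on MSE."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical "pretrained" parameters: the starting point, not random initialization.
pretrained_w, pretrained_b = 1.0, 0.0

# Narrow task dataset, generated from y = 2x + 1.
task_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

loss_before = mse(pretrained_w, pretrained_b, task_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)
loss_after = mse(tuned_w, tuned_b, task_data)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
print(f"weights moved from ({pretrained_w}, {pretrained_b}) to ({tuned_w:.3f}, {tuned_b:.3f})")
```

In a real fine-tuning run the model has billions of parameters and the loss is typically next-token cross-entropy, but the shape of the loop is the same: load existing weights, compute gradients on task data, and step.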

At the accessible end, Unsloth cuts the compute cost of fine-tuning: the project claims up to 30x faster training and 90% less memory than FlashAttention 2, and ships tooling to generate datasets from PDFs, CSVs, and JSON without writing code. oobabooga/textgen takes a similar local-first stance, offering LoRA fine-tuning alongside inference in a single offline desktop app.
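For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea in plain Python, not the implementation used by textgen or Unsloth. It assumes the standard formulation: the pretrained weight matrix W stays frozen, and training only touches two small matrices B and A whose product is added to W. The helper names, matrix sizes, and the `scale` parameter are illustrative assumptions.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_param_counts(d_out, d_in, rank):
    """Return (trainable params for full fine-tuning, trainable params for LoRA)."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, lora

def effective_weight(W, A, B, scale=1.0):
    """Weight actually used in the forward pass: W + scale * (B @ A)."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

random.seed(0)
d_out, d_in, rank = 8, 6, 2  # tiny hypothetical sizes for demonstration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]                                 # trainable, zero-initialized

# Because B starts at zero, the adapted model is initially identical to the base model.
starts_unchanged = effective_weight(W, A, B) == W

# Parameter savings for one 4096 x 4096 layer at rank 8 (sizes chosen for illustration).
full, lora = lora_param_counts(4096, 4096, 8)
print(starts_unchanged, full, lora, full // lora)
```

The point of the low-rank form is the parameter count: the number of trainable values grows with rank times the sum of the layer dimensions, not with their product, which is a large part of why LoRA fits on consumer hardware.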

For teams that want task-specific models without large infrastructure, the BARRED framework covered in Vibe Training uses multi-agent debate to auto-generate verified synthetic training data. The reported result is a 3B-parameter classifier that outperforms GPT-4.1 on a policy task while costing far less at inference time. This is the central argument for fine-tuning over prompting: a smaller specialized model can beat a larger general one if the training data is good enough.

For those who want to understand what fine-tuning is actually adjusting, raiyanyahya/how-to-train-your-gpt walks through building a modern LLM from scratch with every line commented, grounding the higher-level tooling in the underlying architecture.