agentscamp

MLOps & AI Infra — AI Agents, Skills & Tools

Agents, skills, guides, tools, and commands for MLOps & AI infra: 19 curated resources for building with AI coding agents.

Agent

Finetuning Engineer

Use this agent to fine-tune an open-weight model end to end — confirming fine-tuning is the right tool, preparing the dataset, choosing the method (LoRA/QLoRA vs. full), running training, and proving the result beats the prompted baseline on a held-out eval set. Examples — "fine-tune a small model to match our support tone and answer format", "we have 800 labeled examples — LoRA-tune and show it beats prompting", "our fine-tune overfits and forgot general ability — fix the data and run".

sonnet · 6
Agent

LLM Inference Engineer

Use this agent to serve and optimize self-hosted LLM inference — sizing GPUs, configuring a serving engine like vLLM (continuous batching, PagedAttention, tensor parallelism), applying quantization, and tuning throughput and tail latency against a cost and p95 budget. Examples — "serve Llama-3-70B at p95 under 2s on our GPUs", "our self-hosted model is slow and the GPUs sit half-idle — raise throughput", "quantize this model to fit one GPU without wrecking quality".

sonnet · 6
Agent

Voice Agent Engineer

Use this agent to build or fix a real-time voice agent — the streaming STT → LLM → TTS pipeline, conversational (mouth-to-ear) latency, turn-taking, barge-in/interruptions, and per-stage provider selection. Examples — "our voice bot feels laggy and talks over people, fix the turn-taking and latency", "build a phone agent that transcribes, answers with our LLM, and speaks back", "get our voice agent's response time under a second".

sonnet · 6
Skill

Finetune Dataset Builder

Turn raw examples into a training-ready fine-tuning dataset — normalize to the trainer's chat/instruction format, deduplicate (including near-duplicates), strip PII, balance, validate the schema and token lengths, and carve a leak-free eval split. Use when you have raw examples and need a clean, formatted, split dataset before training.

invocable · v1.0.0
Skill

QLoRA Finetune Runner

Run a QLoRA (4-bit LoRA) fine-tune of an open-weight model from a prepared dataset — set up the config, train memory-efficiently (e.g. with Unsloth/PEFT), watch for overfitting, save the adapter, and run a quick eval against the prepared split. Use when you have a clean dataset and want to execute a parameter-efficient fine-tune on a single GPU.

invocable · v1.0.0
Guide

Preparing a Fine-Tuning Dataset: Cleaning, Synthetic Data, and Eval Splits

The dataset is the model. How to build a fine-tuning dataset that works — format, curation, cleaning, synthetic augmentation, and a leak-free eval split.

3m read · AgentsCamp
Guide

Fine-Tune vs RAG vs Prompt vs Distill: The 2026 Decision Tree

When to reach for prompt engineering, RAG, fine-tuning, or distillation — what each actually changes, where each fails, and how to combine them.

3m read · AgentsCamp
Guide

Self-Host vs API: When Does Running Your Own LLM Actually Pay Off?

The real economics of self-hosting an LLM vs. calling a hosted API — GPU utilization, privacy, latency, and the hidden ops costs that decide the crossover.

3m read · AgentsCamp
Guide

Using Vision-Language Models for OCR, Documents, and Video Understanding

How to use vision-language models for OCR, documents, and video: how they differ from traditional OCR, their failure modes, and getting reliable output.

2m read · AgentsCamp
Guide

How to Build a Voice Agent: The STT → LLM → TTS Pipeline

How to build a real-time voice agent: the STT → LLM → TTS pipeline, the latency budget that makes or breaks it, and how to wire each stage.

3m read · AgentsCamp
Tool

Deepgram

A voice-AI platform with fast, accurate speech-to-text (Nova) and low-latency text-to-speech (Aura), plus a bundled Voice Agent API.

freemium · platform
Tool

ElevenLabs

A voice-AI platform for high-quality text-to-speech, voice cloning, dubbing, and real-time conversational agents, via API.

freemium · platform
Tool

LM Studio

A desktop app for discovering, downloading, and running open-weight LLMs locally with a GUI and a local OpenAI-compatible server.

freemium · platform
Tool

Ollama

An open-source tool to run open-weight LLMs locally with a single command, including a local OpenAI-compatible API.

open source · cli
Tool

Pipecat

An open-source Python framework for real-time voice and multimodal conversational AI — it orchestrates streaming STT, LLM, and TTS into composable pipelines.

open source · sdk
Tool

Qwen3-VL

Alibaba Qwen's open-weight vision-language model family (2B–235B, Apache-2.0): image and document understanding, OCR, visual reasoning, and video.

open source · platform
Tool

Unsloth

An open-source library that makes LoRA/QLoRA fine-tuning of LLMs roughly 2x faster and far more memory-efficient, so you can fine-tune on a single GPU.

open source · sdk
Tool

vLLM

A high-throughput, memory-efficient inference and serving engine for LLMs, with PagedAttention, continuous batching, and an OpenAI-compatible API server.

open source · sdk
Command

Scaffold a vLLM Serving Config

Scaffold a vLLM serving config for a model on a target GPU — pick precision/quantization and parallelism to fit, set batching and context length, and expose an OpenAI-compatible server.

/scaffold-vllm-config <model + target GPU(s) and VRAM, or a description of the serving workload>