Skip to content
agentscamp

Data & ML — AI Agents, Skills & Tools

Agents, skills, guides, tools, and commands for data & ml — 30 curated resources for building with AI coding agents.

Agent

Data Engineer

Use this agent to build and maintain data pipelines — ingestion, ELT/ETL, warehouse modeling, orchestration, and data-quality tests. Examples — building an idempotent ingestion job, modeling a fact/dimension table in dbt, writing a safe backfill for a changed schema.

sonnet6
Agent

Data Scientist

Use this agent for data analysis — exploration, statistics, SQL, and clear findings. Examples — analyzing a dataset, writing an analytical SQL query, summarizing experiment results.

sonnet
Agent

ML Engineer

Use this agent for production ML — pipelines, training, serving, evaluation, and MLOps. Examples — building a training pipeline, deploying a model, setting up evaluation.

opus
Agent

Postgres Migration Engineer

Use this agent to plan and execute a zero-downtime Postgres schema migration — decomposing a breaking change into expand-contract steps, writing batched backfills, building indexes CONCURRENTLY, validating constraints online, and keeping every step reversible with the project's migration tooling. Examples — "add a NOT NULL column to a 200M-row table without downtime", "rename a column safely across a rolling deploy", "split this risky migration into reversible expand/contract steps".

sonnet6
Agent

Prompt Engineer

Use this agent to design and iterate the prompts behind an LLM-powered product feature — instructions, few-shot examples, tool schemas, and the evals that prove a change actually helped. Examples — "this classification prompt is flaky, make it reliable", "design the system prompt and function schema for our support agent", "our extraction prompt regressed after I tweaked it, set up evals so this stops happening".

sonnet6
Agent

Vector Search Engineer

Use this agent to design, build, and tune the vector-database layer of a search or RAG system — schema and index design (HNSW/IVF + quantization), metadata/payload filtering, hybrid (dense + sparse) search, and ingestion/upsert pipelines — sized to a real latency, recall, and cost budget. Examples — "set up pgvector for our docs with HNSW and filtered search", "our Qdrant queries are slow and recall dropped after quantization", "add metadata filtering so search only returns the current tenant's documents".

sonnet6
Agent

SQL Pro

Use this agent for SQL itself — correct joins and window functions, indexing, EXPLAIN plans, schema design, and safe migrations on Postgres/MySQL. Examples — making a slow query fast, designing a normalized schema, writing a reversible migration.

sonnet6
Skill

Multimodal Document Extractor

Extract structured data from documents and images with a vision-language model — define the target schema, prompt the VLM to fill it from the page (invoices, forms, receipts, statements, IDs), and verify critical fields against the source. Use when you need reliable structured output from messy, varied, or scanned documents that defeat template-based OCR.

invocablev1.0.0
Skill

SQL Optimizer

Diagnose a slow SQL query from its execution plan and propose a verified optimization — finding the real bottleneck (sequential scan, missing or unused index, bad join order, app-side N+1) and measuring the fix before and after. Use when a query is slow and you need a fix backed by EXPLAIN ANALYZE, not a guess.

invocablev1.0.0
Skill

Embedding Index Tuner

Tune a vector index — HNSW graph parameters and quantization — to hit a recall target at the lowest latency and memory, by sweeping settings against a fixed query set instead of trusting defaults. Use when vector search is slow or memory-hungry, when recall dropped after enabling quantization, or when standing up an index and you need defensible parameters.

invocablev1.0.0
Skill

Postgres Index Strategist

Recommend the right Postgres index for a query or workload — choosing B-Tree vs. GIN vs. BRIN vs. partial/covering/expression, checking for redundant or unused indexes, and verifying the choice against the query plan. Use when a query needs an index, when deciding an index type for jsonb/array/full-text/time-series data, or when auditing an over-indexed table.

invocablev1.0.0
Guide

Choosing Embeddings in 2026: OpenAI vs Cohere vs Voyage vs Open-Source

A decision guide for picking an embedding model for retrieval — accuracy, dimensions, cost, multilingual and domain fit, self-hosting, and lock-in.

4m read· AgentsCamp
Guide

Best Vector Database in 2026: pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus vs Chroma vs LanceDB

A decision guide to vector databases — embedded, server, or managed; whether you already run Postgres; and which fits your scale, filtering, and RAG needs.

5m read· AgentsCamp
Guide

Indexing Postgres at Scale: B-Tree vs GIN vs BRIN and the Hidden Cost of Over-Indexing

A practical guide to choosing Postgres index types — B-Tree, GIN, BRIN, partial, and covering — and why every index you add taxes every write.

4m read· AgentsCamp
Guide

Zero-Downtime Postgres Migrations: The Expand-Contract Playbook for 2026

How to change a live Postgres schema without downtime or broken deploys — the expand-contract pattern, safe column changes, batched backfills, and CONCURRENTLY.

5m read· AgentsCamp
Guide

Using Vision-Language Models for OCR, Documents, and Video Understanding

How to use vision-language models for OCR, documents, and video: how they differ from traditional OCR, their failure modes, and getting reliable output.

2m read· AgentsCamp
Tool

Chroma

An open-source, Python-first vector database that runs in-process — the fastest path from pip install to a working retrieval prototype.

open sourcesdk
Tool

LanceDB

An open-source embedded vector database built on the Lance columnar format — serverless, multimodal, and designed to scale on local disk or object storage.

open sourcesdk
Tool

Mem0

A memory layer for AI agents and apps — persistent, personalized long-term memory across sessions.

open sourcesdk
Tool

Milvus

An open-source vector database built for billion-scale similarity search, with a distributed architecture and a wide menu of index types.

open sourceplatform
Tool

pgroll

An open-source CLI for zero-downtime, reversible Postgres schema migrations using the expand-contract pattern behind versioned schema views.

open sourcecli
Tool

pgvector

An open-source Postgres extension that adds a vector type and HNSW/IVFFlat indexes for similarity search inside your existing database.

open sourcesdk
Tool

Pinecone

A fully managed, serverless vector database for similarity search and RAG — no nodes to run, indexes to tune, or infrastructure to operate.

freemiumplatform
Tool

Qdrant

An open-source vector database written in Rust, built for low-latency similarity search at scale.

open sourceplatform
Tool

Qwen3-VL

Alibaba Qwen's open-weights vision-language model family (2B–235B, Apache-2.0): image and document understanding, OCR, visual reasoning, and video.

open sourceplatform
Tool

Voyage AI

Embedding and reranking models tuned for retrieval, now part of MongoDB.

freemiumplatform
Tool

Weaviate

An open-source vector database with built-in hybrid search, pluggable vectorizer modules, and GraphQL/REST/gRPC APIs.

open sourceplatform
Command

DB Migrate

Generate and apply a database migration the safe way — using the project's migration tool, with expand-contract discipline for breaking changes, lock-free DDL, and a reversible up/down.

/db-migrate<the schema change to make, or a path to a pending migration to review>
Command

Scaffold a pgvector Schema & HNSW Index

Scaffold a production-ready pgvector schema and HNSW index for a corpus — matching the project's migration tooling, distance metric, and embedding dimensions.

/scaffold-pgvector-schema<table/corpus name and embedding dimensions, or a description of the data>
Command

Profile Postgres Queries

Profile a Postgres workload to find the queries actually costing you — rank by total time with pg_stat_statements, EXPLAIN the worst offenders, and recommend the highest-leverage fix.

/profile-postgres-queries<database/connection details, a slow endpoint, or a description of the workload>