RAG & Retrieval — AI Agents, Skills & Tools

Agents, skills, guides, tools, and commands for RAG & retrieval: 57 curated resources for building with AI coding agents.

Agent

RAG Pipeline Engineer

Use this agent to design, build, and harden a production retrieval-augmented generation (RAG) pipeline end to end — ingestion, chunking, embeddings, indexing, retrieval, reranking, and grounded generation — with evals that prove each stage works. Examples — "stand up RAG over our docs", "our RAG hallucinates and misses obvious answers, fix the pipeline", "take our prototype RAG to production with evals and citations".

sonnet · 6

Agent

Retrieval Engineer

Use this agent to raise the retrieval quality of a search or RAG system — recall and precision, hybrid (dense + sparse) search, reranking, query transformation, and metadata filtering — measured against a labeled eval set. Examples — "our RAG retrieves irrelevant chunks, fix recall", "add hybrid search and reranking and prove it helps", "queries with acronyms/IDs return nothing, fix it".

sonnet · 6

Agent

Vector Search Engineer

Use this agent to design, build, and tune the vector-database layer of a search or RAG system — schema and index design (HNSW/IVF + quantization), metadata/payload filtering, hybrid (dense + sparse) search, and ingestion/upsert pipelines — sized to a real latency, recall, and cost budget. Examples — "set up pgvector for our docs with HNSW and filtered search", "our Qdrant queries are slow and recall dropped after quantization", "add metadata filtering so search only returns the current tenant's documents".

sonnet · 6

Skill

Chunking Strategy Optimizer

Find the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.

invocable · v1.0.0
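As a rough illustration of sweeping chunk configurations against a fixed eval set, the sketch below splits text into word-based chunks with overlap and scores each (size, overlap) pair by top-1 hit rate. The names `chunk_words`, `top_chunk`, and `sweep` are hypothetical, and the word-overlap scorer is only a stand-in for a real embedding retriever; nothing here is taken from the skill itself.

```python
def chunk_words(text, size, overlap):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous chunk.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def top_chunk(chunks, query):
    """Toy retriever (assumption): pick the chunk sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))


def sweep(text, eval_set, configs):
    """Return {(size, overlap): hit rate}, where a hit means the top chunk contains the expected answer."""
    results = {}
    for size, overlap in configs:
        chunks = chunk_words(text, size, overlap)
        hits = sum(
            1 for query, answer in eval_set
            if answer.lower() in top_chunk(chunks, query).lower()
        )
        results[(size, overlap)] = hits / len(eval_set)
    return results
```

Calling `sweep(corpus_text, [(query, expected_answer), ...], [(200, 0), (200, 40)])` returns one hit rate per configuration, which keeps the comparison tied to the same eval set rather than to intuition.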

Skill

Embedding Set Inspector

Diagnose the health of an embedding set before blaming the retriever — checking normalization, dimensionality, near-duplicates, degenerate vectors, and corpus/query distribution mismatch. Use when retrieval quality is poor, after a re-embed, or before shipping a new index.

invocable · v1.0.0
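A minimal sketch of some of the checks named above (dimensionality, normalization, degenerate vectors, near-duplicates), using only the standard library. The function name `inspect_embeddings`, the tolerance, and the duplicate threshold are assumptions chosen for illustration; the pairwise loop is quadratic and only suits small samples, and corpus/query distribution mismatch is not covered here.

```python
import math


def inspect_embeddings(vectors, norm_tol=1e-3, dup_threshold=0.999):
    """Run basic health checks over a list of float vectors and return index-based findings."""
    dims = {len(v) for v in vectors}
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]

    # Degenerate: zero-length vectors or vectors containing NaN/inf components.
    degenerate = {i for i, n in enumerate(norms) if n == 0.0 or not math.isfinite(n)}

    # Unnormalized: L2 norm differs from 1 by more than the tolerance.
    unnormalized = [
        i for i, n in enumerate(norms)
        if i not in degenerate and abs(n - 1.0) > norm_tol
    ]

    # Near-duplicates: pairs whose cosine similarity reaches the threshold.
    near_duplicates = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if i in degenerate or j in degenerate or len(vectors[i]) != len(vectors[j]):
                continue
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if dot / (norms[i] * norms[j]) >= dup_threshold:
                near_duplicates.append((i, j))

    return {
        "consistent_dim": len(dims) <= 1,
        "degenerate": sorted(degenerate),
        "unnormalized": unnormalized,
        "near_duplicates": near_duplicates,
    }
```

Running this on a random sample of the index before touching the retriever gives a quick signal on whether the embeddings themselves are the problem.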

Skill

GraphRAG Scaffolder

Stand up a GraphRAG experiment the disciplined way: audit whether your failed queries are actually connection-shaped, scope a minimal entity/relationship ontology, build extraction → graph → community-summary indexing on a corpus slice, and measure against vector-RAG baselines before committing. Use when multi-hop or whole-corpus questions keep failing plain RAG.

invocable · v1.0.0

Skill

Embedding Index Tuner

Tune a vector index — HNSW graph parameters and quantization — to hit a recall target at the lowest latency and memory, by sweeping settings against a fixed query set instead of trusting defaults. Use when vector search is slow or memory-hungry, when recall dropped after enabling quantization, or when standing up an index and you need defensible parameters.

invocable · v1.0.0

Guide

Best RAG Frameworks in 2026

A roundup of the top RAG frameworks in 2026 — LlamaIndex, LangChain, Haystack, and DSPy — and which one fits your retrieval stack.

4m read · AgentsCamp

Guide

Exa vs Tavily: Web Search APIs for AI Agents (2026)

Exa vs Tavily compared — neural semantic discovery vs agent-optimized RAG answers, pricing, MCP support, and which web search API fits your stack.

3m read · AgentsCamp

Guide

LangChain vs LlamaIndex in 2026: Agents or Data?

The classic framework confusion resolved — LangChain's agent loop and ecosystem vs LlamaIndex's data-and-documents depth — and when you'd genuinely use both.

2m read · AgentsCamp

Guide

pgvector vs Pinecone: Do You Need a Vector Database? (2026)

pgvector vs Pinecone compared — vector search inside the Postgres you already run vs a dedicated managed service. Scale thresholds, ops, and the honest default.

2m read · AgentsCamp

Guide

Qdrant vs Pinecone: Which Vector Database? (2026)

Qdrant vs Pinecone compared — open-source control vs fully managed serverless, filtering and hybrid search, cost shape, and which fits your RAG stack.

2m read · AgentsCamp

Guide

Weaviate vs Pinecone: Open-Source vs Managed Vector DB (2026)

Weaviate vs Pinecone — BSD-3 open source you self-host vs fully managed serverless. Hybrid search, scaling, cost shape, and which fits your RAG stack.

3m read · AgentsCamp

Guide

Agentic RAG: When Retrieval Needs an Agent in the Loop

What agentic RAG is — retrieval as a tool an agent uses iteratively, with query planning, self-correction, and multi-source routing — and when the upgrade pays.

3m read · AgentsCamp

Guide

Choosing Embeddings in 2026: OpenAI vs Cohere vs Voyage vs Open-Source

A decision guide for picking an embedding model for retrieval — accuracy, dimensions, cost, multilingual and domain fit, self-hosting, and lock-in.

4m read · AgentsCamp

Guide

GraphRAG Explained: When Knowledge Graphs Beat Vector Search

What GraphRAG is, how graph-based retrieval differs from vector RAG, the query shapes where it wins, and the honest costs before you build one.

3m read · AgentsCamp

Guide

How Embeddings Work: Vectors, Similarity, and Choosing a Model

What an embedding actually is, how similarity is measured, how the models are trained, and the practical rules for using embeddings well in search and RAG.

6m read · AgentsCamp

Guide

How RAG Actually Works: Ingestion, Chunking, Retrieval & Reranking

A clear, practical walkthrough of the retrieval-augmented generation pipeline — what each stage does, where it fails, and how the pieces fit together.

4m read · AgentsCamp

Guide

Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision

How production RAG combines dense and sparse search, fuses with RRF, and reranks — turning a wide candidate set into the few passages that actually answer.

3m read · AgentsCamp
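For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of how a dense and a sparse ranking can be fused before reranking. The function name `reciprocal_rank_fusion` is hypothetical, and `k=60` is a commonly used default rather than anything specified by the guide; ties are broken by document id so the output is deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of doc ids; each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to doc id order.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


dense_hits = ["a", "b", "c"]   # e.g. from a vector index
sparse_hits = ["c", "a", "d"]  # e.g. from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], top_n=3)
```

Because RRF uses only ranks, it needs no score calibration between the dense and sparse retrievers; the fused candidates would then go to a reranker for the final top few.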

Guide

RAG vs Long Context: Do Million-Token Windows Kill Retrieval?

Million-token context windows promised the end of RAG. The honest 2026 answer: long context changed where retrieval starts paying, not whether it does.

2m read · AgentsCamp

Guide

Getting Web Data into AI Agents: Search & Scraping APIs Compared

The agent web-data layer — Exa for semantic search, Firecrawl for extraction at scale, Tavily for all-in-one, Jina Reader for zero-setup — and how they compose.

2m read · AgentsCamp

Guide

Best Vector Database in 2026: pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus vs Chroma vs LanceDB

A decision guide to vector databases — embedded, server, or managed; whether you already run Postgres; and which fits your scale, filtering, and RAG needs.

5m read · AgentsCamp

Guide

Vector Search at Scale: ANN Indexes, Quantization & Sharding

How to run vector search over millions to billions of vectors without blowing latency, memory, or cost — index families, quantization, filtering, and sharding.

6m read · AgentsCamp

Guide

Why RAG Fails: A Debugging Checklist

A diagnostic checklist for broken RAG — localize the failure to ingestion, retrieval, ranking, or generation, and apply the fix that matches, in order.

3m read · AgentsCamp

Guide

Multimodal Embeddings and Image Search

How multimodal embeddings put images and text in one vector space, and how to build text-to-image and image-to-image search on top of it.

6m read · AgentsCamp

Guide

Multimodal RAG over PDFs, Scans & Charts: Two Approaches That Actually Work

RAG over visual documents — PDFs, scans, charts — where text-only extraction loses tables and layout. Parse-then-text vs embed-the-page-image, with trade-offs.

6m read · AgentsCamp

Tool

Chonkie

A lightweight, fast chunking library for RAG with many splitting strategies in one API.

open source · sdk

Tool

Chroma

An open-source, Python-first vector database that runs in-process — the fastest path from pip install to a working retrieval prototype.

open source · sdk

Tool

Cohere Rerank

A hosted reranking API that reorders retrieved passages by true relevance to a query.

freemium · platform

Tool

Docling

Open-source Python library that parses PDFs, DOCX, PPTX, HTML, and images into structured Markdown and JSON with layout, tables, and reading order for RAG.

open source · sdk

Tool

LanceDB

An open-source embedded vector database built on the Lance columnar format — serverless, multimodal, and designed to scale on local disk or object storage.

open source · sdk

Tool

LlamaIndex

The data framework for LLM apps — ingestion, indexing, query engines, and document agents — now centered on document processing with LlamaParse and LlamaCloud.

open source · sdk

Tool

LlamaParse

Hosted document-parsing API from LlamaIndex that turns complex PDFs — tables, charts, figures, handwriting — into clean, LLM-ready Markdown for RAG.

freemium · platform

Tool

Marker

Open-source pipeline that converts PDFs, images, and Office docs into clean Markdown, JSON, or HTML fast, with optional LLM assist for tables and equations.

open source · sdk

Tool

Milvus

An open-source vector database built for billion-scale similarity search, with a distributed architecture and a wide menu of index types.

open source · platform

Tool

pgvector

An open-source Postgres extension that adds a vector type and HNSW/IVFFlat indexes for similarity search inside your existing database.

open source · sdk

Tool

Pinecone

A fully managed, serverless vector database for similarity search and RAG — no nodes to run, indexes to tune, or infrastructure to operate.

freemium · platform

Tool

Qdrant

An open-source vector database written in Rust, built for low-latency similarity search at scale.

open source · platform

Tool

RAGAS

An open-source framework for evaluating retrieval-augmented generation with reference-free RAG metrics.

open source · evaluation

Tool

Reducto

High-accuracy document ingestion API — parsing, agentic OCR, table and figure extraction, and splitting that turns messy PDFs into LLM-ready data for RAG.

freemium · platform

Tool

turbopuffer

A serverless vector and full-text search database built on object storage (S3/GCS/Azure) — usage-based pricing, hybrid search, and low cost per GB at scale.

paid · platform

Tool

Unstructured

Open-source library plus hosted Platform/API that turns messy documents — PDF, HTML, docx, images, email — into clean, chunked JSON for LLMs and RAG.

freemium · platform

Tool

Voyage AI

Embedding and reranking models tuned for retrieval, now part of MongoDB.

freemium · platform

Tool

Weaviate

An open-source vector database with built-in hybrid search, pluggable vectorizer modules, and GraphQL/REST/gRPC APIs.

open source · platform

Command

Scaffold a pgvector Schema & HNSW Index

Scaffold a production-ready pgvector schema and HNSW index for a corpus — matching the project's migration tooling, distance metric, and embedding dimensions.

/scaffold-pgvector-schema <table/corpus name and embedding dimensions, or a description of the data>

Command

Benchmark Rerankers

Measure whether adding a reranker actually improves retrieval, by scoring reranked vs. un-reranked results on a labeled query set.

/benchmark-rerankers <path to eval set / retrieval results, or a description of the pipeline>
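To make "scoring reranked vs. un-reranked results on a labeled query set" concrete, the sketch below computes recall@k and mean reciprocal rank (MRR) for a run. The helper names (`recall_at_k`, `reciprocal_rank`, `score_run`) and the data layout (a dict of query to ranked doc ids, plus a dict of query to relevant ids) are assumptions for illustration, not the command's actual input format.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant doc ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def score_run(run, labels, k=5):
    """Average recall@k and MRR over every labeled query; missing queries count as empty results."""
    queries = list(labels)
    n = len(queries)
    recall = sum(recall_at_k(run.get(q, []), labels[q], k) for q in queries) / n
    mrr = sum(reciprocal_rank(run.get(q, []), labels[q]) for q in queries) / n
    return {"recall@k": recall, "mrr": mrr}


labels = {"q1": {"d2"}, "q2": {"d5"}}
baseline = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d6", "d5"]}
reranked = {"q1": ["d2", "d1", "d3"], "q2": ["d5", "d4", "d6"]}

before = score_run(baseline, labels, k=2)
after = score_run(reranked, labels, k=2)
```

Comparing `before` and `after` on the same labels shows whether the reranker moved relevant passages up; a reranker that leaves both metrics flat is adding latency and cost without benefit.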

Command

Scaffold RAG Pipeline

Scaffold a Retrieval-Augmented Generation pipeline — ingestion (load, chunk, embed, upsert) and retrieval (search, rerank, grounded prompt with citations) — fitted to the project's stack.

/scaffold-rag-pipeline <data source and use case>

Term