RAG & Retrieval — AI Agents, Skills & Tools
Agents, skills, guides, tools, and commands for RAG & retrieval: 23 curated resources for building with AI coding agents.
RAG Pipeline Engineer
Use this agent to design, build, and harden a production retrieval-augmented generation (RAG) pipeline end to end — ingestion, chunking, embeddings, indexing, retrieval, reranking, and grounded generation — with evals that prove each stage works. Examples — "stand up RAG over our docs", "our RAG hallucinates and misses obvious answers, fix the pipeline", "take our prototype RAG to production with evals and citations".
Retrieval Engineer
Use this agent to raise the retrieval quality of a search or RAG system — recall and precision, hybrid (dense + sparse) search, reranking, query transformation, and metadata filtering — measured against a labeled eval set. Examples — "our RAG retrieves irrelevant chunks, fix recall", "add hybrid search and reranking and prove it helps", "queries with acronyms/IDs return nothing, fix it".
Vector Search Engineer
Use this agent to design, build, and tune the vector-database layer of a search or RAG system — schema and index design (HNSW/IVF + quantization), metadata/payload filtering, hybrid (dense + sparse) search, and ingestion/upsert pipelines — sized to a real latency, recall, and cost budget. Examples — "set up pgvector for our docs with HNSW and filtered search", "our Qdrant queries are slow and recall dropped after quantization", "add metadata filtering so search only returns the current tenant's documents".
Chunking Strategy Optimizer
Find the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.
Embedding Set Inspector
Diagnose the health of an embedding set before blaming the retriever — checking normalization, dimensionality, near-duplicates, degenerate vectors, and corpus/query distribution mismatch. Use when retrieval quality is poor, after a re-embed, or before shipping a new index.
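As a rough illustration of the checks named above (normalization, dimensionality, near-duplicates, degenerate vectors), here is a minimal standard-library sketch. It is not the skill's actual implementation: the function name `inspect_embeddings`, the thresholds, and the toy vectors are assumptions made for this example, and the pairwise duplicate scan is quadratic, so it only suits a sample of the corpus.

```python
import math

def inspect_embeddings(vectors, dup_threshold=0.999, norm_tol=1e-3):
    """Hypothetical helper: report basic health problems in a list of vectors."""
    dims = sorted({len(v) for v in vectors})  # more than one value means mixed dimensionality
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]

    # Degenerate: all-zero vectors or vectors containing NaN.
    degenerate = {i for i, n in enumerate(norms) if n == 0.0 or math.isnan(n)}

    # Unnormalized: L2 norm not within norm_tol of 1 (only matters if the index assumes unit vectors).
    unnormalized = [
        i for i, n in enumerate(norms)
        if i not in degenerate and abs(n - 1.0) > norm_tol
    ]

    # Near-duplicates: cosine similarity at or above dup_threshold (O(n^2), sample first).
    near_duplicates = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if i in degenerate or j in degenerate:
                continue
            if len(vectors[i]) != len(vectors[j]):
                continue
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if dot / (norms[i] * norms[j]) >= dup_threshold:
                near_duplicates.append((i, j))

    return {
        "dims": dims,
        "degenerate": sorted(degenerate),
        "unnormalized": unnormalized,
        "near_duplicates": near_duplicates,
    }

# Toy data: one exact duplicate pair, one zero vector, one unnormalized vector.
sample = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
report = inspect_embeddings(sample)
print(report)
```

Comparing query embeddings against corpus embeddings for distribution mismatch needs real data and is left out of this sketch.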
Embedding Index Tuner
Tune a vector index — HNSW graph parameters and quantization — to hit a recall target at the lowest latency and memory, by sweeping settings against a fixed query set instead of trusting defaults. Use when vector search is slow or memory-hungry, when recall dropped after enabling quantization, or when standing up an index and you need defensible parameters.
Choosing Embeddings in 2026: OpenAI vs Cohere vs Voyage vs Open-Source
A decision guide for picking an embedding model for retrieval — accuracy, dimensions, cost, multilingual and domain fit, self-hosting, and lock-in.
How RAG Actually Works: Ingestion, Chunking, Retrieval & Reranking
A clear, practical walkthrough of the retrieval-augmented generation pipeline — what each stage does, where it fails, and how the pieces fit together.
Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision
How production RAG combines dense and sparse search, fuses with RRF, and reranks — turning a wide candidate set into the few passages that actually answer.
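The fusion step mentioned above, reciprocal rank fusion (RRF), can be sketched in a few lines. This is an illustrative example rather than the guide's own code: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions. Each document scores the sum of `1 / (k + rank)` over every ranked list it appears in.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]

# Hypothetical candidate lists from a dense (vector) and a sparse (keyword) retriever.
dense = ["d3", "d1", "d7", "d2"]
sparse = ["d1", "d9", "d3", "d4"]

fused = rrf_fuse([dense, sparse], top_n=3)
print(fused)  # ['d1', 'd3', 'd9']
```

Documents that rank well in both lists rise to the top, and the fused shortlist is what a reranker would then reorder.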
Best Vector Database in 2026: pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus vs Chroma vs LanceDB
A decision guide to vector databases — embedded, server, or managed; whether you already run Postgres; and which fits your scale, filtering, and RAG needs.
Chonkie
A lightweight, fast chunking library for RAG with many splitting strategies in one API.
Chroma
An open-source, Python-first vector database that runs in-process — the fastest path from pip install to a working retrieval prototype.
Cohere Rerank
A hosted reranking API that reorders retrieved passages by true relevance to a query.
LanceDB
An open-source embedded vector database built on the Lance columnar format — serverless, multimodal, and designed to scale on local disk or object storage.
Milvus
An open-source vector database built for billion-scale similarity search, with a distributed architecture and a wide menu of index types.
pgvector
An open-source Postgres extension that adds a vector type and HNSW/IVFFlat indexes for similarity search inside your existing database.
Pinecone
A fully managed, serverless vector database for similarity search and RAG — no nodes to run, indexes to tune, or infrastructure to operate.
Qdrant
An open-source vector database written in Rust, built for low-latency similarity search at scale.
RAGAS
An open-source framework for evaluating retrieval-augmented generation with reference-free RAG metrics.
Voyage AI
Embedding and reranking models tuned for retrieval, now part of MongoDB.
Weaviate
An open-source vector database with built-in hybrid search, pluggable vectorizer modules, and GraphQL/REST/gRPC APIs.
Scaffold a pgvector Schema & HNSW Index
Scaffold a production-ready pgvector schema and HNSW index for a corpus — matching the project's migration tooling, distance metric, and embedding dimensions.
Benchmark Rerankers
Measure whether adding a reranker actually improves retrieval, by scoring reranked vs. un-reranked results on a labeled query set.
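A reranked vs. un-reranked comparison of this kind might look like the sketch below. It is a minimal example under stated assumptions, not the command's implementation: the metric choice (recall@k and mean reciprocal rank), the helper names, and the toy labels and rankings are all hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant IDs that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def score(results, labels, k=1):
    """Average both metrics over every labeled query."""
    queries = list(labels)
    return {
        "recall@k": sum(recall_at_k(results[q], labels[q], k) for q in queries) / len(queries),
        "mrr": sum(reciprocal_rank(results[q], labels[q]) for q in queries) / len(queries),
    }

# Hypothetical labeled query set and the two result orderings being compared.
labels = {"q1": {"a"}, "q2": {"x"}}
baseline = {"q1": ["b", "c", "a"], "q2": ["x", "y", "z"]}
reranked = {"q1": ["a", "b", "c"], "q2": ["y", "x", "z"]}

base = score(baseline, labels, k=1)
rer = score(reranked, labels, k=1)
print("baseline:", base)
print("reranked:", rer)
```

In this toy data the reranker improves MRR while recall@1 stays flat, which is why reporting more than one metric on the same labeled set is useful before deciding whether a reranker earns its latency and cost.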