Embedding Set Inspector
Diagnose the health of an embedding set before blaming the retriever — checking normalization, dimensionality, near-duplicates, degenerate vectors, and corpus/query distribution mismatch. Use when retrieval quality is poor, after a re-embed, or before shipping a new index.
Install to ~/.claude/skills/embedding-set-inspector/SKILL.md
When RAG retrieval is bad, the embeddings are often the culprit, not the search. This skill inspects an embedding set for the usual failure modes — unnormalized or wrong-dimension vectors, near-duplicates, degenerate/empty embeddings, and query/document distribution mismatch — and reports what to fix.
When retrieval is poor, teams reach for a bigger model or a reranker before checking whether the embeddings themselves are sound. This skill inspects an embedding set for the failure modes that quietly wreck recall, so you fix the cause instead of layering patches on top.
When to use this skill
- Retrieval recall is low and you want to rule out the embeddings before tuning the retriever.
- After re-embedding a corpus (new model, new chunking) and before promoting the index.
- A subset of documents is "invisible" to search no matter the query.
- Validating a freshly built index in CI before it ships.
Instructions
- Confirm the basics. Verify every vector has the expected dimensionality and that vectors are normalized if your distance metric assumes it (cosine vs. dot product vs. L2 mismatch is a classic silent bug). Flag any zero, NaN, or near-zero-norm vectors — usually empty or failed-to-embed chunks.
- Check for asymmetry handling. If the model supports input types (document vs. query), confirm documents were embedded as documents and queries as queries. Mixing them degrades retrieval and is easy to get wrong.
- Profile the distribution. Summarize pairwise similarity: if almost everything is highly similar to everything else, the embeddings are not discriminating (often over-large chunks or a domain mismatch). If a few very tight clusters hold most of the vectors, check for duplicated or boilerplate content dominating the space.
- Find near-duplicates. Detect chunks whose embeddings are near-identical — repeated headers/footers, navigation, or licence text — which crowd out real answers in the top-k. Recommend dedup or metadata filtering.
- Test query/document alignment. Embed a handful of the eval queries and confirm their nearest neighbours are plausible. A systematic mismatch (queries land far from all documents) points to a model or input-type problem, not a tuning problem.
- Report and recommend. Summarize findings as severity | issue | affected count | fix, ordered by impact on retrieval.
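The checks in the steps above can be sketched in a few small functions. This is a minimal, standard-library-only illustration, not a reference implementation: the function names, the tolerances (`norm_tol`, `zero_tol`), and the 0.98 near-duplicate threshold are all assumptions chosen for the example, and the pairwise loops are O(n^2), so a real corpus would need sampling or an approximate-nearest-neighbour index instead.

```python
import math
from itertools import combinations


def norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def basic_checks(vectors, expected_dim, norm_tol=1e-3, zero_tol=1e-6):
    """Return the indices of vectors that fail each basic check.

    Tolerances are illustrative assumptions, not recommended values.
    """
    issues = {"wrong_dim": [], "non_finite": [], "near_zero": [], "unnormalized": []}
    for i, v in enumerate(vectors):
        if len(v) != expected_dim:
            issues["wrong_dim"].append(i)
            continue
        if not all(math.isfinite(x) for x in v):
            issues["non_finite"].append(i)  # NaN or infinity
            continue
        n = norm(v)
        if n < zero_tol:
            issues["near_zero"].append(i)  # likely an empty or failed chunk
        elif abs(n - 1.0) > norm_tol:
            issues["unnormalized"].append(i)
    return issues


def similarity_summary(vectors):
    """Min, (upper) median, and max of all pairwise cosine similarities."""
    sims = sorted(cosine(a, b) for a, b in combinations(vectors, 2))
    if not sims:
        return None
    return {"min": sims[0], "median": sims[len(sims) // 2], "max": sims[-1]}


def near_duplicates(vectors, threshold=0.98):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for i, j in combinations(range(len(vectors)), 2):
        s = cosine(vectors[i], vectors[j])
        if s >= threshold:
            pairs.append((i, j, s))
    return pairs


def nearest_neighbours(query, vectors, k=3):
    """Return (index, similarity) for the k vectors closest to the query."""
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, s) for s, i in scored[:k]]
```

Run `basic_checks` first and pass only the vectors that survive it to the other functions, since the similarity helpers assume equal-length, finite vectors. For the query-alignment spot check, call `nearest_neighbours` with each eval query's embedding and read the returned chunks by hand.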
NOTE
Embeddings from different models are not comparable. Never mix vectors from two models in one index, and re-embed the whole corpus when you switch — see Choosing Embeddings in 2026.
WARNING
A normalization or distance-metric mismatch can make retrieval look "sort of working" while quietly tanking recall. Check it first — it is the single most common embedding bug.
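A toy example may make the mismatch concrete. In the sketch below, the two document vectors, their names, and the query are invented for illustration: a long vector pointing in the wrong direction wins under raw dot product, while cosine similarity (or dot product on normalized vectors) picks the vector that actually points the same way as the query.

```python
import math


def dot(a, b):
    """Raw dot product; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return dot(normalize(a), normalize(b))


def top1(query, docs, score):
    """Name of the document with the highest score against the query."""
    return max(docs, key=lambda name: score(query, docs[name]))


# Hypothetical, unnormalized vectors: one is long but off-direction,
# the other is short and points exactly along the query.
docs = {"long_off_topic": [10.0, 0.0], "short_on_topic": [0.6, 0.8]}
query = [0.6, 0.8]

print(top1(query, docs, dot))     # long_off_topic (6.0 beats 1.0)
print(top1(query, docs, cosine))  # short_on_topic

# Normalizing at index time makes dot product agree with cosine.
unit_docs = {name: normalize(v) for name, v in docs.items()}
print(top1(normalize(query), unit_docs, dot))  # short_on_topic
```

The same effect at corpus scale is harder to spot, because most queries still return something plausible; only the ranking is skewed toward high-norm vectors.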
Output
A health report: dimensionality/normalization status, count of degenerate vectors, near-duplicate clusters, distribution summary, query-alignment spot checks, and a prioritized list of fixes.
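One possible way to render the prioritized list is sketched below. The `Finding` class, the four severity labels, and the sort key (severity first, then affected count) are assumptions made for this example; the skill itself only asks for the `severity | issue | affected count | fix` columns ordered by impact on retrieval.

```python
from dataclasses import dataclass

# Assumed severity scale; lower rank sorts first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    severity: str  # one of the SEVERITY_ORDER keys
    issue: str
    affected: int  # number of vectors or chunks affected
    fix: str


def render_report(findings):
    """Render findings as pipe-delimited rows, most severe first."""
    ordered = sorted(findings, key=lambda f: (SEVERITY_ORDER[f.severity], -f.affected))
    lines = ["severity | issue | affected count | fix"]
    lines += [f"{f.severity} | {f.issue} | {f.affected} | {f.fix}" for f in ordered]
    return "\n".join(lines)


# Illustrative findings, not real measurements.
example = [
    Finding("low", "near-duplicate footer chunks", 40, "dedup or filter by metadata"),
    Finding("critical", "zero-norm vectors", 12, "re-embed failed chunks"),
]
print(render_report(example))
```

Sorting by severity and then by count is only a proxy for impact; a finding that touches few vectors can still dominate top-k results, so the ordering is worth a manual check.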
Related
- Choosing Embeddings in 2026: OpenAI vs Cohere vs Voyage vs Open-Source. A decision guide for picking an embedding model for retrieval: accuracy, dimensions, cost, multilingual and domain fit, self-hosting, and lock-in.
- Voyage AI. Embedding and reranking models tuned for retrieval, now part of MongoDB.
- Chunking Strategy Optimizer. Find the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.
- Retrieval Engineer. Use this agent to raise the retrieval quality of a search or RAG system (recall and precision, hybrid dense + sparse search, reranking, query transformation, and metadata filtering), measured against a labeled eval set. Examples: "our RAG retrieves irrelevant chunks, fix recall", "add hybrid search and reranking and prove it helps", "queries with acronyms/IDs return nothing, fix it".
- Embedding Index Tuner. Tune a vector index (HNSW graph parameters and quantization) to hit a recall target at the lowest latency and memory, by sweeping settings against a fixed query set instead of trusting defaults. Use when vector search is slow or memory-hungry, when recall dropped after enabling quantization, or when standing up an index and you need defensible parameters.