Embedding
An embedding is a vector of numbers representing text's meaning, placed so similar texts land close together — the foundation of semantic search and RAG.
An embedding is a numeric vector representing a piece of text (or image, or code) in a high-dimensional space arranged by meaning — texts that say similar things get vectors that sit close together.
An embedding model does the mapping: text in, a few hundred to a few thousand floating-point numbers out. Distance in that space approximates semantic similarity, which turns "find documents about X" into geometry: embed the query, find the nearest stored vectors. That single trick underlies semantic search, RAG retrieval, recommendation, clustering, and deduplication.
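The "embed the query, find the nearest stored vectors" step can be sketched in a few lines. This is a minimal illustration, not a production retriever: the three-dimensional vectors are hand-written stand-ins for real model output, and the names `cosine_similarity`, `nearest`, `STORED`, and `QUERY` are hypothetical. A real system would get its vectors from an embedding model and search them with a vector database rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, stored, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in stored.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings have hundreds to thousands of
# dimensions and come from an embedding model, not from hand-typed numbers.
STORED = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login": [0.8, 0.3, 0.1],
    "pricing-plans": [0.0, 0.2, 0.9],
}
QUERY = [0.85, 0.2, 0.05]  # pretend embedding of a login-related question

top = nearest(QUERY, STORED, k=2)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

The two login-related documents score far above the pricing document, which is the geometric version of "these texts are about the same thing".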
Two practical truths dominate embedding work. First, the model choice is load-bearing and sticky — quality varies by domain and language, and switching models later means re-embedding everything; the trade-offs across OpenAI, Cohere, Voyage, and open-source options are mapped in Choosing Embeddings in 2026. Second, embeddings are stored and searched in a vector database, whose indexing choices set your speed/recall trade-off. When retrieval misbehaves, diagnose the embedding set before blaming the retriever — that's exactly what the embedding-set-inspector skill does.
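As a rough idea of what "diagnose the embedding set" can mean in practice, the sketch below runs a few basic health checks: consistent dimensionality, zero vectors, vectors that are not unit-length, and near-duplicates. It is an illustration under stated assumptions, not the embedding-set-inspector skill's actual implementation. The function name `inspect_embeddings` and its thresholds are hypothetical, and the pairwise duplicate check is quadratic, so it only suits a small sample.

```python
import math


def inspect_embeddings(vectors, dup_threshold=0.999, norm_tol=1e-3):
    """Report basic health problems in a list of embedding vectors.

    dup_threshold and norm_tol are illustrative defaults, not standards.
    """
    dims = {len(v) for v in vectors}
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]

    # Degenerate vectors: all zeros, so cosine similarity is undefined.
    zero = [i for i, n in enumerate(norms) if n == 0.0]

    # Vectors whose length is not ~1; matters if the index assumes
    # normalized input (for example, dot product used as cosine).
    unnormalized = [
        i for i, n in enumerate(norms)
        if n > 0.0 and abs(n - 1.0) > norm_tol
    ]

    # Near-duplicates by cosine similarity; O(n^2), so sample first.
    near_dups = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue
            if len(vectors[i]) != len(vectors[j]):
                continue
            dot = sum(x * y for x, y in zip(vectors[i], vectors[j]))
            if dot / (norms[i] * norms[j]) >= dup_threshold:
                near_dups.append((i, j))

    return {
        "dimensions": sorted(dims),  # more than one value means mixed sizes
        "zero_vectors": zero,
        "unnormalized": unnormalized,
        "near_duplicates": near_dups,
    }


sample = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],  # degenerate
    [3.0, 4.0],  # length 5, not normalized
    [0.6, 0.8],  # same direction as the previous vector
]
report = inspect_embeddings(sample)
print(report)
```

Any non-empty entry in the report is a reason to look at the embedding pipeline before tuning the retriever.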
Frequently asked questions
- What is an embedding in simple terms?
  A list of numbers (often 256–3,072 of them) that captures what a text means. An embedding model maps 'How do I reset my password?' and 'I forgot my login credentials' to nearby points, even though they share almost no words, which is what lets search work by meaning instead of keywords.
- Do embeddings from different models mix?
  No. Each embedding model defines its own vector space: vectors from one model are meaningless next to vectors from another, and even versions of the same model differ. Switching embedding models means re-embedding the whole corpus, which is why the choice deserves real evaluation up front.
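One way to keep vectors from different models from being compared by accident is to store the model identifier next to every vector and check it before scoring. The sketch below assumes a hypothetical `TaggedVector` record and made-up model identifiers such as `"model-a-v1"`; it is one possible guard, not a feature of any particular vector database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedVector:
    model_id: str   # hypothetical identifier, including the model version
    values: tuple   # the embedding itself


def dot_similarity(a, b):
    """Dot product of two vectors, refusing to mix vector spaces."""
    if a.model_id != b.model_id:
        raise ValueError(
            f"cannot compare vectors from {a.model_id!r} and {b.model_id!r}"
        )
    if len(a.values) != len(b.values):
        raise ValueError("vectors have different dimensions")
    return sum(x * y for x, y in zip(a.values, b.values))


doc = TaggedVector("model-a-v1", (0.6, 0.8))
query_same = TaggedVector("model-a-v1", (0.6, 0.8))
query_other = TaggedVector("model-b-v1", (0.6, 0.8))

print(dot_similarity(doc, query_same))

try:
    dot_similarity(doc, query_other)
except ValueError as err:
    # Identical numbers, different vector space: the score would be meaningless.
    print(f"rejected: {err}")
```

Treating the version as part of the identifier means a model upgrade is rejected the same way as a switch to a different provider, which matches the re-embedding requirement described above.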
Related
- Choosing Embeddings in 2026: OpenAI vs Cohere vs Voyage vs Open-Source: A decision guide for picking an embedding model for retrieval, covering accuracy, dimensions, cost, multilingual and domain fit, self-hosting, and lock-in.
- Vector Database: A vector database stores embeddings and answers nearest-neighbor queries fast. It is the retrieval layer under RAG and semantic search, using ANN indexes like HNSW.
- Semantic Search: Semantic search retrieves results by meaning rather than keyword overlap, embedding queries and documents in one vector space and matching by similarity.
- RAG (Retrieval-Augmented Generation): RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Embedding Set Inspector: Diagnose the health of an embedding set before blaming the retriever by checking normalization, dimensionality, near-duplicates, degenerate vectors, and corpus/query distribution mismatch. Use when retrieval quality is poor, after a re-embed, or before shipping a new index.
- Voyage AI: Embedding and reranking models tuned for retrieval, now part of MongoDB.
- Chunking: Chunking splits documents into retrievable pieces before embedding. It is the RAG design decision that quietly determines retrieval quality.
- Cosine Similarity: Cosine similarity measures how alike two embeddings are by the angle between them. It is the standard relevance score behind semantic search and RAG retrieval.
- Embedding Dimension: Embedding dimension is the length of an embedding vector (how many numbers represent each text), trading capacity against storage and search cost.