Semantic Search
Semantic search retrieves documents by meaning instead of word overlap: queries and documents are mapped into the same embedding space, and relevance becomes vector similarity.
The mechanism is simple once embeddings exist — embed the corpus offline into a vector database, embed the query at runtime, return the nearest neighbors. The payoff is robustness to phrasing: users don't need to guess the document's vocabulary. The cost is the flip side: semantic search can miss exact tokens — error codes, function names, SKUs — that old-fashioned keyword search nails, and it inherits whatever blind spots the embedding model has in your domain.
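The three steps above (embed the corpus offline, embed the query at runtime, return the nearest neighbors) can be sketched in a few lines. This is a toy illustration only: `embed`, `LEXICON`, and `CONCEPTS` are hypothetical stand-ins for a trained embedding model, and the in-memory `index` list stands in for a vector database.

```python
import math

# ASSUMPTION: a hand-written lexicon stands in for a real embedding model.
# Each known word maps to one "concept" axis; a trained model learns this.
CONCEPTS = ["device", "power_failure", "billing", "shipping"]
LEXICON = {
    "laptop": "device", "computer": "device",
    "boot": "power_failure", "turn": "power_failure",
    "fails": "power_failure", "won't": "power_failure",
    "invoice": "billing", "refund": "billing", "charged": "billing",
    "package": "shipping", "delivery": "shipping", "late": "shipping",
}

def embed(text):
    """Map text to a vector by counting the concepts its words point at."""
    vec = [0.0] * len(CONCEPTS)
    for word in text.lower().split():
        concept = LEXICON.get(word.strip(".,?!"))
        if concept is not None:
            vec[CONCEPTS.index(concept)] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Offline step: embed the corpus once and keep the vectors.
corpus = [
    "Computer fails to boot",
    "Refund for a charged invoice",
    "Package delivery is late",
]
index = [(doc, embed(doc)) for doc in corpus]

def search(query, k=2):
    """Runtime step: embed the query, return the k most similar documents."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# The query shares no words with the top hit, yet it ranks first.
results = search("laptop won't turn on")
print(results[0][1])  # Computer fails to boot
```

A production system would swap the exhaustive scan in `search` for an approximate nearest-neighbor index, since scoring every document does not scale to large corpora.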
That's why mature retrieval is rarely semantic-only. Hybrid search pairs BM25 keyword retrieval with vector search, and a reranker re-sorts the merged candidates — recall from breadth, precision from the reranker. The full pattern, with when each piece earns its place, is in Hybrid Search & Reranking.
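A minimal sketch of that pipeline, assuming the merge step uses reciprocal rank fusion (RRF), which is one common way to combine ranked lists and not the only one. The document IDs, the `toy_relevance` scores, and the function names are hypothetical; the score lookup stands in for a real reranker model.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) per list."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def hybrid_search(keyword_hits, vector_hits, rerank_score, top_n=3):
    """Fuse both result lists for breadth, then rerank for precision."""
    candidates = reciprocal_rank_fusion([keyword_hits, vector_hits])
    return sorted(candidates, key=rerank_score, reverse=True)[:top_n]

# ASSUMPTION: these ranked ID lists came from a BM25 index and a vector index.
keyword_hits = ["doc_err_503", "doc_timeouts", "doc_retry"]
vector_hits = ["doc_timeouts", "doc_gateway", "doc_err_503"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_timeouts', 'doc_err_503', 'doc_gateway', 'doc_retry']

# ASSUMPTION: fixed scores stand in for a reranker's relevance predictions.
toy_relevance = {
    "doc_err_503": 0.9, "doc_timeouts": 0.7,
    "doc_retry": 0.4, "doc_gateway": 0.2,
}
final = hybrid_search(keyword_hits, vector_hits, toy_relevance.get, top_n=2)
print(final)  # ['doc_err_503', 'doc_timeouts']
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales. The constant `k=60` is a conventional default, not a tuned value.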
Frequently asked questions
- How is semantic search different from keyword search?
  Keyword (lexical) search matches the words themselves: great for exact identifiers, brittle for paraphrases. Semantic search matches meaning via embeddings, so 'laptop won't turn on' finds 'computer fails to boot.' The trade-off flips for exact strings: error codes and product SKUs are where keyword search still wins.
- Why do production systems combine both?
  Because their failure modes are complementary. Hybrid search runs lexical (BM25) and semantic retrieval together and merges the results, catching both the exact-match cases embeddings fuzz over and the paraphrases keywords miss. The merge is usually followed by a reranker that sorts the pooled candidates precisely.
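The lexical half of that trade-off can be seen with a crude token-overlap score. This is a simplified stand-in for keyword retrieval, not BM25 itself, and the error code `E1042` and both documents are made up for illustration.

```python
def tokens(text):
    """Lowercase the text and split it into a set of whitespace tokens."""
    return set(text.lower().replace(",", " ").split())

def keyword_overlap(query, doc):
    """Count the query tokens that literally appear in the document."""
    return len(tokens(query) & tokens(doc))

docs = [
    "Computer fails to boot",
    "Fix for error E1042 in the payment service",  # hypothetical error code
]

# A paraphrase shares no tokens with either document, so nothing matches.
paraphrase = [keyword_overlap("laptop won't turn on", d) for d in docs]
print(paraphrase)  # [0, 0]

# An exact identifier matches only the document that contains it.
identifier = [keyword_overlap("E1042", d) for d in docs]
print(identifier)  # [0, 1]
```

The paraphrase query is the case an embedding-based retriever is meant to handle, while the identifier query is where a literal token match is hard to beat.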
Related
- Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision
  How production RAG combines dense and sparse search, fuses with RRF, and reranks, turning a wide candidate set into the few passages that actually answer.
- Embedding
  An embedding is a vector of numbers representing text's meaning, placed so similar texts land close together. It is the foundation of semantic search and RAG.
- Vector Database
  A vector database stores embeddings and answers nearest-neighbor queries fast. It is the retrieval layer under RAG and semantic search, using ANN indexes like HNSW.
- Reranking
  Reranking is a second-pass scoring step: a cross-encoder model re-orders the top results from fast retrieval so the truly relevant few rise to the top.
- RAG (Retrieval-Augmented Generation)
  RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Cosine Similarity
  Cosine similarity measures how alike two embeddings are by the angle between them. It is the standard relevance score behind semantic search and RAG retrieval.
- Hybrid Search
  Hybrid search runs keyword (BM25) and semantic (vector) retrieval together and merges the results, catching both exact terms and paraphrases.