Qdrant
An open-source vector database written in Rust, built for low-latency similarity search at scale.
Qdrant is an open-source vector database for storing embeddings and retrieving the nearest matches to a query vector. Written in Rust, it is built for low-latency search over large collections, and it pairs vector similarity with structured payload filtering so you can constrain results by metadata (tenant, date, document type) without sacrificing recall.
It is aimed at teams building retrieval-augmented generation (RAG), semantic search, recommendations, and deduplication who want a store they can self-host or run as a managed service. You can start with a single Docker container and scale to a distributed, sharded cluster as your data grows.
Highlights
- Hybrid search: combine dense vectors with sparse (keyword/BM25-style) vectors and fuse the results, a pattern many production RAG systems converge on.
- Payload filtering: attach JSON metadata to each point and filter on it during search, with payload indexes that keep filtered queries fast.
- Quantization: scalar, product, and binary quantization shrink the memory footprint and speed up search, with optional on-disk storage for very large collections.
- Distributed & resilient: sharding and replication for horizontal scale and high availability.
- Clients everywhere: official SDKs for Python, TypeScript/JavaScript, Rust, Go, and Java, plus REST and gRPC APIs.
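Fusing dense and sparse results is commonly done with Reciprocal Rank Fusion (RRF): each document's fused score is the sum of 1 / (k + rank) over every result list it appears in. Qdrant can perform this fusion server-side through its Query API, so the sketch below only illustrates the arithmetic. The document IDs are hypothetical, and k = 60 is an assumption (a widely used value in the RRF literature, not necessarily Qdrant's internal constant).

```python
from collections import defaultdict


def rrf_fuse(result_lists: list[list[str]], k: int = 60, limit: int = 10) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


# Hypothetical ranked IDs from a dense (embedding) search and a sparse (keyword) search.
dense_hits = ["doc-3", "doc-1", "doc-7", "doc-2"]
sparse_hits = ["doc-1", "doc-9", "doc-3"]

for doc_id, score in rrf_fuse([dense_hits, sparse_hits], limit=5):
    print(f"{doc_id}: {score:.4f}")
```

Here doc-1 and doc-3 rise to the top because both retrievers found them: RRF rewards agreement across lists more than a single high rank, and it needs only ranks, so the dense and sparse scores never have to be put on a common scale.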
In an AI-assisted workflow
A typical RAG loop: embed your chunks (see Choosing Embeddings in 2026), upsert them as points with metadata, then query with the embedded question and an optional filter.
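The ingestion half of that loop (embed each chunk, then upsert it with metadata) can be sketched without a running server by building the JSON body for Qdrant's REST upsert endpoint, `PUT /collections/{collection_name}/points`. In this sketch, `toy_embed` is a hypothetical stand-in for a real embedding model, the sample chunks and payload fields (`product`, `source`, `text`) are illustrative, and deriving point IDs with `uuid5` is an assumed design choice so that re-ingesting the same chunk overwrites its point instead of duplicating it.

```python
import hashlib
import json
import uuid

DIM = 8  # toy dimension; a real collection's vector size must match your embedding model


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: a deterministic, hash-derived vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def build_upsert_body(chunks: list[dict]) -> dict:
    """Build a request body for PUT /collections/{collection_name}/points."""
    points = []
    for chunk in chunks:
        # Stable ID: the same source + chunk index always maps to the same UUID,
        # so re-running ingestion overwrites points instead of duplicating them.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{chunk['source']}#{chunk['index']}"))
        points.append({
            "id": point_id,
            "vector": toy_embed(chunk["text"]),
            "payload": {"product": chunk["product"], "source": chunk["source"], "text": chunk["text"]},
        })
    return {"points": points}


# Hypothetical chunks; "product" is the payload field a query can later filter on.
chunks = [
    {"source": "docs/billing/keys.md", "index": 0, "product": "billing",
     "text": "Rotate API keys from the billing console."},
    {"source": "docs/auth/sso.md", "index": 0, "product": "auth",
     "text": "Configure SSO with your identity provider."},
]
print(json.dumps(build_upsert_body(chunks), indent=2))
```

The query half of the loop, filtering on that same `product` field, looks like this with the Python client: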
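```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# `embed` is your own embedding function (it is not part of qdrant_client); it must
# return a list[float] with the same dimension as the "docs" collection's vectors.
results = client.query_points(
    collection_name="docs",
    query=embed("How do I rotate API keys?"),
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="product", match=models.MatchValue(value="billing"))
        ]
    ),
    limit=20,  # over-retrieve, then rerank down to top-5
)
for point in results.points:
    print(point.id, point.score, point.payload)
```

TIP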
Over-retrieve from Qdrant (top-20–50) and rerank with a cross-encoder like Cohere Rerank before sending the top 5 to the model — see Hybrid Search & Reranking.
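The over-retrieve-then-rerank pattern in the tip can be sketched as a second stage that re-scores Qdrant's wide candidate set and keeps only the best few. `overlap_score` is a deliberately crude, hypothetical stand-in for a cross-encoder (it only measures shared words); in a real pipeline this stage would call a reranker such as Cohere Rerank, whose API is not shown here. The candidate passages are made up for illustration.

```python
def overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(query: str, candidates: list[dict], top_n: int = 5) -> list[dict]:
    """Stage 2: re-score the wide candidate set (e.g. Qdrant's top-20) and keep the best top_n."""
    rescored = [{**c, "rerank_score": overlap_score(query, c["text"])} for c in candidates]
    # Order by the reranker's score, not the original vector-similarity score.
    rescored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return rescored[:top_n]


# Hypothetical stage-1 output: passages with their vector-similarity scores.
candidates = [
    {"id": 1, "score": 0.91, "text": "Billing overview and invoices"},
    {"id": 2, "score": 0.88, "text": "How to rotate API keys safely"},
    {"id": 3, "score": 0.85, "text": "API rate limits"},
]
for hit in rerank("how do I rotate API keys", candidates, top_n=2):
    print(hit["id"], round(hit["rerank_score"], 2))
```

The passage with the highest vector score (id 1) drops out after reranking, which is exactly the case the two-stage design is meant to catch: the first stage optimizes for recall, the second for precision.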
Good to know
Qdrant is free and open source under Apache-2.0 and can be self-hosted with Docker or Kubernetes. Qdrant Cloud offers a managed option with a free tier for getting started. Because it is infrastructure rather than a desktop app, plan for the operational basics — backups, monitoring, and capacity for your index size.
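For capacity planning, raw vector memory is straightforward to estimate: vectors × dimensions × bytes per dimension. The sketch below compares float32 storage with scalar (int8, 1 byte per dimension) and binary (1 bit per dimension) quantization. It deliberately ignores the HNSW graph, payloads, payload indexes, and the original vectors Qdrant may keep for rescoring, so treat the result as a lower bound; the 10 million × 1,024 collection is a hypothetical example.

```python
def vector_memory_gib(num_vectors: int, dim: int, bits_per_dim: float) -> float:
    """Raw vector storage only (no HNSW links, payloads, or other overhead), in GiB."""
    total_bytes = num_vectors * dim * bits_per_dim / 8
    return total_bytes / 2**30


N, DIM = 10_000_000, 1024  # hypothetical collection size and embedding dimension
for label, bits in [("float32", 32), ("scalar int8", 8), ("binary", 1)]:
    print(f"{label:>12}: {vector_memory_gib(N, DIM, bits):6.2f} GiB")
```

The 4× (int8) and 32× (binary) reductions are what make quantization attractive at this scale; whether recall holds up is something to measure against your own query set rather than assume.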
Related
- Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision. How production RAG combines dense and sparse search, fuses with RRF, and reranks, turning a wide candidate set into the few passages that actually answer.
- RAG Pipeline Engineer. Use this agent to design, build, and harden a production retrieval-augmented generation (RAG) pipeline end to end (ingestion, chunking, embeddings, indexing, retrieval, reranking, and grounded generation), with evals that prove each stage works. Examples: "stand up RAG over our docs", "our RAG hallucinates and misses obvious answers, fix the pipeline", "take our prototype RAG to production with evals and citations".
- Cohere Rerank. A hosted reranking API that reorders retrieved passages by true relevance to a query.
- Voyage AI. Embedding and reranking models tuned for retrieval, now part of MongoDB.
- Chonkie. A lightweight, fast chunking library for RAG with many splitting strategies in one API.
- Retrieval Engineer. Use this agent to raise the retrieval quality of a search or RAG system: recall and precision, hybrid (dense + sparse) search, reranking, query transformation, and metadata filtering, measured against a labeled eval set. Examples: "our RAG retrieves irrelevant chunks, fix recall", "add hybrid search and reranking and prove it helps", "queries with acronyms/IDs return nothing, fix it".
- Vector Search Engineer. Use this agent to design, build, and tune the vector-database layer of a search or RAG system: schema and index design (HNSW/IVF + quantization), metadata/payload filtering, hybrid (dense + sparse) search, and ingestion/upsert pipelines, all sized to a real latency, recall, and cost budget. Examples: "set up pgvector for our docs with HNSW and filtered search", "our Qdrant queries are slow and recall dropped after quantization", "add metadata filtering so search only returns the current tenant's documents".
- Embedding Index Tuner. Tune a vector index (HNSW graph parameters and quantization) to hit a recall target at the lowest latency and memory, by sweeping settings against a fixed query set instead of trusting defaults. Use when vector search is slow or memory-hungry, when recall dropped after enabling quantization, or when standing up an index and you need defensible parameters.
- Best Vector Database in 2026: pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus vs Chroma vs LanceDB. A decision guide to vector databases: embedded, server, or managed; whether you already run Postgres; and which one fits your scale, filtering, and RAG needs.
- Milvus. An open-source vector database built for billion-scale similarity search, with a distributed architecture and a wide menu of index types.
- pgvector. An open-source Postgres extension that adds a vector type and HNSW/IVFFlat indexes for similarity search inside your existing database.
- Pinecone. A fully managed, serverless vector database for similarity search and RAG, with no nodes to run, indexes to tune, or infrastructure to operate.
- Weaviate. An open-source vector database with built-in hybrid search, pluggable vectorizer modules, and GraphQL/REST/gRPC APIs.