
Best Vector Database in 2026: pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus vs Chroma vs LanceDB

A decision guide to vector databases — embedded, server, or managed; whether you already run Postgres; and which fits your scale, filtering, and RAG needs.

5 min read · AgentsCamp
Updated Jun 4, 2026
Tags: vector-database, rag, embeddings, comparison, tools

There's no single best vector database — choose by where you run (embedded, self-hosted server, or fully managed), whether you already run Postgres, your scale and filtering needs, and cost. pgvector wins when you already have Postgres; Pinecone for zero-ops managed; Qdrant/Weaviate/Milvus for open-source servers at scale; Chroma/LanceDB for embedded prototyping. Validate recall on your own data.

Key takeaways

  • Pick along three axes: deployment (embedded vs. server vs. managed), scale (millions vs. billions of vectors), and whether your data already lives in Postgres.
  • Already on Postgres? Start with pgvector — vectors sit next to your relational data with full SQL and transactions, no new system to operate.
  • Want zero ops? Pinecone is fully managed and serverless. Want open-source servers you control? Qdrant, Weaviate, and Milvus scale from one container to a cluster.
  • Prototyping or embedding in-process? Chroma (Python-first DX) and LanceDB (on-disk, multimodal, object-storage scale) need no server.
  • Almost all of them now do hybrid (dense + sparse) search and metadata filtering — the real differentiators are ops model, scale ceiling, and cost, not the feature checklist.

Once you've chosen an embedding model and chunked your corpus, the vectors have to live somewhere that can find the nearest matches to a query — fast, with filtering, at your scale. That somewhere is a vector database. The market is crowded, but the choice is not actually about who has the longest feature list. By 2026 they all do approximate nearest-neighbour search, hybrid search, and metadata filtering. The decision is about where you run it, at what scale, and what you already operate.

This guide gives you the axes that matter and an honest read on the main options.

Start with three questions, not a feature matrix

  1. Where does it run? Embedded (in your app process, local files or object storage), a self-hosted server (you operate it), or fully managed (someone else operates it). This single choice eliminates most of the field.
  2. What's your scale? Thousands to a few million vectors is a different problem from hundreds of millions to billions. Most apps live in the first bucket and over-buy for the second.
  3. What do you already run? If your data already lives in Postgres, putting vectors there too removes an entire system from your architecture — one database, one backup, one transaction.

Answer those and the shortlist writes itself.

The contenders

Already on Postgres → pgvector

pgvector is a Postgres extension that adds a vector type and HNSW/IVFFlat indexes. Its superpower isn't raw speed — it's that your embeddings sit next to your relational data, filterable with ordinary SQL WHERE clauses and consistent inside the same transaction. No second system, no sync pipeline, no separate backup. For most apps up to a few million vectors, that operational simplicity beats a dedicated store. When a single node runs out of room, extensions like pgvectorscale push the ceiling higher before you have to leave Postgres at all. If you already run Postgres, start here and only move when you have a measured reason to. (To stand one up, see Scaffold a pgvector Schema & HNSW Index.)
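
A minimal sketch of what this looks like in SQL, held here as Python strings so it can be handed to any Postgres driver. The table name `chunks`, its columns, the `tenant_id` filter, and the 3-dimensional vector are hypothetical and chosen only for illustration. The `vector` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector's own.

```python
# Illustrative pgvector statements; table and column names are hypothetical.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    body      text   NOT NULL,
    embedding vector(3)          -- replace 3 with your embedding model's dimension
);

-- HNSW index for cosine distance
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# An ordinary WHERE filter plus nearest-neighbour ordering in one query.
# The %s placeholders follow the DB-API "format" paramstyle (as used by psycopg).
SEARCH_SQL = """
SELECT id, body
FROM chunks
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(tenant_id, query_embedding, k):
    """Build the parameter tuple for SEARCH_SQL (no database connection needed)."""
    return (tenant_id, to_vector_literal(query_embedding), k)
```

Executing these statements requires a Postgres driver and a database with pgvector installed, neither of which is shown here.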

Zero ops, fully managed → Pinecone

Pinecone is a fully managed, serverless vector database. You never run a node, tune an index, or page yourself at 3am — you call an API and pay for what you use. That's the whole pitch, and it's a good one when engineering time is your scarce resource and "it's someone else's job to keep it up" is worth the per-query cost and the lack of a self-host escape hatch. Best when you want retrieval to be a managed dependency, not infrastructure you own.

Open-source servers you control → Qdrant, Weaviate, Milvus

All three are open-source and self-hostable, and each also offers a managed cloud. That makes them the sweet spot when you want control, data residency, or lower cost at scale, plus an off-ramp to a hosted service.

  • Qdrant — Rust, lean and fast, with excellent payload filtering, hybrid search, and aggressive quantization (scalar/product/binary, on-disk). Starts as one Docker container and shards into a cluster. A great default open-source server.
  • Weaviate — Go, with a rich module ecosystem, built-in hybrid search, and optional in-database vectorization so it can embed your data for you. Strong when you want batteries included.
  • Milvus — built from the ground up for billion-scale, with a distributed architecture that separates storage and compute and a wide menu of index types. The pick when your scale genuinely justifies the operational weight.

Embedded, no server → Chroma, LanceDB

  • Chroma — Python-first, runs in-process, and gets you from pip install to a working retrieval demo in minutes. The default for prototypes, notebooks, and small apps; it also runs client-server when you outgrow embedded.
  • LanceDB — embedded too, but built on the columnar Lance format and designed to scale on local disk or object storage (S3) without a server. Handles multimodal data well and bridges "laptop prototype" to "large dataset" without changing systems.

A decision shortcut

  • You already run Postgres and have < a few million vectors → pgvector.
  • You want zero operational burden, managed → Pinecone.
  • You want an open-source server with great filtering and quantization → Qdrant.
  • You want modules and built-in vectorization → Weaviate.
  • You're genuinely at hundreds of millions to billions of vectors → Milvus.
  • You're prototyping or need embedded/in-process → Chroma (DX) or LanceDB (scale on disk/object storage).
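
The shortcut above can be read as a small lookup. The sketch below is one possible encoding, not a rule from any vendor: the function name `shortlist`, its parameters, the rule ordering (Postgres first), and both numeric thresholds are assumptions made for illustration, since the guide only says "a few million" and "hundreds of millions".

```python
def shortlist(deployment, vector_count, has_postgres=False, wants_vectorization=False):
    """Map the decision shortcut to candidate stores.

    deployment: "embedded", "self-hosted", or "managed".
    Thresholds below are illustrative assumptions, not figures from the guide.
    """
    few_million = 5_000_000              # assumed stand-in for "a few million"
    hundreds_of_millions = 100_000_000   # assumed lower bound for billion-scale systems

    if deployment not in {"embedded", "self-hosted", "managed"}:
        raise ValueError(f"unknown deployment model: {deployment!r}")

    # Already on Postgres and small enough: stay there.
    if has_postgres and vector_count < few_million:
        return ["pgvector"]
    if deployment == "embedded":
        return ["Chroma", "LanceDB"]
    if deployment == "managed":
        return ["Pinecone"]

    # Self-hosted open-source servers.
    if vector_count >= hundreds_of_millions:
        return ["Milvus"]
    if wants_vectorization:
        return ["Weaviate"]
    return ["Qdrant"]
```

For example, `shortlist("self-hosted", 2_000_000, has_postgres=True)` yields `["pgvector"]`, while the same call with 500 million vectors and no Postgres yields `["Milvus"]`.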

TIP

The feature lists have converged — hybrid search, filtering, and quantization are table stakes now. Choose on operational model and scale, then confirm recall on your own queries: load a slice of your corpus, run your labeled query set, and measure recall@k. A store that wins a benchmark on someone else's data can still lose on yours.
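
A minimal sketch of that measurement, assuming you have already collected each store's ranked results per query. Here recall@k means the fraction of labeled relevant ids that appear in the top k results; the function names and the dictionary shapes are illustrative choices, not part of any store's API.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear in the first k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant id")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(results, labels, k):
    """Average recall@k over a labeled query set.

    results: {query_id: ranked list of ids returned by the store}
    labels:  {query_id: iterable of ids judged relevant}
    """
    if not labels:
        raise ValueError("the labeled query set is empty")
    scores = [
        recall_at_k(results.get(query_id, []), relevant, k)
        for query_id, relevant in labels.items()
    ]
    return sum(scores) / len(scores)
```

Running `mean_recall_at_k` once per candidate store on the same labeled set gives a like-for-like number to compare alongside latency and cost.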

WARNING

Don't choose for a scale you don't have. A billion-scale distributed system is real operational weight — sharding, replication, monitoring, capacity planning. Most production RAG runs on a few hundred thousand to a few million vectors, where pgvector or a single Qdrant node is faster to ship and cheaper to run than a cluster you don't need yet.

Beyond the store: memory and tuning

A vector database stores and searches embeddings — it doesn't decide what an agent should remember. If you're building an assistant that needs persistent, per-user memory on top of retrieval, a memory layer like Mem0 sits above your vector store and manages extraction and recall (see Agent Memory Architecture). And whichever store you pick, the index itself has knobs — HNSW graph parameters and quantization trade recall against speed and memory; the Embedding Index Tuner skill tunes them against your latency budget.
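
To make the memory side of that trade-off concrete, here is a back-of-the-envelope sketch of raw vector storage under different encodings. It counts only the vector payload, so index graphs, metadata, and replicas come on top. The function name and the 1536-dimension figure used in the example are assumptions for illustration; the per-dimension sizes follow directly from the encodings (4 bytes for float32, 1 byte for int8 scalar quantization, 1 bit for binary quantization).

```python
BYTES_PER_DIMENSION = {
    "float32": 4.0,    # unquantized
    "int8": 1.0,       # scalar quantization
    "binary": 1 / 8,   # binary quantization: one bit per dimension
}


def raw_vector_bytes(num_vectors, dimensions, encoding="float32"):
    """Bytes needed for the vector payload alone (no index, metadata, or replicas)."""
    if encoding not in BYTES_PER_DIMENSION:
        raise ValueError(f"unknown encoding: {encoding!r}")
    return int(num_vectors * dimensions * BYTES_PER_DIMENSION[encoding])


# One million hypothetical 1536-dimensional vectors:
# float32 -> 6_144_000_000 bytes, int8 -> 1_536_000_000, binary -> 192_000_000
```

The smaller encodings buy memory and speed at the price of some recall, which is why the tuning has to be checked against your own query set.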

For the end-to-end retrieval build, the vector-search-engineer takes a corpus and a query set and returns a measured, filtered, hybrid retrieval setup on the store you chose.

Frequently asked questions

What is the best vector database in 2026?
There's no universal winner: it depends on your deployment model, scale, and existing stack. If you already run Postgres, pgvector is the pragmatic default. For a fully managed, zero-ops store, Pinecone leads. For open-source servers you self-host, Qdrant, Weaviate, and Milvus are the strongest picks (Milvus for billion-scale). For embedded, no-server use, pick Chroma (great DX) or LanceDB (on-disk, multimodal). Validate recall on your own corpus before committing.
Do I need a dedicated vector database, or is pgvector enough?
For most applications up to a few million vectors, pgvector is enough — and it's the simplest choice if you already run Postgres, because your vectors, metadata, and relational data live in one system with one backup and one transaction boundary. Reach for a dedicated vector database when you outgrow a single Postgres node, need billion-scale search, want advanced quantization and sharding out of the box, or need features like multi-vector and built-in hybrid fusion that pgvector doesn't natively provide.
What's the difference between an embedded and a server vector database?
An embedded vector database (Chroma in-process, LanceDB) runs inside your application process and reads/writes local files or object storage — no separate service to deploy, ideal for prototypes, notebooks, and edge or single-node apps. A server vector database (Qdrant, Weaviate, Milvus, Pinecone) runs as its own service you connect to over the network, which is what you want for shared access, horizontal scale, and high availability.
Which vector databases support hybrid search and metadata filtering?
Nearly all of them in 2026. Qdrant, Weaviate, Milvus, Pinecone, and pgvector (via SQL plus Postgres full-text search or a BM25 extension) all support combining dense vector similarity with sparse/keyword signals and filtering by metadata. The differences are in ergonomics and fusion quality, not whether the feature exists, so choose on deployment model, scale, and cost rather than the hybrid-search checkbox.
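
For a sense of what "fusion" means here, the sketch below implements reciprocal rank fusion (RRF), one common way to merge a dense ranking and a keyword ranking. It is an illustration of the general technique, not a description of how any particular store fuses results; the function name is hypothetical, and `k=60` is the conventional default constant for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one using reciprocal rank fusion.

    Each id scores the sum of 1 / (k + rank) over the lists it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]     # e.g. from vector similarity
keyword_hits = ["c", "a", "d"]   # e.g. from keyword / sparse search
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
# fused == ["a", "c", "b", "d"]
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse sides, which is one reason it is a popular baseline.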
