Vector Databases Compared: Pinecone vs Weaviate vs Qdrant vs pgvector vs FAISS

Every RAG tutorial starts with "install a vector database." Few tutorials explain what a vector database actually is, why the indexing algorithm you choose matters more than the database brand, what "filtered vector search" actually costs, or when you should be using pgvector on your existing Postgres instead of adding yet another database to your infrastructure. Let's fix that.

This article is the foundation you'll want before reading the RAG on GCP article — understand the vector storage layer first, then the managed RAG services that sit on top of it. We'll cover the indexing algorithms (HNSW vs IVFFlat vs ScaNN vs DiskANN), the filtering problem that breaks most vector DB benchmarks, and an honest comparison of the main players with real cost numbers.

What a Vector Database Actually Does

A vector database solves one problem: given a query vector q and a collection of N stored vectors, find the K stored vectors most similar to q — fast. This is approximate nearest neighbor (ANN) search, and it's a fundamentally different problem from B-tree or hash index lookups in relational databases.

The "approximate" part is deliberate. Exact nearest neighbor search requires computing the distance to every vector in the collection: O(N) work per query. With 10 million vectors, that's 10 million distance computations per query. ANN algorithms sacrifice a small amount of recall (typically missing 1-5% of the true nearest neighbors) in exchange for sublinear query time, roughly logarithmic in N for graph indexes like HNSW. For most RAG applications, 95% recall is fine.
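To make the O(N) baseline and the recall metric concrete, here is a minimal sketch in plain Python: a brute-force exact search (one distance computation per stored vector) and a recall@k function. The "approximate" result is simulated by swapping one true neighbor for a wrong id, purely for illustration; `exact_knn` and `recall_at_k` are hypothetical helper names, not a library API.

```python
import heapq
import math
import random


def exact_knn(vectors, query, k):
    """Brute-force exact search: one distance computation per stored vector, O(N)."""
    scored = ((math.dist(v, query), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(32)]
truth = exact_knn(vectors, query, k=10)

# Simulated ANN output: 9 of the 10 true neighbors plus one wrong id
wrong_id = next(i for i in range(len(vectors)) if i not in truth)
approx = truth[:9] + [wrong_id]
print(recall_at_k(approx, truth))  # 0.9
```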

Similarity Metrics

The distance metric used affects both retrieval quality and index performance:

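Cosine similarity: Measures the angle between vectors, ignoring magnitude. Standard for text embeddings. Many embedding models (OpenAI's text-embedding-3, many sentence-transformers models) produce unit-normalized vectors, in which case cosine similarity equals the dot product.
Dot product: Faster than cosine (no normalization step). Gives the same ranking as cosine when vectors are unit-normalized, which many modern embedding models do by default; verify it for your model (every vector norm should be 1.0) rather than assuming it.
L2 (Euclidean): Straight-line distance in Euclidean space. Used for image embeddings and dense numerical features. For unit-normalized vectors it ranks results identically to cosine, because ||a - b||² = 2 - 2(a · b); for unnormalized text embeddings, magnitude differences distort the ranking, so normalize first.
IP (Inner Product): Another name for dot product. FAISS and Milvus call it IP (FAISS's IndexFlatIP, for example), Qdrant calls it Dot, and Pinecone calls it dotproduct and requires it for sparse-dense hybrid indexes.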
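A small numeric check of the relationships above, as a sketch with made-up three-dimensional vectors: after normalization, cosine similarity and dot product coincide, squared L2 distance equals 2 - 2(a · b), and so all three metrics produce the same ranking.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def l2(a, b):
    return math.dist(a, b)


query = normalize([3.0, 4.0, 0.0])
raw_docs = {"a": [1.0, 2.0, 2.0], "b": [4.0, 3.0, 0.0], "c": [0.0, 0.0, 1.0]}
docs = {name: normalize(v) for name, v in raw_docs.items()}

# For unit vectors: cosine == dot product, and ||q - d||^2 == 2 - 2 * (q . d)
for name, d in docs.items():
    print(name, round(cosine(query, d), 4), round(dot(query, d), 4),
          round(l2(query, d) ** 2, 4), round(2 - 2 * dot(query, d), 4))

by_cosine = sorted(docs, key=lambda n: -cosine(query, docs[n]))
by_l2 = sorted(docs, key=lambda n: l2(query, docs[n]))
print(by_cosine, by_l2)  # ['b', 'a', 'c'] ['b', 'a', 'c']
```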

Indexing Algorithms: HNSW vs IVFFlat vs ScaNN vs DiskANN

HNSW (Hierarchical Navigable Small World)

The current gold standard for in-memory vector search. HNSW builds a multi-layer graph where each vector is connected to its approximate nearest neighbors. Query traversal starts at the top layer (sparse, long-range connections), descends through layers finding progressively closer neighbors, and terminates at the bottom layer with the final ANN candidates.
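A minimal sketch of the layered search just described, assuming the layers are already built: the insertion procedure (random level assignment, neighbor-selection heuristics) is omitted, and the toy graph is hand-built. Upper layers use a greedy search with a candidate list of 1; the bottom layer uses a best-first search whose candidate list size `ef` plays the role of the query-time parameter. The names `search_layer` and `hnsw_search` are illustrative, not any library's API.

```python
import heapq
import math


def search_layer(vectors, graph, query, entry, ef):
    """Best-first search within one layer; returns up to ef node ids, closest first."""
    d0 = math.dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nodes still to expand
    results = [(-d0, entry)]    # max-heap via negated distance: best ef found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nearest unexpanded node is farther than the worst kept result
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]


def hnsw_search(vectors, layers, entry, query, k, ef=10):
    """layers[0] is the dense bottom layer; layers[-1] is the sparse top layer."""
    ep = entry
    for graph in reversed(layers[1:]):
        ep = search_layer(vectors, graph, query, ep, ef=1)[0]  # greedy descent
    return search_layer(vectors, layers[0], query, ep, ef=max(ef, k))[:k]


# Toy example: ten points on a line, with hand-built sparse upper layers
vectors = [(float(i),) for i in range(10)]
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
layer1 = {0: [3], 3: [0, 6], 6: [3, 9], 9: [6]}
layer2 = {0: [9], 9: [0]}
layers = [layer0, layer1, layer2]
print(hnsw_search(vectors, layers, entry=0, query=(6.2,), k=3))  # [6, 7, 5]
```

Real implementations bound the bottom-layer `ef` well below the collection size, which is exactly where the recall/latency trade-off comes from; in this ten-point toy the search simply visits everything.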

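HNSW has two build-time parameters: M (the maximum number of connections per node, commonly 16) and ef_construction (the candidate list size during index build; defaults vary, e.g., 64 in pgvector and 100 in Qdrant). A query-time parameter, ef (hnsw.ef_search in pgvector), sets the candidate list size during search. Higher values mean higher recall at the cost of memory, build time, or query latency. Memory is dominated by the vectors themselves: HNSW keeps full-precision vectors in RAM (4 bytes per dimension for float32) plus roughly 2 × M neighbor IDs of 4 bytes each on the bottom layer. For 1 million 768-dimensional float32 vectors at M=16, that's about 3.1 GB of vectors plus about 0.13 GB of links, so plan for roughly 3-5 GB of RAM once implementation overhead is included.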
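The sizing arithmetic above as a small function. This is a rough sketch, assuming float32 vectors, 4-byte neighbor IDs, and 2 × M links per node on the bottom layer; upper layers add only a few bytes per vector on average and are ignored, and `overhead` is a hypothetical fudge factor for allocator and bookkeeping costs.

```python
def hnsw_memory_bytes(num_vectors, dim, m=16, bytes_per_dim=4, overhead=1.0):
    """Rough HNSW RAM estimate: raw vectors plus bottom-layer neighbor links."""
    vectors = num_vectors * dim * bytes_per_dim  # full-precision vectors
    links = num_vectors * 2 * m * 4              # 2*M neighbor ids of 4 bytes each
    return (vectors + links) * overhead


base = hnsw_memory_bytes(1_000_000, 768)
padded = hnsw_memory_bytes(1_000_000, 768, overhead=1.5)
print(f"{base / 1e9:.1f} GB raw, {padded / 1e9:.1f} GB with 1.5x overhead")  # 3.2 GB raw, 4.8 GB with 1.5x overhead
```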

HNSW is used by: Qdrant, Weaviate, pgvector (HNSW index), Milvus, ChromaDB.

IVFFlat (Inverted File Index with Flat Storage)

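Clusters the vectors with k-means into a fixed number of lists (the lists parameter in pgvector, nlist in FAISS), typically on the order of sqrt(total_vectors). At query time, it compares the query with every cluster centroid and searches only the nprobe closest clusters (ivfflat.probes in pgvector) rather than all of them, computing exact distances over the full-precision ("flat") vectors stored in those clusters. Index builds are faster than HNSW and memory use is lower, but IVFFlat requires careful tuning of the list count and nprobe. Under-tuned IVFFlat produces significantly worse recall than HNSW.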

IVFFlat is used by: FAISS (the reference implementation), pgvector (as an alternative to HNSW), Milvus.
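A toy IVFFlat in plain Python, as a sketch of the idea rather than how FAISS or pgvector implement it: Lloyd's k-means builds the centroids, each vector id goes into the inverted list of its nearest centroid, and a query scores only the vectors in its `nprobe` closest lists. The class and function names, the synthetic clustered data, and `nlist=32` (roughly sqrt(1000)) are all illustrative assumptions. Probing every list degenerates to exact search.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids


class IVFFlat:
    def __init__(self, vectors, nlist, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, seed=seed)
        self.lists = [[] for _ in range(nlist)]  # inverted lists: centroid -> vector ids
        for i, v in enumerate(vectors):
            self.lists[self._closest_centroids(v, 1)[0]].append(i)

    def _closest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe=1):
        # Only vectors in the nprobe closest clusters are scored
        candidates = [i for c in self._closest_centroids(query, nprobe) for i in self.lists[c]]
        # "Flat": exact distances on the stored full-precision vectors
        return sorted(candidates, key=lambda i: (math.dist(query, self.vectors[i]), i))[:k]


rng = random.Random(1)
centers = [(0, 0), (5, 5), (0, 5), (5, 0)]
data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for cx, cy in centers for _ in range(250)]
index = IVFFlat(data, nlist=32)
query = [4.8, 5.1]
exact = sorted(range(len(data)), key=lambda i: (math.dist(query, data[i]), i))[:10]
for nprobe in (1, 4, 32):
    found = index.search(query, k=10, nprobe=nprobe)
    print(nprobe, len(set(found) & set(exact)) / 10)
```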

ScaNN (Scalable Approximate Nearest Neighbors)

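Google Research's algorithm, optimized for high query throughput. ScaNN first partitions the space (similar to IVFFlat) so each query scores only the vectors in a few partitions, then scores those candidates with asymmetric distance computation: the query stays in full precision while the stored vectors are compressed, so distances come from small precomputed lookup tables. The best candidates can optionally be rescored with exact distances. ScaNN's quantization is anisotropic, meaning it is trained to preserve the inner products that matter most for ranking rather than to minimize reconstruction error uniformly. ScaNN has ranked at or near the top of ANN-Benchmarks for throughput (queries per second) at equivalent recall, outperforming HNSW implementations on several datasets while using less memory, since it stores compressed vectors. The trade-off: ScaNN is harder to tune, and its tooling is less mature than HNSW's.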

ScaNN is used by: Google's Vertex AI Vector Search, AlloyDB AI (as covered in the RAG on GCP article).
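The lookup-table scoring can be sketched with plain product quantization and asymmetric distance computation (ADC). This is the general technique, not ScaNN's code: ScaNN trains its codebooks with an anisotropic loss, and for brevity this sketch just samples codebook entries from the data instead of running k-means. All function names are hypothetical. The key property is that the ADC score equals the exact squared distance from the full-precision query to the decoded (compressed) vector.

```python
import random


def split(v, m):
    """Split v into m equal-length subvectors (len(v) must be divisible by m)."""
    step = len(v) // m
    return [v[j * step:(j + 1) * step] for j in range(m)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebooks(train, m, ksub, seed=0):
    # Simplification: each sub-codebook is a random sample of training subvectors
    # (real product quantization trains each one with k-means)
    rng = random.Random(seed)
    subs = [split(v, m) for v in train]
    return [[s[j] for s in rng.sample(subs, ksub)] for j in range(m)]


def encode(v, codebooks):
    """Compress v to m small integers: the nearest centroid id in each subspace."""
    return tuple(
        min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
        for sub, cb in zip(split(v, len(codebooks)), codebooks)
    )


def adc_table(query, codebooks):
    """Asymmetric: the query stays full precision; precompute its distance to every centroid."""
    return [[sq_dist(qs, c) for c in cb] for qs, cb in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(table, code):
    """Approximate squared L2 distance: m table lookups instead of a d-dimensional loop."""
    return sum(table[j][c] for j, c in enumerate(code))


def decode(code, codebooks):
    return [x for j, c in enumerate(code) for x in codebooks[j][c]]


rng = random.Random(3)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
codebooks = train_codebooks(data, m=4, ksub=16)
codes = [encode(v, codebooks) for v in data]  # 4 small ints per vector instead of 16 floats

query = [rng.gauss(0, 1) for _ in range(16)]
table = adc_table(query, codebooks)
shortlist = sorted(range(len(data)), key=lambda i: adc_distance(table, codes[i]))[:20]
# Optional rescoring of the shortlist with exact distances
top5 = sorted(shortlist, key=lambda i: sq_dist(query, data[i]))[:5]
print(top5)
```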

DiskANN (Disk-Based ANN)

Microsoft Research's algorithm for datasets too large for RAM. DiskANN stores its graph index (the Vamana graph) and the full-precision vectors on SSD, keeps only compressed, product-quantized copies of the vectors in RAM to guide graph navigation, and lays out the on-disk data so that each search step needs few SSD reads. This enables billion-scale vector search on machines with standard RAM (32-64 GB) that would need terabytes of RAM for HNSW. Latency is higher than in-memory HNSW (milliseconds rather than sub-millisecond), but for applications that tolerate 10-50 ms query latency, DiskANN makes billion-vector search economically feasible.

DiskANN is used by: Azure Cosmos DB (its diskANN vector index type), Milvus (the DISKANN index type).
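The back-of-envelope arithmetic behind "terabytes of RAM for HNSW" versus tens of gigabytes for DiskANN, as a sketch. The 32-byte PQ code size and the maximum graph degree of 64 are hypothetical choices (both are configurable), and the HNSW estimate uses the same float32-plus-bottom-layer-links approximation as the HNSW section.

```python
def in_memory_hnsw_bytes(n, dim, m=16):
    # Full float32 vectors plus ~2*M 4-byte neighbor ids per node on the bottom layer
    return n * (dim * 4 + 2 * m * 4)


def diskann_bytes(n, dim, pq_bytes=32, max_degree=64):
    # Hypothetical parameters: 32-byte PQ codes, up to 64 neighbors per node
    ram = n * pq_bytes                    # compressed vectors used to steer the search
    ssd = n * (dim * 4 + max_degree * 4)  # full vectors (for rescoring) + graph links
    return ram, ssd


n, dim = 1_000_000_000, 768
ram, ssd = diskann_bytes(n, dim)
print(f"HNSW in RAM:  {in_memory_hnsw_bytes(n, dim) / 1e12:.1f} TB")  # 3.2 TB
print(f"DiskANN RAM:  {ram / 1e9:.0f} GB")                            # 32 GB
print(f"DiskANN SSD:  {ssd / 1e12:.1f} TB")                           # 3.3 TB
```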

graph TD
    subgraph HNSW["HNSW Graph (3 layers)"]
        L2A["Layer 2\n(sparse, long range)"]
        L1A["Layer 1\n(medium density)"]
        L0A["Layer 0\n(dense, all vectors)"]
        L2A -->|"navigate to approx region"| L1A
        L1A -->|"refine neighbors"| L0A
    end

    subgraph IVF["IVFFlat (k-means clusters)"]
        C1["Cluster 1\n~1k vectors"]
        C2["Cluster 2\n~1k vectors"]
        C3["Cluster 3\n~1k vectors"]
        CN["...N clusters"]
        CQ["Query→\nfind closest\n2-4 clusters"]
        CQ --> C1
        CQ -.-> C2
    end

    subgraph SCANN["ScaNN (two-phase)"]
        SP1["Phase 1: Space partition\n(coarse quantization)"]
        SP2["Phase 2: Asymmetric\ndistance estimation\n(refined scoring)"]
        SP1 --> SP2
    end

The three dominant ANN algorithms. HNSW traverses a multi-layer graph from coarse to fine. IVFFlat uses k-means clustering to narrow the search space. ScaNN combines coarse partitioning with efficient asymmetric distance computation for maximum throughput. All three sacrifice small amounts of recall for order-of-magnitude query speedups over brute force.

The Filtering Problem

Every real-world vector search includes metadata filters: "find the 5 most similar documents, but only from the Finance department, only created after 2024-01-01." This is where ANN benchmarks diverge wildly from production performance — and where vector database design choices become critical.

Three filtering strategies exist, with different performance profiles:

Post-filtering (naive, broken)

Search the full vector index for the top results, then discard those that don't match the filter. Suppose 90% of your corpus is Finance documents, you want the 5 most similar Engineering documents (10% of the corpus), and the engine over-fetches 10 candidates from the ANN index. Roughly 9 of those 10 will be Finance and get filtered out, leaving about 1 Engineering result: you asked for 5 and got 1. Do not use post-filtering for selective metadata filters.

Pre-filtering (correct, expensive)

First apply the metadata filter to get the candidate set, then search only those candidates. Results are correct, but the cost depends on how many candidates survive the filter: if the filtered subset is small (100 vectors), exact search over 100 vectors is fast; if it is large (5M vectors), you need ANN over 5M vectors, which means either building a sub-index on the fly or degrading to brute force. Qdrant, Weaviate, and Pinecone all support pre-filtering; efficiency depends on filter selectivity.
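The Finance/Engineering example can be reproduced with a toy corpus. In this sketch, exact search stands in for the ANN index, the 100 one-dimensional "embeddings" are made up so the arithmetic is easy to follow, and every 10th document is Engineering. Post-filtering with a 10-candidate over-fetch returns a single hit; pre-filtering returns the 5 you asked for.

```python
import math

# Toy corpus: 100 one-dimensional "embeddings"; every 10th document is Engineering
docs = [
    {"id": i, "vec": (float(i),), "dept": "Engineering" if i % 10 == 0 else "Finance"}
    for i in range(100)
]


def top_k(candidates, query, k):
    return sorted(candidates, key=lambda d: (math.dist(d["vec"], query), d["id"]))[:k]


def post_filter(query, k, fetch_k, dept):
    # Search everything first (stand-in for the ANN index), then drop non-matching hits
    return [d["id"] for d in top_k(docs, query, fetch_k) if d["dept"] == dept][:k]


def pre_filter(query, k, dept):
    # Filter first, then search only the matching subset
    matching = [d for d in docs if d["dept"] == dept]
    return [d["id"] for d in top_k(matching, query, k)]


query = (50.0,)
print(post_filter(query, k=5, fetch_k=10, dept="Engineering"))  # [50]
print(pre_filter(query, k=5, dept="Engineering"))               # [50, 40, 60, 30, 70]
```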

In-flight filtering (hybrid, best)

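Integrates the metadata filter into the ANN graph traversal itself: during HNSW navigation, nodes that don't match the filter are excluded from the results (and, in some implementations, skipped during traversal entirely). This avoids the pre-filtering overhead for large filtered subsets and avoids the correctness problems of post-filtering. The catch is that a selective filter can leave the matching nodes poorly connected in the graph. Qdrant's "filterable HNSW" addresses this by adding extra graph links among points that share a value of an indexed payload field, and its query planner switches to a payload-index scan (effectively pre-filtering) when a filter matches very few points. It's the best approach, but the vector database has to be built with this capability.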
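A minimal sketch of in-flight filtering on a single graph layer. Implementations differ on whether non-matching nodes may be traversed; this sketch walks through them but only collects matches, and it does not include Qdrant's extra payload links. `filtered_search` is a hypothetical name, and the chain graph over 100 one-dimensional points (every 10th one "Engineering") is a toy stand-in for a real HNSW layer.

```python
import heapq
import math


def filtered_search(vectors, graph, entry, query, k, ef, predicate):
    visited = {entry}
    d0 = math.dist(vectors[entry], query)
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = [(-d0, entry)] if predicate(entry) else []  # max-heap (negated) of matches
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # nothing left that could improve the matches
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))  # traverse through any node...
                if predicate(nb):                      # ...but only collect matches
                    heapq.heappush(results, (-dn, nb))
                    if len(results) > ef:
                        heapq.heappop(results)
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)][:k]


vectors = [(float(i),) for i in range(100)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}


def is_engineering(i):
    return i % 10 == 0


print(filtered_search(vectors, graph, entry=0, query=(50.0,), k=5, ef=5,
                      predicate=is_engineering))  # [50, 40, 60, 30, 70]
```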

Why most benchmarks don't show you filtering performance: ANN-Benchmarks (the standard reference benchmark) measures unfiltered recall@10 and QPS. Production RAG queries almost always include metadata filters. When you're choosing a vector database for a production use case, test with realistic filters — especially high-selectivity filters that return a small fraction of your corpus. The QPS rankings change dramatically.

Platform-by-Platform Breakdown

Pinecone

The managed-only vector database. No self-hosting, no open-source version. Choose a serverless architecture (pay for reads, writes, and storage, with no provisioned compute sitting idle) or dedicated pods. Pinecone's index type is proprietary (not HNSW, not IVFFlat) and built around approximate nearest neighbor search with metadata filtering. Its managed simplicity is genuine: create an index, upsert vectors, query. There are no configuration decisions beyond dimension count and metric (plus cloud provider and region for serverless indexes).

The downside is cost at scale. Pinecone Serverless bills per write unit (WU) and read unit (RU): at 768 dimensions, 1M upserts costs ~$2 in WUs, and 100k queries at k=10 costs ~$1. Moderate volumes are fine; high-throughput production services at millions of queries/day can get expensive. The other downside: vendor lock-in. Pinecone's index format is not portable — migrating off Pinecone means re-indexing your entire corpus into a different system.
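To see how the per-unit figures quoted above scale with traffic, here is a rough monthly calculator. It simply reuses this section's approximate rates (about $2 per 1M upserts at 768 dimensions, about $1 per 100k queries at k=10) as default parameters; it ignores storage and any plan minimums, and real rates change, so plug in the current rate card before relying on it. The function name and workload are hypothetical.

```python
def monthly_query_and_write_cost(upserts, queries,
                                 usd_per_million_upserts=2.0,
                                 usd_per_100k_queries=1.0):
    """Rough monthly read+write cost; excludes storage and any plan minimums."""
    writes = upserts / 1_000_000 * usd_per_million_upserts
    reads = queries / 100_000 * usd_per_100k_queries
    return writes + reads


# Hypothetical workload: 1M upserts per month and 2M queries per day for 30 days
cost = monthly_query_and_write_cost(upserts=1_000_000, queries=2_000_000 * 30)
print(f"${cost:,.0f} per month")  # $602 per month
```

At these rates, query volume dominates: writes contribute $2 while 60M queries contribute $600.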

Weaviate

Open-source (BSD-3-Clause), self-hostable, with Weaviate Cloud as the managed option. Schema-based: you define classes (called collections in current versions) with properties and their data types, and Weaviate enforces them. This schema approach feels more like a database and less like a key-value store. Weaviate's module system enables in-database embedding generation: you can configure text2vec-openai or text2vec-cohere as a module and have Weaviate call the embedding API automatically during import and query.

Weaviate's multi-tenancy model is strong for SaaS applications: each tenant's data lives in its own shard, isolated from other tenants, and inactive tenants can be deactivated to free memory. BM25+vector hybrid search is built in. The self-hosted option requires orchestrating Weaviate's components (standalone or via the Kubernetes Helm chart), but it works well and the community is active.
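Hybrid search runs a BM25 keyword search and a vector search, then fuses the two result lists. Below is a minimal sketch of min-max ("relative score") fusion with an alpha weight, similar in spirit to Weaviate's relativeScoreFusion but not its code; the function names and scores are made up. Each list's scores are rescaled to [0, 1], then combined as alpha × vector + (1 - alpha) × BM25, so alpha=1.0 is pure vector search and alpha=0.0 is pure keyword search.

```python
def min_max(scores):
    """Rescale a {doc: score} dict so the best score is 1.0 and the worst is 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    return {doc: (s - lo) / (hi - lo) if hi > lo else 1.0 for doc, s in scores.items()}


def relative_score_fusion(vector_scores, bm25_scores, alpha=0.5):
    v, b = min_max(vector_scores), min_max(bm25_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in v.keys() | b.keys()
    }
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


vector_scores = {"a": 0.91, "b": 0.85, "c": 0.80}  # e.g. cosine similarities
bm25_scores = {"c": 12.0, "d": 9.0, "a": 3.0}      # BM25 scores on a different scale
print(relative_score_fusion(vector_scores, bm25_scores, alpha=0.5))
```

Normalizing first matters because raw BM25 scores and cosine similarities live on very different scales; without it, whichever score is numerically larger would dominate the sum.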

Qdrant

Open-source (Apache 2.0), written in Rust. Qdrant's focus on production performance shows: it has the fastest insertion throughput of any HNSW-based system I've tested, and its filterable HNSW handles selective metadata filters better than competitors. Qdrant supports named vectors (multiple embeddings per point, e.g., a title embedding and a body embedding in the same record), sparse vectors (for hybrid sparse+dense search), and quantization: scalar (int8, 4x smaller than float32), product, and binary (32x smaller), with a recall trade-off that rescoring against the original vectors can reduce.
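Where the 4x figure for scalar quantization comes from, as a sketch: each 4-byte float becomes one 8-bit code over a [min, max] range. For brevity this uses each vector's own min and max; Qdrant instead derives the range from the collection's data (optionally clipped with a quantile setting). The helper names are hypothetical.

```python
import random


def scalar_quantize(v):
    """Map each float to an 8-bit code (0..255) over the vector's [min, max] range."""
    lo, hi = min(v), max(v)
    scale = (hi - lo) / 255 or 1.0  # guard against a constant vector
    codes = bytes(round((x - lo) / scale) for x in v)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]


rng = random.Random(0)
v = [rng.uniform(-1, 1) for _ in range(768)]
codes, lo, scale = scalar_quantize(v)
restored = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(v, restored))
print(len(v) * 4, "bytes as float32 ->", len(codes), "bytes as 8-bit codes")  # 3072 -> 768
print(f"max reconstruction error {max_err:.4f} (at most half a step: {scale / 2:.4f})")
```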

Qdrant's gRPC API is significantly faster than REST for high-throughput applications. The Qdrant Cloud managed offering is reasonably priced and genuinely simple to operate. For new self-hosted deployments, Qdrant is my default recommendation in 2025 — it combines performance, production readiness, and pricing that's hard to beat.

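from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Placeholder 768-dim embeddings; replace with your embedding model's output
title_embedding = [0.1] * 768
body_embedding = [0.2] * 768
query_embedding = [0.2] * 768

# Create collection with named vectors for multi-embedding use
client.create_collection(
    collection_name="documents",
    vectors_config={
        "title": models.VectorParams(size=768, distance=models.Distance.COSINE),
        "body":  models.VectorParams(size=768, distance=models.Distance.COSINE),
    }
)

# Index the payload fields used in filters: the query planner and the
# filterable HNSW graph can only take advantage of indexed fields
client.create_payload_index(
    collection_name="documents",
    field_name="department",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="documents",
    field_name="created_at",
    field_schema=models.PayloadSchemaType.DATETIME,
)

# Upsert with payload (filterable metadata)
client.upsert(
    collection_name="documents",
    points=[
        models.PointStruct(
            id=1,
            vector={"title": title_embedding, "body": body_embedding},
            payload={"department": "Finance", "created_at": "2024-06-01", "doc_type": "policy"}
        )
    ]
)

# Filtered search using in-flight filtering (Qdrant filterable HNSW)
results = client.query_points(
    collection_name="documents",
    using="body",
    query=query_embedding,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="department", match=models.MatchValue(value="Finance")),
            models.FieldCondition(key="created_at", range=models.DatetimeRange(gte="2024-01-01"))
        ]
    ),
    limit=5
)

for point in results.points:
    print(point.id, point.score, point.payload["department"])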

pgvector

PostgreSQL extension that adds a vector data type and HNSW/IVFFlat indexes. If you're already running Postgres, pgvector is the lowest-friction path to vector search: no new service, no new team skill set, no new operational runbook. The limitations are real: pgvector's HNSW implementation is slower than Qdrant's at equivalent recall, and Postgres's shared-memory architecture limits concurrency for high-throughput vector queries. Filtering also needs care: with an approximate index, pgvector applies WHERE conditions after the index scan, which is the post-filtering problem described above; pgvector 0.8.0 added iterative index scans that keep scanning until enough rows match. But for applications doing under ~1,000 vector queries/second on corpora under 5M vectors, pgvector on a well-tuned RDS/Cloud SQL instance is completely adequate.

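-- pgvector: enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column
CREATE TABLE document_chunks (
    id          BIGSERIAL PRIMARY KEY,
    doc_id      TEXT NOT NULL,
    chunk_text  TEXT NOT NULL,
    embedding   VECTOR(768),          -- 768-dimensional embedding
    department  TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index: better speed/recall trade-off than IVFFlat, at the cost of slower
-- builds and more memory (m = 16, ef_construction = 64 are pgvector's defaults)
CREATE INDEX ON document_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- B-tree index so the planner can pre-filter when the department filter is selective
CREATE INDEX ON document_chunks (department);

-- With an approximate index, WHERE filters run after the index scan, so a selective
-- filter can return fewer than LIMIT rows. Widen the candidate list (default 40)
-- and, on pgvector 0.8.0+, enable iterative index scans.
SET hnsw.ef_search = 100;
SET hnsw.iterative_scan = strict_order;

-- Filtered vector search ($1 is the query embedding, bound as a parameter)
SELECT doc_id, chunk_text,
       1 - (embedding <=> $1::vector) AS similarity   -- <=> is cosine distance
FROM document_chunks
WHERE department = 'Finance'
  AND created_at >= '2024-01-01'
ORDER BY embedding <=> $1::vector
LIMIT 5;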

FAISS

Facebook AI Similarity Search (FAISS) is a library, not a database server. You import it in Python, build an index in memory, and query it directly. There's no server, no persistence (unless you serialize the index to disk yourself, e.g., with faiss.write_index), and no metadata storage (that's your job). FAISS is the right choice for offline batch jobs (embedding a corpus once and querying it thousands of times in a pipeline), research experiments, and as the backend for higher-level frameworks (LangChain and LlamaIndex both offer FAISS as a local vector store option). It is not the right choice for a persistent production vector search service.

Decision Guide: Which One to Choose

Need	Best choice	Why
Simplest managed path, willing to pay premium	Pinecone Serverless	No ops, instant setup, pay-per-use
Self-hosted, best performance + filtering	Qdrant	Filterable HNSW, Rust speed, named vectors, Apache 2.0
Already on PostgreSQL, <5M vectors	pgvector	Zero new infra, HNSW index, SQL joins with metadata
Multi-tenant SaaS, need schema + modules	Weaviate	Strong multi-tenancy, built-in embedding modules
Batch/offline processing, research	FAISS	Python library, no server overhead, fastest for batch
Billion-scale, RAM-constrained	DiskANN-based index (e.g., Milvus DISKANN, Azure Cosmos DB)	Disk-based index, 10-50ms latency, terabyte-scale
Already in GCP, want managed	Vertex AI Vector Search or AlloyDB AI	ScaNN performance, GCP-native, no separate infra
Already in Snowflake, want managed	Cortex Search	Zero new infra, hybrid search, Snowflake governance

The pattern I see most in production: teams start with pgvector (zero friction, Postgres already running), outgrow it at roughly 2-5M vectors or 500-1,000 QPS, then migrate to Qdrant for self-hosted or Pinecone/Weaviate Cloud for managed. Very few teams need the billion-scale path; most production RAG systems are under 10M documents and well within the sweet spot of HNSW-based solutions.

The benchmark trap: ANN-Benchmarks shows Qdrant/Weaviate/pgvector all achieving 95%+ recall@10 at thousands of QPS. These benchmarks use unfiltered search on clean, uniformly distributed embeddings. Your production workload has metadata filters, skewed query distributions, noisy embeddings from chunked documents, and real-world concurrency. Always benchmark with a realistic sample of your actual data and query patterns before committing to a platform — especially if you expect high filter selectivity (returning under 10% of your corpus).
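A minimal benchmark harness along those lines, as a sketch: it scores any search function against brute-force exact *filtered* search and reports mean recall@k and queries per second. The synthetic vectors, the 5% "Engineering" filter, and the post-filtering stand-in are all hypothetical placeholders; swap in a sample of your real vectors, metadata, queries, and a call to your actual vector database client.

```python
import math
import random
import time


def exact_filtered(vectors, meta, query, k, predicate):
    """Ground truth: brute-force search over only the documents that pass the filter."""
    ids = [i for i in range(len(vectors)) if predicate(meta[i])]
    return sorted(ids, key=lambda i: (math.dist(vectors[i], query), i))[:k]


def benchmark(search_fn, vectors, meta, queries, k, predicate):
    """Return (mean recall@k vs exact filtered search, queries per second)."""
    truths = [exact_filtered(vectors, meta, q, k, predicate) for q in queries]
    start = time.perf_counter()
    results = [search_fn(q, k, predicate) for q in queries]
    elapsed = max(time.perf_counter() - start, 1e-9)
    recalls = [len(set(r) & set(t)) / len(t) for r, t in zip(results, truths) if t]
    return sum(recalls) / len(recalls), len(queries) / elapsed


# Synthetic stand-in data: replace with a sample of your real vectors, metadata, and queries
rng = random.Random(7)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(3000)]
meta = [{"dept": "Engineering" if rng.random() < 0.05 else "Finance"} for _ in vectors]
queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]


def is_eng(m):
    return m["dept"] == "Engineering"


def post_filtered(q, k, predicate, fetch_k=50):
    # Stand-in for an unfiltered index followed by a metadata filter
    top = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], q))[:fetch_k]
    return [i for i in top if predicate(meta[i])][:k]


recall, qps = benchmark(post_filtered, vectors, meta, queries, k=10, predicate=is_eng)
print(f"post-filtering: recall@10={recall:.2f}, {qps:.0f} QPS")
```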

One final note: the vector database market is consolidating. In 2023, the "best" answer was "use a dedicated vector database." By 2025, every major data platform (Snowflake Cortex Search, AlloyDB AI, Azure AI Search, Elasticsearch, Redis, even MongoDB) has a vector search capability. For most RAG use cases, the question is less "which vector database should I use?" and more "does my existing data platform's vector search capability meet my requirements?" It often does.