GraphRAG: When Your Vector Database Doesn't Know the Whole Story

Classic RAG has a dirty secret: it's great at retrieving locally relevant chunks, but it has no idea how things connect. Ask "what's the relationship between Company X's CEO and its recent regulatory problems?" and a naive vector search will return the most semantically similar passages — probably one document about the CEO and another about the regulatory issues — but it won't understand that these two things are causally linked through a third entity, a board decision made six months ago that's buried in a 200-page filing.

GraphRAG, popularized by Microsoft's 2024 research (and their open-source graphrag library), addresses this by building a knowledge graph from your corpus during ingestion — and using that graph structure both to answer global, thematic questions and to enrich chunk retrieval with explicit entity relationships. It's not magic, but for certain query types it's dramatically better than pure vector search.

What Vanilla RAG Gets Wrong

Standard RAG retrieves the top-k most similar chunks to a query embedding and hands them to an LLM. This works well when:

The answer is localized — it lives in a single passage
The question is factual and specific ("what is the refund policy?")
The corpus is small enough that top-k captures most relevant content

It falls apart when:

The answer requires synthesizing information across many documents
The question is thematic ("what are the main concerns stakeholders have about this project?")
Entity relationships matter ("which suppliers are connected to both Company A and Company B?")
The corpus is large and diverse, so the relevant chunks are spread thin

This isn't a limitation of embeddings per se; it's a limitation of the retrieval paradigm. Embeddings measure semantic similarity, not structure: they don't record which entity is connected to which, or how. A graph does.
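To make the failure mode concrete, here is a minimal sketch of top-k retrieval in pure Python. The three-dimensional vectors and chunk texts are hand-written for illustration (a real system would use an embedding model and a vector store), and the helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

# Toy "embeddings" (made-up values, not produced by a real model).
chunks = [
    {"text": "Profile of the CEO", "vec": [0.9, 0.1, 0.0]},
    {"text": "Regulator opens investigation", "vec": [0.1, 0.9, 0.0]},
    {"text": "Board approves the disputed deal", "vec": [0.2, 0.2, 0.9]},
]

# A query about the CEO and the regulatory problems lands near the first two chunks.
query_vec = [0.8, 0.6, 0.1]
results = top_k(query_vec, chunks, k=2)
print(results)
```

The chunk about the board decision, which is the link between the other two, never makes the cut because nothing in the query is similar to it.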

Microsoft GraphRAG: The Architecture

Microsoft's GraphRAG pipeline has two distinct phases: an expensive offline indexing phase and a query phase that uses the resulting graph. The indexing is where the heavy lifting happens.

flowchart TD
    subgraph Indexing["Offline Indexing (expensive, run once)"]
        Docs["Raw Documents\n(PDF, text, HTML)"] --> Chunk["Text Chunking\n(~300-600 tokens)"]
        Chunk --> Extract["Entity & Relationship\nExtraction via LLM\n(GPT-4 / Claude)"]
        Extract --> Graph["Knowledge Graph\n(NetworkX / Cosmos DB)"]
        Graph --> Community["Community Detection\n(Leiden Algorithm)"]
        Community --> Summary["Community Summaries\n(LLM-generated per cluster)"]
        Summary --> CommEmbed["Community Summary\nEmbeddings"]
    end

    subgraph Query["Query Phase"]
        Q["User Query"] --> Mode{Query Mode}
        Mode -->|Global| GS["Global Search:\nLLM reads community summaries\nfor thematic questions"]
        Mode -->|Local| LS["Local Search:\nVector search on entities\n+ graph neighborhood traversal"]
        GS --> Answer["LLM Answer\nwith citations"]
        LS --> Answer
    end

    Summary --> GS
    CommEmbed --> LS
    Graph --> LS

GraphRAG separates indexing from querying. Indexing is expensive because it calls an LLM on every chunk to extract entities and relationships. Querying uses either global search (community summaries, for thematic Q&A) or local search (entity graph plus chunks, for specific questions).
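The extraction-to-graph step can be sketched with plain dictionaries. The triples below are hand-written stand-ins for what an LLM extraction call might return, and `build_graph` with its data layout is an illustrative assumption, not the graphrag library's internal representation.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an undirected adjacency map: entity -> {neighbor: [relation descriptions]}."""
    graph = defaultdict(lambda: defaultdict(list))
    for source, relation, target in triples:
        graph[source][target].append(relation)
        graph[target][source].append(relation)
    return graph

# Hypothetical output of entity/relationship extraction over three chunks.
triples = [
    ("CEO", "chairs", "Board"),
    ("Board", "approved", "Disputed Deal"),
    ("Regulator", "is investigating", "Disputed Deal"),
]

graph = build_graph(triples)
print(sorted(graph["Disputed Deal"]))  # entities directly connected to the deal
```

Once the relationships are explicit, the path from the CEO to the regulator through the board and the deal is something a traversal can find, regardless of how the original passages were worded.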

The Two Query Modes

Global search is designed for thematic, cross-cutting questions: "What are the main themes in this corpus?", "Summarize the key risks across all documents." It works by: (1) taking the community summaries at a chosen level of the community hierarchy, (2) asking the LLM to generate a partial answer from each summary (the map step), and (3) aggregating the most useful partial answers into a final response (the reduce step). This handles questions that require understanding the corpus as a whole, which top-k vector search cannot do because it only ever sees a handful of chunks.
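The map-reduce shape of global search can be sketched as follows. This is not the graphrag implementation: `fake_llm_partial_answer` is a hypothetical stand-in for the LLM call, and scoring by word overlap is an assumption made only so the example runs without an API key.

```python
def fake_llm_partial_answer(query, summary):
    """Stand-in for an LLM call: returns (partial_answer, helpfulness_score)."""
    overlap = set(query.lower().split()) & set(summary.lower().split())
    return f"From this community: {summary}", len(overlap)

def global_search(query, community_summaries, llm=fake_llm_partial_answer):
    # Map: produce one partial answer per community summary.
    partials = [llm(query, summary) for summary in community_summaries]
    # Drop partial answers scored as unhelpful, then rank the rest by score.
    useful = sorted((p for p in partials if p[1] > 0), key=lambda p: p[1], reverse=True)
    # Reduce: a real system would ask the LLM to synthesize; here we just join.
    return "\n".join(text for text, _ in useful)

summaries = [
    "regulatory risks around the disputed deal",
    "supplier pricing and logistics",
    "key risks from executive turnover",
]
answer = global_search("key risks", summaries)
print(answer)
```

Every summary is touched once per query, which is why this mode is slow and costly compared with a single vector lookup.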

Local search is for specific entity-centric questions: "Tell me about Acme Corp's involvement in the merger talks." It works by: (1) finding the query entity in the graph via embedding similarity, (2) traversing the graph neighborhood (entities, relationships, covariate data), and (3) combining that graph context with the raw text chunks linked to those entities. It's smarter than pure vector retrieval because it adds the structured relationship context from the graph.
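Steps (2) and (3) can be sketched as a bounded breadth-first traversal. The sketch assumes step (1) has already resolved the query to a starting entity; the graph, the entity-to-chunk mapping, and both function names are hypothetical.

```python
from collections import deque

def neighborhood(graph, start, max_hops=2):
    """Breadth-first traversal returning {entity: hop distance} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen

def local_context(graph, entity_chunks, start, max_hops=2):
    """Collect the chunks attached to every entity in the neighborhood, nearest first."""
    entities = neighborhood(graph, start, max_hops)
    chunks = []
    for entity in sorted(entities, key=entities.get):
        for chunk in entity_chunks.get(entity, []):
            if chunk not in chunks:
                chunks.append(chunk)
    return chunks

graph = {
    "CEO": ["Board"],
    "Board": ["CEO", "Disputed Deal"],
    "Disputed Deal": ["Board", "Regulator"],
    "Regulator": ["Disputed Deal"],
}
entity_chunks = {
    "CEO": ["chunk-1"],
    "Board": ["chunk-7"],
    "Disputed Deal": ["chunk-7", "chunk-12"],
    "Regulator": ["chunk-12"],
}
context = local_context(graph, entity_chunks, "CEO", max_hops=2)
print(context)
```

The hop limit is the main tuning knob here: a larger value pulls in more connected context at the price of a longer prompt.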

Running Microsoft GraphRAG in Practice

# Install
pip install graphrag

# Initialize project
mkdir my-graphrag-project && cd my-graphrag-project
python -m graphrag init --root .

# Put your documents in ./input/
# Configure settings.yaml with your LLM (OpenAI, Azure OpenAI, or local)

# Run indexing (the expensive part)
python -m graphrag index --root .

# Query
python -m graphrag query \
  --root . \
  --method global \
  --query "What are the main themes in this corpus?"

python -m graphrag query \
  --root . \
  --method local \
  --query "Tell me about the relationship between entity X and entity Y"

The settings.yaml is where you configure the LLM model, chunk size, entity extraction prompts, and community detection parameters. The defaults work, but tuning the entity extraction prompt for your domain makes a meaningful difference in graph quality.

Real Indexing Costs

GraphRAG indexing is expensive. Every chunk requires an LLM call for entity extraction, plus additional calls for community summarization. As a rough guide, for a 1,000-document corpus expect ~$20–100 in API costs, depending on document length and GPT-4 pricing. For 100,000 documents, you're looking at $2,000–10,000 just for indexing. Nor is it a build-once-and-forget cost: the graph and its summaries go stale as the corpus changes, so treat indexing as an ongoing commitment. Budget accordingly, and re-index only when the corpus changes significantly.
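A back-of-the-envelope calculator makes it easier to sanity-check these figures for your own corpus. Every default below (chunk size, prompt overhead, output length, per-token prices) is a hypothetical placeholder, not a quoted price; substitute your provider's current rates and your measured token counts.

```python
import math

def estimate_indexing_cost(
    num_docs,
    tokens_per_doc,
    chunk_tokens=600,             # assumed chunk size
    prompt_overhead_tokens=1500,  # assumed extraction-prompt length per call
    output_tokens_per_chunk=500,  # assumed size of the extracted entities/relations
    price_in_per_1k=0.01,         # hypothetical $ per 1,000 input tokens
    price_out_per_1k=0.03,        # hypothetical $ per 1,000 output tokens
):
    """Rough extraction cost in dollars, assuming one LLM call per chunk.

    Ignores community summarization and retries, so treat the result as a floor.
    """
    chunks_per_doc = math.ceil(tokens_per_doc / chunk_tokens)
    total_chunks = num_docs * chunks_per_doc
    input_tokens = total_chunks * (chunk_tokens + prompt_overhead_tokens)
    output_tokens = total_chunks * output_tokens_per_chunk
    return input_tokens / 1000 * price_in_per_1k + output_tokens / 1000 * price_out_per_1k

small = estimate_indexing_cost(num_docs=1_000, tokens_per_doc=1_200)
large = estimate_indexing_cost(num_docs=100_000, tokens_per_doc=1_200)
print(f"1,000 docs: ${small:,.0f}; 100,000 docs: ${large:,.0f}")
```

Under these assumptions the cost grows linearly with the number of chunks, so document length and chunk size matter as much as document count.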

The Leiden Algorithm: Why Community Detection Matters

Community detection is what enables global search. After building the entity relationship graph, GraphRAG runs the Leiden algorithm (a refinement of Louvain) to partition the graph into hierarchical communities. These communities are then summarized with an LLM, producing a hierarchical index: leaf-level chunks at the bottom and community summaries at each level above them.

Why Leiden instead of simpler clustering? Leiden optimizes for modularity (how dense connections are within a community versus across communities) while fixing a known flaw in Louvain, which can produce badly connected or even internally disconnected communities; Leiden guarantees that every community it returns is well connected. Applied hierarchically, this suits knowledge graphs where entity clusters don't have uniform size: the fine-grained levels capture small tight clusters (e.g., a specific acquisition deal involving 3 companies) and the coarse levels capture large thematic communities (e.g., all regulatory issues across the corpus).
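Modularity itself is simple to compute, which helps build intuition for what the algorithm is optimizing. The sketch below only scores a given partition of a toy graph; it does not implement Leiden or Louvain, and the graph and function name are made up for illustration.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges whose endpoints share a community.
    internal = sum(1 for u, v in edges if community_of[u] == community_of[v]) / m
    # Expected fraction for a random graph with the same degree sequence.
    degree_sum = {}
    for node, d in degree.items():
        c = community_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    expected = sum((s / (2 * m)) ** 2 for s in degree_sum.values())
    return internal - expected

# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in "abcdef"}

q_split = modularity(edges, split)
q_lumped = modularity(edges, lumped)
print(f"two communities: {q_split:.3f}, one community: {q_lumped:.3f}")
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the kind of partition a modularity optimizer is searching for.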

GraphRAG vs Hybrid RAG vs Vanilla RAG

Query Type	Vanilla RAG	Hybrid RAG (BM25 + vector)	GraphRAG (local)	GraphRAG (global)
Specific factual question	✅ Good	✅ Good	✅ Good	❌ Overkill
Entity relationships	⚠️ Poor	⚠️ Poor	✅ Excellent	➖ OK
Thematic / "what are all the risks"	❌ Poor	❌ Poor	⚠️ Limited	✅ Excellent
Multi-hop reasoning	❌ Poor	❌ Poor	✅ Good	➖ OK
Indexing cost	💚 Low	💚 Low	🔴 High	🔴 High
Query latency	💚 Fast	💚 Fast	🟡 Medium	🔴 Slow (map-reduce)

When GraphRAG Is Worth the Complexity

Not every RAG application needs a knowledge graph. GraphRAG is worth the cost and complexity when:

Analyst-style Q&A on document corpora: Legal discovery, financial analysis, research synthesis — questions that require connecting dots across hundreds of documents
Thematic summarization at scale: "What do our support tickets say about product X?" across 50,000 tickets is a global search query. GraphRAG handles it; vanilla RAG doesn't
Entity-centric knowledge bases: Corporate knowledge management, scientific literature review, competitive intelligence — where entities (companies, people, concepts) and their relationships ARE the value

It's probably overkill when your corpus is a product FAQ, a single policy document, or any use case where questions are specific and answers are localized. For those, vanilla RAG with good chunking and a reranker is 90% of the value at 5% of the cost.

Hybrid GraphRAG: Best of Both Worlds

The production pattern I've seen work well is a router layer in front of both retrieval modes. A fast LLM call classifies the incoming query as global or local, then routes accordingly. Most user questions are local — specific, entity-centric, factual. A small fraction are global — thematic, corpus-spanning. Routing correctly avoids the latency and cost of global search on questions that don't need it.

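# NOTE: this import path and the keyword arguments below depend on the installed
# graphrag version; verify them against your release before relying on this.
from graphrag.query.cli import run_global_search, run_local_search

async def hybrid_graphrag_query(query: str, root_dir: str) -> str:
    """Route a query to global or local search based on a cheap classification."""
    # classify_query is a helper you supply (e.g. a call to a smaller, faster model);
    # it returns "global" or "local".
    query_type = await classify_query(query)

    # If these helpers are synchronous in your version, drop the awaits.
    if query_type == "global":
        # Thematic question: map-reduce over community summaries (slow, costly).
        return await run_global_search(
            root_dir=root_dir,
            query=query,
            community_level=2,
            response_type="multiple paragraphs"
        )
    else:
        # Entity-centric question: graph neighborhood plus linked chunks.
        return await run_local_search(
            root_dir=root_dir,
            query=query,
            community_level=2,
            response_type="single paragraph"
        )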
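The router calls a `classify_query` helper that is not part of graphrag. As a placeholder for the fast LLM call, a keyword heuristic like the one below is enough to exercise the routing logic; the cue phrases are hypothetical and would need tuning (or replacing with a real classifier) for production traffic.

```python
import asyncio

# Hypothetical cue phrases that suggest a corpus-wide, thematic question.
GLOBAL_CUES = ("main themes", "overall", "summarize", "across all", "key risks", "what are all")

async def classify_query(query: str) -> str:
    """Return "global" for thematic questions and "local" otherwise.

    A keyword heuristic standing in for a call to a small, fast model.
    """
    lowered = query.lower()
    return "global" if any(cue in lowered for cue in GLOBAL_CUES) else "local"

print(asyncio.run(classify_query("What are the main themes in this corpus?")))
print(asyncio.run(classify_query("Tell me about Acme Corp's involvement in the merger talks.")))
```

Defaulting to "local" when no cue matches keeps the expensive map-reduce path reserved for questions that clearly ask for a corpus-wide view.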

GraphRAG is one of the few genuine architectural advances in RAG from the past two years, not just a prompt engineering tweak. The indexing cost is real and the latency on global queries is painful, but for the right use cases — analyst tools, knowledge management, research synthesis — it makes previously impossible queries tractable. Worth understanding even if you don't use it today.