RAG on AWS: Bedrock Knowledge Bases, GraphRAG, and Amazon Neptune

📚 Part 2 of a 3-part series on RAG

RAG From the Ground Up
RAG on AWS: Bedrock, GraphRAG & Neptune (you are here)
Building a Clinico-Genomics RAG on AWS

Part 1 covered the principles. Now the question every AWS team eventually asks: do you use the managed RAG service and ship in an afternoon, or do you build your own retrieval stack and control every knob? AWS has quietly built one of the more complete RAG toolkits in the cloud — Bedrock Knowledge Bases for managed vector RAG, a native GraphRAG capability backed by Neptune Analytics, structured retrieval that writes SQL for you, and all the primitives (OpenSearch, pgvector on RDS, Neptune) if you'd rather assemble it yourself. This article maps the whole landscape and tells you which path fits which problem.

Bedrock Knowledge Bases: Managed RAG, Batteries Included

Amazon Bedrock Knowledge Bases is the "I don't want to operate a vector database" option. You point it at documents in S3, choose an embedding model and a vector store, and it handles ingestion, chunking, embedding, storage, and retrieval. At query time you call Retrieve (get chunks) or RetrieveAndGenerate (get chunks + generate an answer) and it orchestrates the whole pipeline.

What it manages for you:

Ingestion & chunking: fixed-size, semantic, or hierarchical chunking strategies — configurable, no code
Embeddings: Amazon Titan Text Embeddings V2 (configurable 256/512/1024 dimensions) or Cohere Embed
Vector store: OpenSearch Serverless, Aurora PostgreSQL with pgvector, Neptune Analytics, Pinecone, Redis, and others
Advanced retrieval: hybrid search and reranking are config flags, not code you write

When managed wins: If your retrieval needs are "search my company's documents and answer with citations," Bedrock Knowledge Bases gets you there with hybrid search and reranking enabled by configuration — exactly the Advanced RAG tier from Part 1 — without operating infrastructure. The reasons to build your own start when you need custom retrieval logic, multi-source routing, or retrieval patterns the managed service doesn't expose.

Why Vector RAG Hits a Wall — and Where Graphs Come In

Recall the structural blind spot from Part 1: vector search can only retrieve what is semantically similar to the query. Information that is relevant but phrased differently — or that only becomes relevant through a chain of relationships — is effectively invisible. As AWS's own framing puts it, information that is dissimilar is "structurally unavailable for retrieval."

Consider a multi-hop question: "Which suppliers are affected if the factory in Osaka goes offline?" The answer lives in relationships — factory → produces → component → used-in → product → sourced-from → supplier — that no single document states in full. Vector search retrieves chunks about Osaka and chunks about suppliers, but it can't traverse the connection. That's what GraphRAG is for.

Bedrock Knowledge Bases GraphRAG with Neptune Analytics

In March 2025, AWS made GraphRAG generally available as a built-in capability of Bedrock Knowledge Bases, backed by Amazon Neptune Analytics. The pitch is genuinely compelling: when you create a knowledge base, you choose Neptune Analytics as the store, and Bedrock automatically extracts entities and relationships from your documents, builds a knowledge graph, and combines vector search with graph traversal at retrieval time — no graph modeling, no Gremlin, no openCypher. It collapses what used to be a multi-week graph-engineering effort into a few hours of configuration.

flowchart TB
    Docs["Documents in S3"] --> Ingest["Bedrock Knowledge Base\nIngestion"]
    Ingest --> Embed["Generate embeddings\n(Titan / Cohere)"]
    Ingest --> Extract["LLM extracts entities\n& relationships"]
    Embed --> Vec["Vector index\n(Neptune Analytics)"]
    Extract --> Graph["Knowledge graph\n(Neptune Analytics)"]

    Query["User query"] --> VSearch["1 · Vector search\nfind seed nodes"]
    VSearch --> Vec
    VSearch --> Traverse["2 · Graph traversal\nexpand to linked nodes/chunks"]
    Traverse --> Graph
    Traverse --> Context["3 · Enriched context\n(chunks + relationships)"]
    Context --> LLM["4 · Foundation model\ngenerates grounded answer"]

Bedrock GraphRAG retrieval flow: an initial vector search finds seed nodes, then the graph is traversed to pull in related chunks and entities, producing context richer than vector search alone — and explainable, because you can see which relationships were followed.

The retrieval mechanic is worth understanding precisely. After an initial vector search finds the most relevant document chunks, GraphRAG retrieves the graph nodes (and linked chunk identifiers) connected to those chunks, then expands by traversing the graph to pull in their details. The result is context that captures the connections between entities, which is exactly what multi-hop questions need — and because the traversal path is inspectable, the answers are more explainable than opaque vector similarity.

Building GraphRAG Yourself: Neptune + Bedrock + LlamaIndex

If you need more control than the managed capability offers — custom traversal logic, a pre-existing graph schema, or integration into a larger agent — you can assemble GraphRAG directly. The common pattern uses Neptune as the graph store, Bedrock for the LLM, and LlamaIndex as the orchestration layer:

from llama_index.llms.bedrock import Bedrock
from llama_index.graph_stores.neptune import NeptuneDatabaseGraphStore
from llama_index.core import StorageContext
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")

graph_store = NeptuneDatabaseGraphStore(
    host="", port=8182
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Hybrid: entity-keyword extraction + multi-hop traversal,
# plus natural-language-to-openCypher for flexible querying
retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    llm=llm,
    graph_traversal_depth=3,      # how many hops to expand
    with_nl2graphquery=True,      # let the LLM write openCypher
)

Two things make this robust. First, with_nl2graphquery=True lets the LLM translate the question into an openCypher query for precise graph lookups, going beyond keyword matching. Second, using a refine-style response mode lets the engine reconcile potentially incomplete graph-query results with standard vector retrieval — so if the generated Cypher misses, vector results still carry the answer.

The GraphRAG Toolkit

AWS also open-sourced a higher-level GraphRAG Toolkit that automates the indexing pipeline. Its LexicalGraphIndex extracts content into three tiers — lineage (sources and chunks), summarization (topics, statements, facts), and entity-relationships — giving you both local detail and global connectivity. Two retriever strategies ship with it: a TraversalBasedRetriever (top-down vector search combined with bottom-up entity keywords) and a SemanticGuidedRetriever (semantic entry points plus intelligent traversal and reranking). It defaults to Neptune for the graph, OpenSearch Serverless for vectors, and Bedrock for the models.

from graphrag_toolkit import LexicalGraphIndex, LexicalGraphQueryEngine

# Build the graph + vector representations from documents
graph_index = LexicalGraphIndex(graph_store, vector_store)
graph_index.extract_and_build(docs)

# Query with a traversal-based retriever
query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store, vector_store
)
response = query_engine.query("How are these entities connected?")

Neptune Database vs Neptune Analytics

A point of confusion worth clearing up, because they're different products:

	Neptune Database	Neptune Analytics
Purpose	OLTP graph — persistent, transactional	OLAP graph — fast analytics & algorithms
Best for	Always-on app backends, large graphs	GraphRAG, graph algorithms, vector + graph
Vector support	Via app layer	✅ Built-in vector index
Bedrock GraphRAG	Manual (LlamaIndex pattern)	✅ Native managed integration
Query languages	Gremlin, openCypher, SPARQL	openCypher

For managed GraphRAG, Neptune Analytics is the answer — it has the built-in vector index and the native Bedrock integration. For a persistent application graph that you also query for RAG via the LlamaIndex pattern, Neptune Database fits.

The Other AWS RAG Building Blocks

pgvector on RDS / Aurora PostgreSQL

If your data already lives in PostgreSQL, the pgvector extension turns it into a vector store — no new system to operate. Enable it, add a vector column, and query by cosine distance. Titan Text Embeddings V2 generates the vectors; HNSW or IVFFlat indexes keep search fast at scale.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT,
    metadata    JSONB,
    embedding   vector(1024)          -- Titan V2 dimension
);

-- HNSW index for fast approximate nearest-neighbour search
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops);

-- Retrieve the 10 most similar chunks, filtered by metadata
SELECT id, content
FROM documents
WHERE metadata->>'department' = 'oncology'
ORDER BY embedding <=> :query_embedding   -- cosine distance
LIMIT 10;

Note the WHERE clause — that's metadata filtering from Part 1, free and built into SQL. pgvector is the pragmatic choice when you want RAG without adopting a dedicated vector database and your scale is moderate (millions, not billions, of vectors).

Structured Retrieval — RAG That Writes SQL

Not all knowledge lives in prose. Bedrock Knowledge Bases also supports structured data retrieval: ask "what were our top-selling products last quarter?" and it generates and runs the appropriate SQL against a warehouse like Amazon Redshift, returning grounded numbers. This is the right pattern for metrics and aggregates — never try to answer numerical questions from vector-retrieved text chunks; route them to SQL instead.

The Semantic Layer: Tools Beat Raw Query Generation

Here's a pattern that separates demos from production systems, and it applies whether your backend is a graph or a warehouse. Letting an LLM generate raw openCypher or SQL directly is brittle — it works most of the time, which in production means it fails in front of users regularly. The fix is a semantic layer: instead of asking the model to write queries, you expose a curated set of tools (functions) that perform database interactions deterministically, and the model only decides which tool to call with which arguments.

The principle, stated well by practitioners building these systems, is to "turn prompt engineering problems, which might work most of the time, into code engineering problems, which work every time exactly as scripted." You trade some flexibility for reliability you can test.

flowchart LR
    User["User question"] --> Agent["LLM Agent\n(Bedrock)"]
    Agent -->|chooses tool + args| Tools["Semantic Layer\n(curated tools)"]
    Tools --> T1["find_entity(name)\nfull-text lookup"]
    Tools --> T2["get_related(id, type)\nsafe graph traversal"]
    Tools --> T3["get_metric(name, period)\nvalidated SQL"]
    T1 --> Graph["Neptune"]
    T2 --> Graph
    T3 --> WH["Redshift / Aurora"]
    Graph --> Agent
    WH --> Agent
    Agent --> Answer["Grounded answer"]

The semantic layer pattern: the agent never writes raw queries. It picks from tested, parameterized tools that encapsulate the query logic — deterministic, debuggable, and safe against malformed queries hitting the database.

On AWS, Amazon Bedrock AgentCore is purpose-built for this. Its Gateway turns existing APIs, Lambda functions, and databases into tools agents can call, and its Runtime hosts agents built in any framework (LangChain/LangGraph, Strands, CrewAI, LlamaIndex) behind a secure, serverless endpoint. You define your semantic-layer tools once and the agent orchestrates them. The AWS sample for a data-agnostic semantic layer demonstrates exactly this: discovery agents on AgentCore that auto-map a schema across RDS, Neptune, and OpenSearch, then query agents that answer natural-language questions through tools rather than raw query generation.

Choosing Your AWS RAG Path

Your situation	Recommended path
Search documents, answer with citations	Bedrock Knowledge Bases (managed, hybrid + rerank on)
Multi-hop, relationship-driven questions	Bedrock GraphRAG on Neptune Analytics
Custom traversal / existing graph schema	Neptune + Bedrock + LlamaIndex (DIY)
Data already in PostgreSQL, moderate scale	pgvector on RDS/Aurora
Numerical / metric questions	Structured retrieval → SQL on Redshift
Agent orchestrating many sources reliably	AgentCore + semantic-layer tools

Cost reality check: GraphRAG isn't free. Entity-and-relationship extraction runs an LLM over every chunk at ingestion — that's the expensive part, and re-ingesting a large corpus costs real money. Neptune Analytics and OpenSearch Serverless both bill for provisioned capacity even when idle. Before committing to GraphRAG, confirm your questions actually need multi-hop reasoning; if they don't, managed vector RAG with reranking is cheaper and simpler. Match the architecture to the question shape, not to the newest feature.

Where This Goes Next

You now have the full AWS RAG menu: managed vector RAG, native GraphRAG on Neptune Analytics, the DIY graph stack, pgvector, structured retrieval, and the semantic-layer agent pattern via AgentCore. Part 3 puts all of it under load on a domain where mistakes matter — a clinico-genomics RAG that connects variants, genes, diseases, drugs, and clinical trials, where every answer needs provenance and a human in the loop.

📚 Continue the series

← RAG From the Ground Up
RAG on AWS: Bedrock, GraphRAG & Neptune (this article)
Building a Clinico-Genomics RAG on AWS →