The RAG Atlas: A Visual Guide to Retrieval Patterns
Ten RAG architectures, visually mapped with interactive diagrams and a live simulator

Select a pattern to inspect its retrieval flow. Hover nodes for details, then adjust controls to see trade-offs in real time.
Vanilla RAG
The baseline: embed, search, generate
What it is
A single-pass retrieval pattern that embeds the user's query, finds the nearest matching chunks in a vector index, and gives those chunks to the LLM as grounding context.
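The embed, search, generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector index, and the names `retrieve` and `build_prompt` are hypothetical. The final LLM call is left out; the sketch stops at the grounded prompt.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Single pass: embed the query, rank every chunk, keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in front of the question as grounding context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "Refunds are processed within five business days.",
    "The office is closed on public holidays.",
    "Refunds require the original receipt.",
]
query = "how are refunds processed"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)  # this string would be sent to the LLM
```

Because there is only one retrieval pass and no query rewriting, whatever the nearest-neighbor ranking returns is what the model sees, which is why ambiguous queries hurt this pattern most.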
Best for
Internal knowledge bases with clean, chunked documents
Core trade-off
Simplest to build and debug vs. brittle on ambiguous queries
Failure mode
Returns plausible-sounding but wrong chunks when queries are ambiguous or the corpus has near-duplicate contradictory passages.
Details
Pros
- Simplest to build and debug
- Lowest latency of all RAG variants
- Single embedding model to manage
Cons
- Brittle on ambiguous queries
- No score calibration across chunks
- Sensitive to chunk size and overlap
Latency: 50–150 ms for retrieval, plus LLM generation. Total: 500–2000 ms.
Cost: 1× retrieval (a single retrieval pass per query). ~$0.0001 per query at ada-002 pricing.
Live Simulator
Adjust settings to see trade-off effects
Directional model only: this shows relative behavior, not production benchmarks.
Smaller chunks improve precision but can miss context; larger chunks add context but can add noise and cost.
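To make the chunk-size trade-off concrete, here is a small sketch of fixed-size chunking with overlap. It assumes word-based splitting for readability (real pipelines usually count tokens), and the function name `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

text = "one two three four five six seven eight nine ten"
small = chunk_words(text, size=3, overlap=1)  # many narrow chunks
large = chunk_words(text, size=6, overlap=2)  # few wide chunks
```

With the same input, the smaller setting yields more, narrower chunks that each match a query more specifically, while the larger setting yields fewer chunks that each carry more surrounding text into the prompt.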
Higher k increases recall but expands the context and adds latency. Setting k too high can dilute relevance.
Reranking usually lowers accuracy risk but adds compute and latency.
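A two-stage setup might look like the sketch below: a first stage returns a broad candidate set, and a second, more careful scorer reorders it before the top results go to the LLM. The scorer here is a toy assumption (the fraction of query terms found in the chunk) standing in for a real reranking model, and `rerank` is a hypothetical name.

```python
def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Rescore first-stage candidates and keep the best top_n."""
    q_terms = set(query.lower().split())

    def score(chunk: str) -> float:
        # Toy relevance score: share of query terms that appear in the chunk.
        c_terms = set(chunk.lower().split())
        return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0

    # Every candidate is scored again, which is the extra compute the pattern pays for.
    return sorted(candidates, key=score, reverse=True)[:top_n]

candidates = [
    "reset your router by holding the power button",
    "password policy requires twelve characters",
    "reset your password from the account settings page",
]
best = rerank("reset account password", candidates, top_n=1)
```

The extra cost scales with the number of candidates passed to the second stage, so reranking and a high k compound each other's latency.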
Hybrid blends lexical + semantic retrieval to improve recall on exact terms.
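One common way to blend a lexical ranking with a semantic one is reciprocal rank fusion (RRF). The guide does not say which blending method its simulator models, so treat this as an illustrative assumption; the document IDs are made up and the constant `k=60` is a conventional default rather than a tuned value.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

lexical = ["doc_sku", "doc_a", "doc_b"]   # e.g. keyword search, strong on exact terms
semantic = ["doc_a", "doc_c", "doc_sku"]  # e.g. vector search, strong on paraphrases
fused = rrf_fuse([lexical, semantic])
```

Documents that appear in both lists rise to the top, while an exact-term hit that the semantic ranking placed low (here `doc_sku`) still survives near the head of the fused list.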
For every meter, lower is better: lower latency, lower cost, and lower accuracy risk, meaning retrieval is less likely to surface wrong or irrelevant chunks.
Meters are relative trade-offs for this pattern, not measured production telemetry.