The RAG Atlas: A Visual Guide to Retrieval Patterns
Ten RAG architectures, visually mapped with interactive diagrams and a live simulator

Select a pattern to inspect its retrieval flow. Hover over nodes for details, then adjust the controls to see trade-offs in real time.
Vanilla RAG
The baseline: embed, search, generate
Best for
Internal knowledge bases with clean, chunked documents
Core trade-off
Simplest to build and debug vs. brittle on ambiguous queries
Failure mode
Returns plausible-sounding but wrong chunks when queries are ambiguous or the corpus has near-duplicate contradictory passages.
Details
Pros
- Simplest to build and debug
- Lowest latency of all RAG variants
- Single embedding model to manage
Cons
- Brittle on ambiguous queries
- No score calibration across chunks
- Sensitive to chunk size and overlap
Latency: 50–150ms retrieval + LLM generation. Total: 500–2000ms.
Cost: 1× retrieval. ~$0.0001 per query at ada-002 pricing.
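To make that flow concrete, here is a minimal sketch of the embed, search, generate loop. The `embed()` and `generate()` functions are hypothetical placeholders (a toy hashing embedding and a stub prompt), not ada-002 or any specific LLM; in production both would be real model calls.

```python
# Minimal Vanilla RAG sketch: embed the corpus, search by cosine similarity,
# then hand the top-k chunks to a generator. embed() and generate() are
# illustrative stand-ins, not a specific provider API.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy feature-hashing embedding; swap in a real embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

corpus = [
    "Refunds are processed within 5 business days.",
    "VPN access requires an approved hardware token.",
    "Expense reports are due by the 5th of each month.",
]
index = np.stack([embed(chunk) for chunk in corpus])  # one vector per chunk

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)        # cosine similarity (vectors are unit-norm)
    top = np.argsort(scores)[::-1][:k]   # indices of the k most similar chunks
    return [corpus[i] for i in top]

def generate(query: str, context: list[str]) -> str:
    # Stub for the LLM call: in practice, build a prompt from context + query.
    return f"Answer {query!r} using: {context}"

question = "How long do refunds take?"
print(generate(question, retrieve(question)))
```

Everything downstream depends on what `retrieve()` returns, which is why the failure mode above is about plausible but wrong chunks rather than generation errors.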
Live Simulator
Adjust settings to see trade-off effects
Directional model only: this shows relative behavior, not production benchmarks.
Smaller chunks improve precision but can miss surrounding context; larger chunks carry more context but add noise and token cost.
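As a rough illustration of that knob, a fixed-size chunker with overlap might look like the sketch below; the character counts are arbitrary examples, not recommended settings.

```python
# Fixed-size character chunking with overlap. Sizes here are illustrative only.
def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "refund policy details " * 200       # stand-in for a long document
small = chunk(doc, size=200, overlap=20)   # more, tighter chunks: higher precision, less context each
large = chunk(doc, size=800, overlap=80)   # fewer, broader chunks: more context, more noise per hit
print(len(small), "small chunks vs", len(large), "large chunks")
```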
Higher k increases recall but expands the context and latency; set it too high and the extra chunks dilute relevance.
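The dilution effect is easy to see with toy similarity scores: each extra chunk kept matches the query less well, while the context passed to the LLM keeps growing. The scores below are invented for illustration.

```python
import numpy as np

# Made-up similarity scores for ten indexed chunks, best to worst.
scores = np.array([0.91, 0.88, 0.74, 0.70, 0.42, 0.40, 0.39, 0.35, 0.20, 0.11])

for k in (2, 5, 10):
    top = np.sort(scores)[::-1][:k]   # the k best-scoring chunks
    print(f"k={k}: mean similarity {top.mean():.2f}, context ~{k}x one chunk")
```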
Reranking usually lowers accuracy risk but adds extra compute and latency.
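A rerank stage typically over-fetches candidates with the fast first-pass retriever, then re-scores each (query, passage) pair with a slower, more accurate model and keeps the best few. In the sketch below, `score_pair()` is a toy lexical-overlap stand-in for that model (in practice usually a cross-encoder), so only the shape of the step is realistic.

```python
# Rerank sketch: re-score over-fetched candidates, keep the best few.
# score_pair() is a hypothetical stand-in for a cross-encoder call.
def score_pair(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)   # toy overlap score, not a real model

def rerank(query: str, candidates: list[str], keep: int = 3) -> list[str]:
    ranked = sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)
    return ranked[:keep]                 # extra scoring = extra latency, usually better precision

candidates = [                           # imagine these came from a wide top-k retrieval
    "Refunds are processed within 5 business days.",
    "Refund policy history and past revisions.",
    "VPN access requires an approved hardware token.",
]
print(rerank("How long do refunds take to process?", candidates, keep=2))
```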
Hybrid search blends lexical and semantic retrieval to improve recall on exact terms (IDs, error codes, product names) that embeddings alone can miss.
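One common way to blend the two signals is reciprocal rank fusion (RRF), which merges the lexical and semantic result lists by rank position instead of trying to compare their raw scores. The document IDs and rankings below are invented for illustration.

```python
# Reciprocal rank fusion: score each document by 1 / (k_rrf + rank) across lists.
# k_rrf = 60 is the commonly used damping constant; the rankings are toy inputs.
def rrf(rankings: list[list[str]], k_rrf: int = 60) -> list[str]:
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k_rrf + rank)
    return sorted(fused, key=fused.get, reverse=True)

lexical = ["doc_err_code_0x41", "doc_faq", "doc_changelog"]         # exact-term match (BM25-style)
semantic = ["doc_troubleshooting", "doc_faq", "doc_err_code_0x41"]  # embedding similarity
print(rrf([lexical, semantic]))  # documents found by only one retriever still surface in the merged list
```

Rank-based fusion is a common default here because raw BM25 and cosine scores live on different scales and are hard to combine directly.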
Lower is better for latency and cost. Lower accuracy risk means safer retrieval behavior.
Meters are relative trade-offs for this pattern, not measured production telemetry.