
The RAG Atlas: A Visual Guide to Retrieval Patterns

Ten RAG architectures, visually mapped with interactive diagrams and a live simulator

12 min read
RAG Atlas interface preview

Select a pattern to inspect its retrieval flow. Hover nodes for details, then adjust controls to see trade-offs in real time.

Vanilla RAG

User Query → Embed Query → Vector Search (top-50) → Top-k Chunks → LLM


The baseline: embed, search, generate
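The three steps in that tagline can be sketched end to end. This is a minimal illustration, not production code: `embed` is a bag-of-words stand-in for a real embedding model (ada-002 or similar), and the LLM call is left out.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a sparse bag-of-words
    # vector keyed by lowercased tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Embed the query once, score every chunk, return the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# The retrieved chunks would then be stuffed into the LLM prompt.
chunks = [
    "Reranking trades latency for precision.",
    "Vector search returns the nearest chunks.",
    "Our refund policy covers 30 days.",
]
top = retrieve("how does vector search work", chunks, k=2)
```

In a real system the chunk embeddings are computed once at indexing time and stored in a vector index; only the query is embedded per request.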

Best for

Internal knowledge bases with clean, chunked documents

Core tradeoff

Simplest to build and debug, but brittle on ambiguous queries.

Failure mode

Returns plausible-sounding but wrong chunks when queries are ambiguous or the corpus has near-duplicate contradictory passages.

Details

Pros

  • + Simplest to build and debug
  • + Lowest latency of all RAG variants
  • + Single embedding model to manage

Cons

  • − Brittle on ambiguous queries
  • − No score calibration across chunks
  • − Sensitive to chunk size and overlap
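The chunk size and overlap sensitivity comes down to how the corpus is windowed. A common approach, sketched here under the assumption of a simple fixed-size sliding window, repeats `overlap` tokens between neighbours so facts near a boundary land in both chunks:

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    # Slide a fixed-size window over the token stream; `overlap` tokens
    # are repeated between neighbouring chunks so boundary facts survive.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

parts = chunk(list("abcdefghij"), size=4, overlap=1)  # 3 overlapping chunks
```

Shrinking `size` sharpens retrieval precision but risks splitting an answer across chunks; growing it preserves context at the cost of noisier matches and a larger prompt.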

Latency: 50–150ms retrieval + LLM generation. Total: 500–2000ms.

Cost: 1× retrieval. ~$0.0001 per query at ada-002 pricing.
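The per-query figure is back-of-envelope arithmetic: ada-002 charged $0.0001 per 1K tokens at the time of writing, so embedding a roughly 1K-token query lands at about $0.0001 (amortized indexing cost excluded):

```python
def embed_cost(tokens: int, price_per_1k: float = 0.0001) -> float:
    # ada-002 pricing: $0.0001 per 1K tokens.
    return tokens / 1000 * price_per_1k

per_query = embed_cost(tokens=1000)  # ≈ $0.0001 for a ~1K-token query
```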

Live Simulator

Adjust settings to see trade-off effects

Directional model only: this shows relative behavior, not production benchmarks.

Chunk size: 512 tokens (range: 128–2,048)

Smaller chunks improve precision but can miss context; larger chunks add context but can add noise and cost.

Top-k: 5 chunks (range: 1–20)

Higher k increases recall but expands context and latency. Too high can dilute relevance.

Reranking usually lowers accuracy risk but adds extra compute and latency.
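The usual shape of that trade is a two-stage pipeline: a cheap first stage over-retrieves (e.g. the top-50 from the diagram), then a more expensive scorer reorders those candidates down to top-k. A minimal sketch, with token overlap standing in for a real cross-encoder score:

```python
def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
    # Stand-in for a cross-encoder: score each query–chunk pair jointly.
    # A real reranker is far more accurate but costs one model forward
    # pass per candidate — hence the added compute and latency.
    q = set(query.lower().split())
    def score(chunk: str) -> int:
        return len(q & set(chunk.lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]

# Stage 1 would be the cheap vector search, e.g.:
# candidates = vector_search(query, index, k=50)   # hypothetical first stage
candidates = ["refund policy details", "vector search basics", "search quality tips"]
best = rerank("vector search", candidates, k=1)
```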

Hybrid blends lexical + semantic retrieval to improve recall on exact terms.
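The article doesn't specify how the lexical and semantic result lists are blended; one common choice is Reciprocal Rank Fusion (RRF), which merges ranked lists without having to calibrate their raw scores against each other:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each list contributes 1/(k + rank) per
    # document, so items ranked well by both retrievers rise to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_exact_term", "doc_a", "doc_b"]    # e.g. BM25 results
semantic = ["doc_a", "doc_c", "doc_exact_term"]   # e.g. vector search results
fused = rrf([lexical, semantic])
```

Because RRF works on ranks rather than scores, it sidesteps the score-calibration problem the cons list notes for pure vector search.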

Latency: low
Cost: low
Accuracy risk: moderate

Lower is better for latency and cost. Lower accuracy risk means safer retrieval behavior.

Meters are relative trade-offs for this pattern, not measured production telemetry.

John Munn

Technical leader building scalable solutions and high-performing teams through strategic thinking and calm, reflective authority.

© 2026 John Munn. All rights reserved.