
The RAG Atlas: A Visual Guide to Retrieval Patterns

Ten RAG architectures, visually mapped with interactive diagrams and a live simulator

12 min read
RAG Atlas interface preview

Select a pattern to inspect its retrieval flow. Hover nodes for details, then adjust controls to see trade-offs in real time.

Vanilla RAG

User Query → Embed Query → Vector Search (top-50) → Top-k Chunks → LLM


The baseline: embed, search, generate
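The three steps in that tagline can be sketched end to end. This is a minimal illustration, not production code: `embed` is a bag-of-words stand-in for a real embedding model (ada-002 or similar), and the LLM call is left out.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a sparse bag-of-words
    # vector keyed by lowercased tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Embed the query once, score every chunk, return the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# The retrieved chunks would then be stuffed into the LLM prompt.
chunks = [
    "Reranking trades latency for precision.",
    "Vector search returns the nearest chunks.",
    "Our refund policy covers 30 days.",
]
top = retrieve("how does vector search work", chunks, k=2)
```

In a real system the chunk embeddings are computed once at indexing time and stored in a vector index; only the query is embedded per request.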

Best for

Internal knowledge bases with clean, chunked documents

Core tradeoff

Simplest to build and debug, but brittle on ambiguous queries.

Failure mode

Returns plausible-sounding but wrong chunks when queries are ambiguous or the corpus has near-duplicate contradictory passages.

Details

Pros

  • + Simplest to build and debug
  • + Lowest latency of all RAG variants
  • + Single embedding model to manage

Cons

  • − Brittle on ambiguous queries
  • − No score calibration across chunks
  • − Sensitive to chunk size and overlap
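The chunk size and overlap sensitivity comes down to how the corpus is windowed. A common approach, sketched here under the assumption of a simple fixed-size sliding window, repeats `overlap` tokens between neighbours so facts near a boundary land in both chunks:

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    # Slide a fixed-size window over the token stream; `overlap` tokens
    # are repeated between neighbouring chunks so boundary facts survive.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

parts = chunk(list("abcdefghij"), size=4, overlap=1)  # 3 overlapping chunks
```

Shrinking `size` sharpens retrieval precision but risks splitting an answer across chunks; growing it preserves context at the cost of noisier matches and a larger prompt.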

Latency: 50–150ms retrieval + LLM generation. Total: 500–2000ms.

Cost: 1× retrieval. ~$0.0001 per query at ada-002 pricing.
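The per-query figure is back-of-envelope arithmetic: ada-002 charged $0.0001 per 1K tokens at the time of writing, so embedding a roughly 1K-token query lands at about $0.0001 (amortized indexing cost excluded):

```python
def embed_cost(tokens: int, price_per_1k: float = 0.0001) -> float:
    # ada-002 pricing: $0.0001 per 1K tokens.
    return tokens / 1000 * price_per_1k

per_query = embed_cost(tokens=1000)  # ≈ $0.0001 for a ~1K-token query
```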

Live Simulator

Adjust settings to see trade-off effects

Directional model only: this shows relative behavior, not production benchmarks.

Chunk size: 512 tokens (range: 128–2,048)

Smaller chunks improve precision but can miss context; larger chunks add context but can add noise and cost.

Top-k: 5 chunks (range: 1–20)

Higher k increases recall but expands context and latency. Too high can dilute relevance.

Reranking usually lowers accuracy risk but adds extra compute and latency.
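The usual shape of that trade is a two-stage pipeline: a cheap first stage over-retrieves (e.g. the top-50 from the diagram), then a more expensive scorer reorders those candidates down to top-k. A minimal sketch, with token overlap standing in for a real cross-encoder score:

```python
def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
    # Stand-in for a cross-encoder: score each query–chunk pair jointly.
    # A real reranker is far more accurate but costs one model forward
    # pass per candidate — hence the added compute and latency.
    q = set(query.lower().split())
    def score(chunk: str) -> int:
        return len(q & set(chunk.lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]

# Stage 1 would be the cheap vector search, e.g.:
# candidates = vector_search(query, index, k=50)   # hypothetical first stage
candidates = ["refund policy details", "vector search basics", "search quality tips"]
best = rerank("vector search", candidates, k=1)
```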

Hybrid blends lexical + semantic retrieval to improve recall on exact terms.
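The article doesn't specify how the lexical and semantic result lists are blended; one common choice is Reciprocal Rank Fusion (RRF), which merges ranked lists without having to calibrate their raw scores against each other:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each list contributes 1/(k + rank) per
    # document, so items ranked well by both retrievers rise to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_exact_term", "doc_a", "doc_b"]    # e.g. BM25 results
semantic = ["doc_a", "doc_c", "doc_exact_term"]   # e.g. vector search results
fused = rrf([lexical, semantic])
```

Because RRF works on ranks rather than scores, it sidesteps the score-calibration problem the cons list notes for pure vector search.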

Latency: low
Cost: low
Accuracy risk: moderate

Lower is better for latency and cost. Lower accuracy risk means safer retrieval behavior.

Meters are relative trade-offs for this pattern, not measured production telemetry.

John Munn

Technical leader building scalable solutions and high-performing teams through strategic thinking and calm, reflective authority.

© 2026 John Munn. All rights reserved.