Retrieval-augmented generation

RAG grounds LLM outputs in external documents at query time, but its limitations around cross-document synthesis have pushed practitioners toward alternatives like compiled knowledge bases that pre-synthesize information into structured, queryable Markdown.

6 sources · May 3, 2026

Compiled by Claude

Agents · LLMs · 35 neighbors

Retrieval-augmented generation (RAG) is the practice of embedding a query, retrieving semantically similar document chunks from a vector store, and supplying those chunks as context to an LLM before generation. The approach keeps a model’s factual grounding updatable without retraining, and it scales reasonably well to large document collections. It has become standard enough that multimodal variants now exist: the 2025 VLM landscape overview lists multimodal RAG among the notable developments in the vision-language model space, where retrieved content can include images and video alongside text.
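The embed, retrieve, and supply-as-context steps can be sketched in a few lines. This is a minimal illustration under loud assumptions: `embed` is a toy word-count vector standing in for a real embedding model, the "vector store" is a plain Python list, and `build_prompt`, `retrieve`, and `CHUNKS` are hypothetical names invented for this example, not part of any library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the retrieved chunks into the context handed to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical in-memory "vector store" for illustration only.
CHUNKS = [
    "RAG retrieves document chunks from a vector store at query time.",
    "Compiled knowledge bases synthesize documents into Markdown at ingest time.",
    "Bananas are rich in potassium.",
]
```

Calling `build_prompt("How does RAG retrieve chunks?", CHUNKS, k=1)` produces a prompt whose context is the first chunk only. Each chunk is scored independently against the query, so nothing in this loop relates one chunk to another.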

Despite its prevalence, RAG has a structural weakness: it retrieves fragments, not synthesized understanding. For curated research corpora where relationships across many documents matter, per-query chunk retrieval often misses the connections between sources. A developer who built Karpathy’s LLM Wiki end-to-end found that pre-synthesized knowledge bases outperform RAG for this use case precisely because cross-document synthesis happens at ingest time rather than at query time. The tradeoff is unforgiving: hallucinations introduced during ingest are baked structurally into every downstream answer, which makes lint and validation steps non-negotiable.
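One possible shape for such a lint step is sketched below. The conventions it checks are assumptions made for illustration, not something the sources specify: each claim line is expected to carry a `[source: id]` tag, and cross-references use `[[Page]]` links. `lint_page` and its parameters are hypothetical names.

```python
import re

# Assumed conventions (hypothetical): "[source: doc-id]" tags and "[[Page]]" links.
SOURCE_TAG = re.compile(r"\[source:\s*([\w.-]+)\]")
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def lint_page(name: str, text: str, known_pages: set[str], known_sources: set[str]) -> list[str]:
    """Return a list of human-readable problems found in one compiled page."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and headings; everything else is treated as a claim.
        if not stripped or stripped.startswith("#"):
            continue
        cited = SOURCE_TAG.findall(stripped)
        if not cited:
            problems.append(f"{name}:{lineno}: claim has no source tag")
        for src in cited:
            if src not in known_sources:
                problems.append(f"{name}:{lineno}: unknown source '{src}'")
        for target in WIKI_LINK.findall(stripped):
            if target not in known_pages:
                problems.append(f"{name}:{lineno}: broken link to '{target}'")
    return problems
```

A check like this only catches structural problems such as uncited claims and dangling links. It cannot tell whether a cited claim is actually supported by its source, so it complements rather than replaces human or model review of ingested content.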

The practical walkthrough of Karpathy’s pattern describes an alternative architecture where the LLM builds and maintains structured Markdown files from raw documents, enabling direct querying at scale without a retrieval step at all. That approach trades RAG’s freshness and modularity for synthesis quality, and it only works when the input corpus is curated enough that ingest-time errors can be caught and corrected.
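The ingest-then-query flow might look roughly like the sketch below. It is not the walkthrough's actual implementation: `synthesize` is a trivial stand-in for the LLM call that would merge new material into a page, and `ingest`, `load_context`, and the one-file-per-topic layout are hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path


def synthesize(topic: str, notes: list[str]) -> str:
    """Stand-in for the LLM (assumption): render notes as a Markdown page.
    A real system would ask the model to rewrite the page coherently."""
    bullets = "\n".join(f"- {n}" for n in notes)
    return f"# {topic}\n\n{bullets}\n"


def ingest(wiki_dir: Path, topic: str, new_notes: list[str]) -> Path:
    """Merge new notes into the topic's page at ingest time."""
    page = wiki_dir / f"{topic}.md"
    existing = []
    if page.exists():
        lines = page.read_text(encoding="utf-8").splitlines()
        existing = [line[2:] for line in lines if line.startswith("- ")]
    # Keep existing notes and append only the ones not already present.
    merged = existing + [n for n in new_notes if n not in existing]
    page.write_text(synthesize(topic, merged), encoding="utf-8")
    return page


def load_context(wiki_dir: Path) -> str:
    """Query time: read the compiled pages directly, with no retrieval step."""
    pages = sorted(wiki_dir.glob("*.md"))
    return "\n".join(p.read_text(encoding="utf-8") for p in pages)


def demo() -> str:
    """Ingest two overlapping batches, then return the compiled context."""
    with tempfile.TemporaryDirectory() as tmp:
        wiki = Path(tmp)
        ingest(wiki, "rag", ["Retrieves chunks at query time."])
        ingest(wiki, "rag", ["Retrieves chunks at query time.",
                             "Weak at cross-document synthesis."])
        return load_context(wiki)
```

The point of the sketch is where the work happens: merging and deduplication run once per ingest, and the query path simply reads finished pages. It also shows why ingest errors are costly, since whatever `synthesize` writes is what every later query sees.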

Related concepts