Defined Term · mechanism · updated 2026-06-10 (UTC)

Retrieval-Augmented Generation (RAG)

The mainstream pattern for using LLMs with document collections: documents are indexed, relevant chunks are retrieved at query time, and the LLM generates an answer from them. Per llm-wiki-gist, this is how NotebookLM, ChatGPT file uploads, and most RAG systems work.
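The three steps named above (index, retrieve, generate) can be sketched in a few lines. This is a toy illustration, not how any of the products mentioned is implemented: token overlap stands in for a real retriever, the function names (`build_index`, `retrieve`, `build_prompt`) are hypothetical, and the final LLM call is left out, so the sketch stops at the prompt that would be sent.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(chunks):
    """Index step: store a bag-of-words Counter next to each chunk."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]

def retrieve(index, query, top_k=2):
    """Retrieval step: rank chunks by token overlap with the query.
    (Assumption: a real system would use embeddings and/or BM25.)"""
    q = Counter(tokenize(query))
    scored = [(sum((q & bag).values()), chunk) for chunk, bag in index]
    scored = [(score, chunk) for score, chunk in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

def build_prompt(query, retrieved):
    """Generation step, up to the point where an LLM would be called."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "RAG pairs a generator with a retriever over an index.",
    "BM25 is a lexical ranking function.",
    "The retriever returns the chunks most relevant to a query.",
]
query = "What does the retriever return for a query?"
index = build_index(chunks)
hits = retrieve(index, query)
prompt = build_prompt(query, hits)
print(prompt)
```

Nothing in this loop persists between questions: each query re-runs retrieval and rebuilds the context from raw chunks.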

The canonical definition (rag-original-paper)

The term and the architecture come from Lewis et al., 2020 (Facebook AI), recorded here as rag-original-paper. It is now the wiki’s neutral primary source for RAG, closing the gap flagged in the note at the end of this page: earlier framing came only from advocates of alternatives. The paper paired a parametric seq2seq generator with a non-parametric memory, a dense vector index of Wikipedia queried through a DPR retriever, in two variants (RAG-Sequence and RAG-Token). It claimed two durable properties: provenance, and knowledge that can be updated without retraining. Crucially, the 2020 design never claimed accumulation or factual-graph connection, so the critiques below extend it rather than refute it.
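The difference between the two variants is easiest to see in how each marginalizes over retrieved passages. The formulas below follow the notation of Lewis et al. (2020) as best recalled here: x is the input, y the output sequence of length N, z a retrieved passage, p_η the retriever and p_θ the generator.

```latex
% RAG-Sequence: one passage conditions the whole output sequence;
% the sum over passages is taken once, at the sequence level.
p_{\text{RAG-Sequence}}(y \mid x) \approx
  \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))}
    p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: the sum over passages is taken at every token,
% so different tokens can draw on different passages.
p_{\text{RAG-Token}}(y \mid x) \approx
  \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))}
    p_\eta(z \mid x) \, p_\theta(y_i \mid x, z, y_{1:i-1})
```

In both cases the retrieved passages z are the only route by which external knowledge reaches the generator, which is why swapping the index updates what the model can draw on without retraining.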

How the source frames it

RAG is presented as the status quo that the llm-wiki pattern improves on. Its limitation, per the source: the LLM “rediscovers knowledge from scratch on every question” — nothing accumulates. A subtle question requiring synthesis across several documents forces the model to re-find and re-piece fragments each time.

Contrast with llm-wiki

RAG: retrieval + generation at query time; no persistent intermediate artifact.
LLM Wiki: knowledge compiled once into a maintained wiki; synthesis and cross-references already exist before the question is asked.

A second data point (gbrain)

gbrain supplies the first concrete comparison in this wiki. It benchmarks a hybrid retriever plus a typed-edge knowledge-graph against vector-only RAG and ripgrep-BM25, and reports a gain of 31.4 points in precision at 5 (P@5) attributed to the graph, with P@5 at 49.1% and recall at 5 (R@5) at 97.9% on a 240-page rich-prose corpus. Its framing of the gap: “Vector search returns chunks that are semantically close. The graph returns chunks that are factually connected.” So the critique of RAG here is less “retrieval is bad” than “semantic retrieval alone misses factual connections that an explicit graph captures.”
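For readers unfamiliar with the two metrics, a minimal sketch of how P@k and R@k are conventionally computed for a single query follows. gbrain's actual evaluation protocol is not described here, so the function names and the toy data are assumptions for illustration only.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    top = set(ranked[:k])
    return sum(1 for doc in relevant if doc in top) / len(relevant)

# Toy query: five results returned, two documents are actually relevant.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}

p5 = precision_at_k(ranked, relevant, 5)  # 2 of 5 results are relevant
r5 = recall_at_k(ranked, relevant, 5)     # both relevant docs were found
print(p5, r5)
```

The toy case shows why a P@5 near 50% and an R@5 near 100% can coexist: when a query has only a couple of relevant pages, finding all of them still leaves most of the five slots filled with non-relevant results.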

Hybrid retrieval — the neutral source (hybrid-retrieval-rag)

The dedicated vendor-neutral source (InfoQ) argues that vector search alone fails because embeddings are approximation engines that collapse distinguishing tokens (version numbers, error codes, flag names). Production systems therefore need hybrid retrieval: dense vectors plus BM25 (exact match), fused via Reciprocal Rank Fusion (k≈60), optionally followed by cross-encoder reranking of the top 50 candidates. Most production queries are themselves hybrid (concept + identifier), a case in which any single retrieval method systematically fails; the source reports the approach as validated at Perplexity and Glean. This independently corroborates gbrain’s hybrid retriever and separates that engineering design from GBrain’s motivated benchmark. So the RAG critique splits cleanly: BM25 closes the exact-token gap, an explicit knowledge-graph closes the factual-connection gap, and the llm-wiki pattern targets the no-accumulation gap.
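Reciprocal Rank Fusion itself is small enough to sketch. Each document scores the sum of 1 / (k + rank) over every ranking it appears in, with k = 60 as cited above. The document IDs and the two input rankings below are made up for illustration, and the alphabetical tie-break is an assumption of this sketch rather than part of the method.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document receives 1 / (k + rank) from every list that contains it
    (rank is 1-based); documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Assumption: break exact ties alphabetically for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for one query from the two retrievers.
dense = ["doc_concept", "doc_both", "doc_other"]  # semantic neighbours
bm25 = ["doc_exact", "doc_both", "doc_concept"]   # exact-token matches

fused = reciprocal_rank_fusion([dense, bm25])
print(fused)
```

Because only ranks are used, the dense and BM25 scores never need to be normalized onto a common scale, and documents that appear in both lists rise above those found by a single method.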

A fourth gap — temporal validity (agent-memory-knowledge-graphs)

For persistent agent memory, a further gap appears: evolving facts. When a fact changes (the canonical example: a user moving cities), vector search returns the old and new versions as equally relevant — both are semantically close — so the agent can’t tell which is current. A temporal-knowledge-graph (bi-temporal modeling; built with graphiti) time-bounds superseded facts, answering “what is true now.” This adds the temporal-validity gap to the taxonomy above and is framed as graphs “quietly replacing RAG” specifically for agent systems — while vector RAG remains strong for static document retrieval.
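The moving-cities example can be made concrete with a minimal bi-temporal record, which tracks both when a fact held in the world and when the system learned it. This is an illustrative sketch only: it does not use or mirror the graphiti API, and the `Fact` and `TemporalFacts` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime          # when the fact became true in the world
    valid_to: Optional[datetime]  # None means "still true"
    recorded_at: datetime         # when the system learned it (not queried here)

class TemporalFacts:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, obj, valid_from, recorded_at):
        """Add a fact and close any open fact it supersedes.
        Assumption: facts are asserted in chronological order."""
        for fact in self.facts:
            if (fact.subject == subject and fact.predicate == predicate
                    and fact.valid_to is None):
                fact.valid_to = valid_from  # time-bound, not deleted
        self.facts.append(
            Fact(subject, predicate, obj, valid_from, None, recorded_at))

    def current(self, subject, predicate, as_of):
        """Return the object that was valid at time `as_of`, if any."""
        for fact in self.facts:
            if (fact.subject == subject and fact.predicate == predicate
                    and fact.valid_from <= as_of
                    and (fact.valid_to is None or as_of < fact.valid_to)):
                return fact.obj
        return None

store = TemporalFacts()
store.assert_fact("alice", "lives_in", "Berlin",
                  datetime(2020, 1, 1), datetime(2020, 1, 5))
store.assert_fact("alice", "lives_in", "Lisbon",
                  datetime(2024, 6, 1), datetime(2024, 6, 3))

now_city = store.current("alice", "lives_in", datetime(2025, 1, 1))
then_city = store.current("alice", "lives_in", datetime(2022, 1, 1))
print(now_city, then_city)
```

Both facts stay in the store, so history remains queryable, but only one of them answers “what is true now.” A similarity search over the same two statements has no equivalent of `valid_to` to separate them.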

Note: the llm-wiki/gbrain framing that “RAG doesn’t accumulate” still comes from advocates of an alternative. The retrieval-mechanics claims, however, are now corroborated neutrally by hybrid-retrieval-rag, and the baseline itself is grounded in the primary source rag-original-paper (Lewis et al., 2020) rather than only in critics’ summaries.