What is retrieval-augmented generation (RAG)?
A technique where an AI model retrieves relevant documents at query time and uses them to ground its answer in your own data.
Definition
Retrieval-augmented generation (RAG) is a technique where an AI model retrieves relevant documents from a trusted source at query time and uses them to ground its answer, reducing hallucination and letting the model cite your own data without being retrained on it.
How RAG works
RAG adds a retrieval step in front of the model. When a question comes in, the system searches a trusted source (your own documents, a database, or a knowledge base) for the passages most relevant to that question. It hands those passages to the model along with the question, and the model writes its answer grounded in what it was just given rather than what it happened to memorize during training.
The practical payoff is twofold. The model can work with information it was never trained on, including a document created this morning, and it can point to the exact source behind each claim.
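As a rough illustration of the flow described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `DOCUMENTS` store, the document ids, the word-overlap scoring, and the prompt wording are all hypothetical, and production systems typically use embedding-based search rather than word overlap. The final prompt would be sent to whatever model you use; that call is left out.

```python
import re
from collections import Counter

# Hypothetical in-memory "trusted source": document id -> passage text.
DOCUMENTS = {
    "policy-001": "Refunds are issued within 14 days of a returned purchase.",
    "policy-002": "Support is available on weekdays from 9am to 5pm.",
    "policy-003": "Annual plans renew automatically unless cancelled.",
}

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(question, passage):
    """Toy relevance score: number of words the question and passage share."""
    shared = Counter(tokenize(question)) & Counter(tokenize(passage))
    return sum(shared.values())

def retrieve(question, documents, k=1):
    """Return the k (id, text) pairs with the highest overlap score."""
    ranked = sorted(
        documents.items(),
        key=lambda item: score(question, item[1]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, passages):
    """Hand the retrieved passages to the model alongside the question."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite the source id.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

# Step 1: retrieve for an incoming question, then build the grounded prompt.
question = "How many days do refunds take?"
top = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, top)

# Step 2: a document "created this morning" is searchable as soon as it is
# added to the source. No retraining step is involved.
DOCUMENTS["memo-today"] = "The new Lisbon office opens on Monday."
fresh = retrieve("When does the new office open?", DOCUMENTS)
```

The source ids in the prompt are what let the model point back to the passage behind each claim. The model itself never changes; only the text placed in front of it does.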
Does RAG stop hallucinations?
It reduces them but does not eliminate them. Grounding an answer in retrieved source text gives the model something real to lean on, and a good RAG system cites the passage it used so a human can check the work. But the model can still misread a passage or stretch beyond what the source actually supports.
In effect, RAG turns "trust me" into "here is my source." That is a large improvement, not a guarantee, and it is why citations matter in regulated work.
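One way to make citations checkable in practice is to ask the model to return, with each claim, the source id and the exact quote it relied on, and then confirm that the quote really appears in that source. The sketch below assumes that hypothetical output shape (`text`, `source_id`, `quote`). It only catches quotes that are not in the source; it cannot tell whether the model read a genuine quote correctly, so human review still matters.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def check_citations(claims, sources):
    """Return the claims whose quote is missing from the source they cite."""
    unsupported = []
    for claim in claims:
        quote = normalize(claim["quote"])
        source_text = normalize(sources.get(claim["source_id"], ""))
        # An empty quote, or one not found in the cited source, is flagged.
        if not quote or quote not in source_text:
            unsupported.append(claim)
    return unsupported

# Hypothetical source passage and model output, for illustration only.
sources = {
    "policy-001": "Refunds are issued within 14 days of a returned purchase.",
}
claims = [
    {
        "text": "Refunds take up to 14 days.",
        "source_id": "policy-001",
        "quote": "issued within 14 days",
    },
    {
        "text": "Refunds include shipping costs.",
        "source_id": "policy-001",
        "quote": "including shipping costs",
    },
]

flagged = check_citations(claims, sources)
```

Here the second claim would be flagged for review, because its quote does not appear in the passage it cites.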
RAG vs. fine-tuning
Fine-tuning changes the model's weights by training it further on your data. RAG leaves the model untouched and feeds it your data at the moment of the question. For most business needs, RAG is the better first move: it is cheaper, it updates the instant a document changes, it keeps a clear audit trail of sources, and it does not bake sensitive data into the model, where it is hard to remove later.
Fine-tuning still earns its place for matching a house style, tone, or a narrow repetitive task, and the two are often combined. A simple way to hold it in your head: fine-tuning teaches the model how to talk; retrieval tells it what is true right now.
Frequently asked questions
- What is RAG?
- Retrieval-augmented generation (RAG) is a technique where an AI model retrieves relevant documents from a trusted source at query time and uses them to ground its answer. It reduces hallucination and lets the model cite your own data without being retrained on it.
- Does RAG stop hallucinations?
- RAG reduces hallucinations but does not eliminate them. Grounding answers in retrieved source text and citing the passage used makes the work checkable, but the model can still misread a source or overreach. RAG turns "trust me" into "here is my source."
- Is RAG better than fine-tuning?
- For most business needs RAG is the better first move: it is cheaper, updates the moment a document changes, keeps an audit trail of sources, and avoids baking sensitive data into the model. Fine-tuning still helps for style or narrow tasks, and the two are often combined.