In the last post, we discussed both the benefits and the challenges of AI. LLMs have touched fields as diverse as chemistry, law and software engineering, and their influence looks set to keep growing. Having a companion that you can talk to for information or to accomplish tasks is, like it or not, a huge selling point that improves people’s satisfaction with their work. However, when it comes to information gathering, LLMs still suffer from a number of problems: bias, hallucination and a lack of verifiability. This week, we will tackle one of the most promising means of mitigating these concerns. It is called Retrieval-Augmented Generation.

Solution – Retrieval-Augmented Generation (RAG)

Addressing these challenges is critical for a more trustworthy, reliable and ethical AI. Ideally, we would like to combine the strengths of AI (understanding a user’s questions, being able to creatively extract information) with the strengths of traditional documentation and web search (a clear chain of citation, reliability, exhaustiveness). The simplest approach, and the one we take, is to make relevant documents available to our LLMs. This solution is called Retrieval-Augmented Generation (RAG), and it promises to blend the conversational strengths of AI with more traditional academic rigor.

RAG creates another form of “long-term memory” in the form of a vector database. When a question comes in, we retrieve the most relevant passages from that database and put them into the “short-term memory” of the context window. That way, we don’t need to do the costly task of retraining the LLM’s weights, but we can still rely on its general domain knowledge to summarize and interpret for us. This works because an LLM builds its response based on what is in its context window. LLMs are trained to predict the next word, but their strength in this task comes from the fact that they can incorporate the context that came before. We observe that LLMs learn patterns like “repeat this phrase” or “tl;dr” (for summarization) from trying to fit the vast amount of internet data they consume. After the supervised fine-tuning step, LLMs like ChatGPT are really good at leveraging their context window for insights and can be used for tasks like summarization. That is the fundamental fact that we are taking advantage of here.
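
To make the retrieve-then-prompt idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents, function names and prompt wording are all hypothetical. A real system would swap in learned embeddings and a proper vector store, but the flow is the same: score the documents against the question, keep the best ones, and paste them into the context window.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Place the retrieved passages into the prompt (the context window)."""
    passages = retrieve(question, documents, k)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical documents, invented for this example.
documents = [
    "The warranty on the X100 blender lasts two years.",
    "Our office is closed on public holidays.",
    "The X100 blender has a 1.5 liter jar.",
]
question = "How long is the X100 blender warranty?"
prompt = build_prompt(question, documents, k=1)
print(prompt)
```

The resulting prompt is what would be sent to the LLM. Note that the model’s weights are never touched; only the text in its context window changes.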

Think of RAG as ChatGPT writing an academic paper with open resources. It leans on its fundamental subject knowledge (pretrained weights) while enriching specifics with information from documents: a synthesis of AI’s prowess and traditional credibility. The specifics of a document, like individual details in an article, might not be remembered in the weights, but the general knowledge (the main theme or the big ideas) is more likely to stick around. The context window, on the other hand, holds the exact text and is therefore capable of having a much larger and more specific effect on the answer. Crucially, you can combine your own database of documents with ChatGPT’s own abilities to give citations. ChatGPT can tell you it’s pulling some knowledge from the third document it received, and you can turn that right into a citation. With models whose internals you can access, you can even inspect the “attention” mechanism to get a rough picture of which words or documents the model attends to when producing its answer. Empirically, this produces much better results, which is why companies like Microsoft have integrated this approach into products like Bing.
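
The citation idea can be sketched in a few lines. This is only an illustration under stated assumptions: we assume the retrieved documents are numbered in the prompt as [1], [2], [3] and that the model has been asked to cite with those markers. The `title` and `text` keys, the function names, the sample documents and the sample answer string are all hypothetical, and no real LLM is called here.

```python
import re


def format_numbered_context(docs):
    """Number each retrieved document so the model can refer to it as [n]."""
    return "\n".join(f"[{i}] {d['text']}" for i, d in enumerate(docs, start=1))


def resolve_citations(answer, docs):
    """Map [n] markers in the answer back to document titles.

    Markers are returned in order of first appearance; numbers that do not
    match a retrieved document are ignored.
    """
    cited = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match)
        if 1 <= index <= len(docs) and index not in cited:
            cited.append(index)
    return [(i, docs[i - 1]["title"]) for i in cited]


# Hypothetical retrieved documents, invented for this example.
docs = [
    {"title": "Warranty policy", "text": "The X100 warranty lasts two years."},
    {"title": "Office hours", "text": "The office is closed on public holidays."},
    {"title": "X100 spec sheet", "text": "The X100 jar holds 1.5 liters."},
]

context = format_numbered_context(docs)

# A made-up model answer that follows the [n] convention.
answer = "The warranty lasts two years [1]. The jar holds 1.5 liters [3]."
citations = resolve_citations(answer, docs)
print(citations)
```

Because your own software holds the mapping from numbers to documents, a marker like [3] can be turned into a link or a footnote. It is still worth checking that the cited passage actually supports the claim, since the model can attach the wrong number.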

The promise of RAG lies in its role as an approximation of the ideal solution, blending AI’s agility with traditional rigor. Giving LLMs relevant documents creates a fusion reminiscent of a well-cited academic paper. That may still be underselling it. LLMs that use RAG still have the other advantages that AI provides. They’re responsive to questions. They’re able to understand context. In an ideal world, the AI will make you feel like you’re diving into a stimulating discussion with your favorite professor, giving you cited answers customized specifically for you.


RAG isn’t a magic wand. It reduces hallucination and helps a bit with source bias, but you still depend on your sources. Comprehensive mitigation isn’t just a technical question. It requires technological advancements for sure, but it also requires societal shifts. “Garbage in, garbage out” sadly applies to bigoted training data. Besides these fundamental social limitations, there are also difficulties with implementation, and there is a lot of nuance in setting up the retrieval and augmentation mechanisms. Getting there is a journey that involves overcoming technological barriers and advocating for equitable data representation. With the promise AI brings, I think it’s worth it.

Next week we will dive head-first into the technical side of Retrieval-Augmented Generation, covering both the technical intricacies and the implementation challenges. It’s not too difficult, but there are a few pitfalls we want to avoid. Stay tuned as we dissect the strengths, limitations, and practical applications of this approach!