
Share
Anthropic's new Contextual Retrieval technique boosts RAG systems, ensuring AI models can access precise, relevant data for better performance in specialized fields like customer support and law.
For an AI model to excel in specific contexts, it often needs access to relevant background knowledge. This is particularly true for applications like customer support chatbots and legal analysts, where the model must understand the nuances of a particular business or legal landscape. Traditionally, developers have used Retrieval-Augmented Generation (RAG) to enhance an AI model's knowledge by retrieving relevant information from a knowledge base and appending it to the user's prompt. However, traditional RAG methods often struggle with context, leading to failed retrievals.
In this post, we introduce Contextual Retrieval, a method that significantly improves the retrieval step in RAG. This technique leverages two sub-techniques: Contextual Embeddings and Contextual BM25. According to Anthropic, Contextual Retrieval can reduce the number of failed retrievals by 49%, and when combined with reranking, this improvement jumps to 67%. These enhancements directly translate to better performance in downstream tasks.
For smaller knowledge bases (less than 200,000 tokens or about 500 pages), you can bypass RAG entirely by including the entire knowledge base in the model's prompt. Anthropic recently introduced prompt caching for Claude, which makes this approach faster and more cost-effective:

You can learn more about prompt caching in Anthropic's prompt caching cookbook.
For larger knowledge bases that exceed the context window, RAG is the go-to solution. The process involves:
At runtime, when a user submits a query:
You can easily deploy your own Contextual Retrieval solution using Claude with Anthropic's cookbook. This guide provides step-by-step instructions and best practices for integrating Contextual Retrieval into your applications.
Contextual Retrieval represents a significant advancement in RAG, making it easier to build AI models that can effectively leverage large knowledge bases. By reducing failed retrievals and improving accuracy, this method enhances the performance of downstream tasks, ultimately leading to more useful and reliable AI applications.
Tags
Original Sources
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
23 September 2024
133 articles
Related Articles
Related Articles
More Stories