Google recently announced that its new Gemini model can handle a context window of over 1 million tokens - a huge leap in AI capabilities. Many have proclaimed that this advancement spells the end of Retrieval-Augmented Generation (RAG) systems. However, we don't believe long context windows represent the demise of RAG, for several key reasons:
Cost and Speed
Long contexts are computationally expensive to process: the more context provided, the slower and more resource-intensive model inference becomes. RAG systems reduce the number of tokens the model must process by retrieving only the most relevant passages upfront, enabling faster and cheaper results overall.
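To make that concrete, here is a minimal sketch of the retrieval step. The bag-of-words embed() below is a toy stand-in for a real embedding model, but it shows the key point: only the top-k passages ever reach the model, so the prompt stays small no matter how large the corpus grows.

```python
import re
import numpy as np

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Toy embedding: term-frequency vector over a fixed vocabulary.
    A real system would use a trained embedding model instead."""
    words = tokenize(text)
    return np.array([words.count(term) for term in vocab], dtype=float)

def top_k_passages(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by cosine similarity to the query; keep the top k."""
    vocab = sorted({w for text in passages + [query] for w in tokenize(text)})
    q = embed(query, vocab)
    scored = []
    for p in passages:
        v = embed(p, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scored.append((float(q @ v / denom) if denom else 0.0, p))
    scored.sort(reverse=True)
    return [p for _, p in scored[:k]]

corpus = [
    "Gemini supports context windows of over one million tokens.",
    "RAG retrieves relevant passages before calling the model.",
    "Retrieval keeps prompts short, reducing inference cost and latency.",
    "Unrelated passage about quarterly sales figures.",
]
question = "How does RAG reduce inference cost?"
relevant = top_k_passages(question, corpus)
# Only the top-k passages are sent to the model, not the whole corpus:
prompt = "Context:\n" + "\n".join(relevant) + f"\n\nQuestion: {question}"
print(prompt)
```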
Unproven Performance
While the scale is impressive, it remains to be seen how accurately Gemini can recall information across such a vast context. RAG systems optimize the entire pipeline - search, embeddings, and ranking - to feed the model only relevant content. Gemini's recall over 1 million tokens requires further evaluation.
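That kind of recall can be probed empirically with a "needle in a haystack" test. The sketch below assumes a generic complete(prompt) callable (hypothetical; wire it to whichever model is under evaluation) and checks whether a planted fact can be recovered at different context sizes and depths:

```python
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret code is 7421."
QUESTION = "\n\nWhat is the secret code? Answer with the number only."

def build_haystack(num_sentences: int, needle_depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * num_sentences
    sentences.insert(int(needle_depth * num_sentences), NEEDLE)
    return " ".join(sentences)

def recall_probe(complete, sizes=(100, 1_000, 10_000), depths=(0.1, 0.5, 0.9)):
    """Record whether the model recalls the needle at each size/depth."""
    results = {}
    for n in sizes:
        for d in depths:
            answer = complete(build_haystack(n, d) + QUESTION)
            results[(n, d)] = "7421" in answer
    return results

# Demo with a fake "model" that only sees the last 500 characters:
fake_model = lambda prompt: prompt[-500:]
print(recall_probe(fake_model))
```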
Loss of Auditability
A major advantage of RAG systems is that they provide an audit trail showing exactly which content was deemed relevant and passed to the model (a minimal sketch of such a trail appears after the summary below). This grants some explainability into the otherwise "black box" workings of AI. With ultra-long contexts like Gemini's, that auditability is lost to sheer volume, hampering its usefulness for many enterprise use cases.

In summary, while long context length is an exciting advancement that shows AI's potential, it alone is unlikely to make RAG obsolete quite yet. RAG's strengths around cost, performance optimization, and auditability mean it still has significant value in operational environments. We look forward to seeing how these capabilities evolve together over time.
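As promised above, here is a minimal sketch of a RAG audit trail. The retriever and generate callables are hypothetical stand-ins supplied by the surrounding application; the point is that every passage placed in the prompt is logged alongside the answer it produced, so the model's inputs can be inspected after the fact:

```python
import json
import time

def answer_with_audit(query, retriever, generate, log_path="rag_audit.jsonl"):
    """Answer a query and record exactly which passages fed the prompt."""
    retrieved = retriever(query)  # list of (passage, relevance_score) pairs
    prompt = ("Context:\n" + "\n".join(p for p, _ in retrieved)
              + f"\n\nQuestion: {query}")
    answer = generate(prompt)
    record = {
        "timestamp": time.time(),
        "query": query,
        "retrieved": [{"passage": p, "score": s} for p, s in retrieved],
        "answer": answer,
    }
    with open(log_path, "a") as f:  # append-only JSONL audit log
        f.write(json.dumps(record) + "\n")
    return answer

# Stub retriever and generator for demonstration:
stub_retriever = lambda q: [("RAG retrieves relevant passages first.", 0.92)]
stub_generate = lambda prompt: "It retrieves relevant passages before generation."
print(answer_with_audit("How does RAG work?", stub_retriever, stub_generate))
```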
Deploying Enterprise-Grade AI in Your Environment?
Unlock unparalleled performance, security, and customization with the TitanML Enterprise Stack