
Researchers introduce ERASE, a novel technique that allows for precise updates to language models' knowledge bases without full retraining, ensuring they stay current with minimal effort.
Keeping language models accurate as the world changes is a persistent challenge in natural language processing (NLP). The usual options are retraining models from scratch or using retrieval-augmented generation (RAG), in which new documents are inserted into a knowledge base and retrieved at prediction time for downstream tasks. Retraining is expensive, and a knowledge base that only ever grows can keep serving outdated entries alongside the documents that supersede them.
A recent paper, "Language Modeling with Editable External Knowledge" by Belinda Z. Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig, and Jacob Andreas, introduces ERASE (Enhancing Retrieval Augmentation with Self-consistent Editing), a method that improves model behavior at the moment new documents are added to the knowledge base. Rather than only appending the new document, ERASE incrementally deletes or rewrites other entries in the knowledge base so that it stays consistent with the new information.
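To make the idea concrete, here is a minimal sketch of a knowledge base that revises its existing entries whenever a document is added. It is not the authors' implementation. The names `EditableKnowledgeBase`, `EditPolicy`, and `toy_policy` are hypothetical, and the string-matching rule stands in for the judgement that a real system would delegate to a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical edit policy: given an existing entry and a new document,
# return None to delete the entry, a new string to rewrite it,
# or the entry itself to keep it unchanged.
EditPolicy = Callable[[str, str], Optional[str]]


@dataclass
class EditableKnowledgeBase:
    policy: EditPolicy
    entries: List[str] = field(default_factory=list)

    def add(self, document: str) -> None:
        """Insert a document, first revising entries it makes stale."""
        revised = []
        for entry in self.entries:
            result = self.policy(entry, document)
            if result is not None:      # None means "delete this entry"
                revised.append(result)  # kept as-is or rewritten
        revised.append(document)
        self.entries = revised


def toy_policy(entry: str, document: str) -> Optional[str]:
    """Toy stand-in (an assumption): the text before ' is ' is the subject.

    If the new document makes a statement about the same subject,
    the old entry is treated as superseded and deleted.
    """
    subject = entry.split(" is ")[0]
    if document.startswith(subject + " is "):
        return None
    return entry


kb = EditableKnowledgeBase(policy=toy_policy)
kb.add("The CEO of Acme is Alice")
kb.add("The capital of France is Paris")
kb.add("The CEO of Acme is Bob")
print(kb.entries)
# ['The capital of France is Paris', 'The CEO of Acme is Bob']
```

An append-only knowledge base would hold both CEO statements after the third call and leave the retriever to choose between them. Here the stale entry is gone before any question is asked.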
Incremental Updates: Unlike static retrieval-augmented systems, ERASE allows for dynamic updates. Each time a new document is added, it triggers a process where existing entries are either deleted or rewritten to ensure consistency.
Benchmark Performance: ERASE was evaluated on two new benchmark datasets designed to test models' ability to answer questions about a stream of news articles or conversations. The authors report accuracy improvements over conventional retrieval-augmented generation on both.
ERASE operates through a multi-step process:

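Decision Making: Based on the similarity search, ERASE decides whether to delete or rewrite each existing entry that the new document affects. Unaffected entries are left as they are.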
Model Update: The language model itself is not retrained. At prediction time it retrieves from the updated knowledge base, so its answers reflect the edits.
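The similarity search that feeds the decision step can be sketched as follows. This is an illustration under stated assumptions, not the paper's retriever: `embed` is a toy bag-of-words counter where a real system would use a learned encoder, and the `threshold` value and the helper names `rank_entries` and `select_candidates` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_entries(document: str, entries: List[str]) -> List[Tuple[float, str]]:
    """Score every entry against the new document, most similar first."""
    doc_vec = embed(document)
    scored = [(cosine(doc_vec, embed(e)), e) for e in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def select_candidates(document: str, entries: List[str],
                      threshold: float = 0.5) -> List[str]:
    """Entries similar enough to be passed on to the delete-or-rewrite decision."""
    return [e for score, e in rank_entries(document, entries) if score >= threshold]


entries = [
    "Alice is the CEO of Acme",
    "Paris is the capital of France",
    "Acme sells anvils",
]
ranked = rank_entries("Bob is the new CEO of Acme", entries)
candidates = select_candidates("Bob is the new CEO of Acme", entries)
print(candidates)
# ['Alice is the CEO of Acme']
```

Only the entries that clear the threshold are handed to the decision step, which keeps the cost of each update proportional to the number of related entries and not to the size of the whole knowledge base.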
Architecture: ERASE uses transformer language models both for feature extraction during retrieval and for deciding how existing entries should be edited.
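Benchmarks: Two new datasets, one built from a stream of news articles and one from conversations, each used to pose questions about the stream.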
Code and Data Availability: The authors have made the code and data used in their experiments available on GitHub.
ERASE is a practical step toward keeping language models accurate in changing environments. By editing the knowledge base incrementally and keeping its entries consistent, it addresses a weakness of conventional retrieval-augmented systems, which add documents without revising what is already stored. The gains on the benchmark datasets support the approach. For practitioners, it offers a way to keep NLP models up to date without extensive retraining.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
19 June 2024