
Researchers at Stanford’s Hazy Research group have released a preview of M2-BERT retrieval models, built on the Monarch Mixer architecture, that extend BERT-style context lengths up to 32K tokens for long-document analysis and retrieval.
Text embeddings are a cornerstone of many modern applications, from search engines and RAG (Retrieval-Augmented Generation) systems to vector databases. However, most embedding models are BERT/Transformer-based and have short context lengths of around 512 tokens. This is roughly equivalent to two pages of text, but real-world documents can be much longer, often spanning tens of thousands of tokens. To address this, researchers at Stanford's Hazy Research group are developing long-context retrieval models.
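To make the gap concrete, the sketch below counts how many windows a document must be split into before an embedding model with a given context length can cover it. The 30,000-token document size and the 64-token overlap are hypothetical values chosen for illustration; they do not come from the Hazy Research post.

```python
import math


def chunks_needed(doc_tokens: int, context_len: int, overlap: int = 0) -> int:
    """Return how many windows of context_len tokens cover doc_tokens tokens.

    Consecutive windows share `overlap` tokens, a common way to avoid
    cutting a passage in half at a window boundary.
    """
    if doc_tokens <= 0:
        return 0
    if not 0 <= overlap < context_len:
        raise ValueError("overlap must satisfy 0 <= overlap < context_len")
    if doc_tokens <= context_len:
        return 1
    stride = context_len - overlap
    return 1 + math.ceil((doc_tokens - context_len) / stride)


DOC_TOKENS = 30_000  # hypothetical long document

short_ctx = chunks_needed(DOC_TOKENS, 512)              # 512-token model, no overlap
short_ctx_overlap = chunks_needed(DOC_TOKENS, 512, 64)  # 512-token model, 64-token overlap
long_ctx = chunks_needed(DOC_TOKENS, 32_768)            # 32K-token model

print(short_ctx, short_ctx_overlap, long_ctx)
```

With a 512-token model the document becomes dozens of separate embeddings, each blind to the rest of the text, whereas a 32K-token model can embed it in a single pass.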
The foundation for these long-context models is the Monarch Mixer (M2) family. M2 is an innovative model that eschews traditional attention and MLP layers, making it possible to handle much longer contexts while maintaining efficiency. Today, the team is releasing a preview of several M2-BERT models with context lengths up to 32K tokens, fine-tuned for long-context retrieval.
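One reason dropping attention matters at this scale: standard self-attention builds a score matrix with one entry per pair of tokens, so its size grows with the square of the sequence length. The arithmetic below shows what that implies when moving from 512 to 32K tokens. It describes a standard Transformer only; the article does not give M2's own cost figures, and the 2-byte (fp16) entry size is an assumption made for illustration.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


SHORT_CTX = 512
LONG_CTX = 32_768
BYTES_PER_ENTRY = 2  # assumption: scores stored in fp16

short_entries = attention_score_entries(SHORT_CTX)
long_entries = attention_score_entries(LONG_CTX)

# A 64x longer sequence produces a 64 * 64 = 4096x larger score matrix.
growth = long_entries // short_entries

short_kib = short_entries * BYTES_PER_ENTRY / 1024
long_gib = long_entries * BYTES_PER_ENTRY / 1024**3

print(f"growth factor: {growth}x")
print(f"512 tokens: {short_kib:.0f} KiB per head per layer")
print(f"32K tokens: {long_gib:.0f} GiB per head per layer")
```

Memory-efficient attention kernels avoid materializing the full matrix, but the number of pairwise comparisons still grows quadratically, which is the cost an attention-free design sidesteps.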
To enable these new models, the researchers had to make significant adjustments to both the data mixtures and loss functions. Here’s a breakdown:
Data Mixture Adjustments:
Loss Function Innovations:
The team has released the following models on HuggingFace:

These models are also available via Together AI’s new embedding service. They have already been beta-tested at a MongoDB hackathon and integrated with RAG frameworks such as LangChain and LlamaIndex.
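For readers new to embedding-based retrieval, here is a minimal sketch of the step that sits downstream of any embedding model in a RAG pipeline: ranking documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made up for illustration. In a real system the vectors would come from an embedding model such as M2-BERT, and this sketch does not call Together AI, LangChain, or LlamaIndex.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, one vector per whole document.
docs = {
    "contract": [0.9, 0.1, 0.0],
    "manual": [0.1, 0.8, 0.3],
    "transcript": [0.2, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With a long-context model, each entry in `docs` can represent an entire document rather than one of many 512-token chunks, so the ranking operates on whole documents directly.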
To evaluate the performance of these long-context retrieval models, the team has introduced LoCo (Long-Context), a new benchmark. LoCo includes a variety of retrieval tasks with long documents, though it is still in its early stages. The researchers are actively seeking feedback and contributions to expand the benchmark.
The team is eager for community feedback on these models and the LoCo benchmark.
A full paper detailing these developments will be released next month. For now, the team is excited to share this preview and gather valuable insights from the community.
Original Sources
↗ https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
12 January 2024