
Researchers at Sapient Intelligence have developed a way to train foundation models from scratch using a Hierarchical Reasoning Model (HRM) architecture, substantially reducing cost and data requirements.
Training a large language model (LLM) from scratch is expensive enough that most organizations fine-tune an existing model instead. Researchers at Sapient are challenging that default with HRM-Text, a model they say makes from-scratch pretraining affordable.
HRM-Text is built on the Hierarchical Reasoning Model (HRM) architecture, which the researchers describe as highly sample-efficient and far less computationally demanding than a standard Transformer. Beyond cutting costs, they argue the approach better matches enterprise workloads, where users need targeted, task-specific responses.
The efficiency comes from the architecture. Standard Transformer LLMs are pretrained by brute-force autoregressive prediction of raw text; HRM instead splits computation between two interacting recurrent modules: a slowly updated strategic module and a fast-updating execution module. According to the researchers, this lets the model spend its capacity on higher-level reasoning and on following task instructions rather than on memorizing large volumes of data.
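The two-timescale idea can be illustrated with a toy recurrence. This is a minimal sketch under stated assumptions, not Sapient's implementation: plain tanh RNN cells with random weights stand in for the real modules, and the function `hierarchical_forward` and its parameters (`n_cycles`, `steps_per_cycle`) are hypothetical. The fast execution state updates at every step, conditioned on the input and the current strategic state; the slow strategic state updates only once per cycle, from wherever the execution state ended up.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vec_add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def hierarchical_forward(x, dim=4, n_cycles=3, steps_per_cycle=4, seed=0):
    """Toy two-timescale recurrence (hypothetical illustration only).

    Returns the final high-level state plus how many times each level updated.
    """
    rng = random.Random(seed)
    # Low-level (fast, "execution") weights: recurrent, top-down, and input.
    w_ll = random_matrix(dim, dim, rng)
    w_lh = random_matrix(dim, dim, rng)
    w_lx = random_matrix(dim, len(x), rng)
    # High-level (slow, "strategic") weights: recurrent and bottom-up.
    w_hh = random_matrix(dim, dim, rng)
    w_hl = random_matrix(dim, dim, rng)

    z_low = [0.0] * dim
    z_high = [0.0] * dim
    low_updates = high_updates = 0
    for _ in range(n_cycles):
        # Fast inner loop: the execution state iterates while the strategy is held fixed.
        for _ in range(steps_per_cycle):
            z_low = tanh_vec(vec_add(matvec(w_ll, z_low), matvec(w_lh, z_high), matvec(w_lx, x)))
            low_updates += 1
        # Slow outer update: the strategic state absorbs the result of the cycle.
        z_high = tanh_vec(vec_add(matvec(w_hh, z_high), matvec(w_hl, z_low)))
        high_updates += 1
    return z_high, low_updates, high_updates


z_high, low_updates, high_updates = hierarchical_forward([1.0, -0.5, 0.25])
print(low_updates, high_updates)  # 12 3
```

With the defaults, the execution state is updated 12 times while the strategic state is updated only 3 times: many cheap inner steps run under slow, coarse guidance, which is the core of the two-timescale design.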
Because HRM-Text is trained exclusively on instruction-response pairs, the researchers report that it learns language and reasoning skills from far fewer tokens. That matters in enterprise settings, where users expect precise, task-oriented answers.
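One common way to train only on instruction-response pairs is to feed the model the full sequence but compute the loss only on the response tokens. The sketch below shows that data-preparation step; the article does not say whether HRM-Text uses exactly this scheme. The whitespace `tokenize` helper and `build_example` are hypothetical, and `-100` follows the convention of PyTorch's cross-entropy `ignore_index` default for marking positions that should not contribute to the loss.

```python
# -100 marks positions the loss should skip (PyTorch's cross-entropy ignore_index default).
IGNORE_INDEX = -100


def tokenize(text, vocab):
    """Hypothetical whitespace tokenizer that assigns new IDs on first sight."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def build_example(instruction, response, vocab):
    """Turn one instruction-response pair into model inputs and training labels."""
    prompt_ids = tokenize(instruction, vocab)
    response_ids = tokenize(response, vocab)
    # The model reads the whole sequence...
    input_ids = prompt_ids + response_ids
    # ...but only response positions carry a target; prompt positions are masked out.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_example("Summarize the ticket", "Printer offline", vocab)
print(input_ids)  # [0, 1, 2, 3, 4]
print(labels)     # [-100, -100, -100, 3, 4]
```

Here the labels are aligned with `input_ids`; the one-position shift for next-token prediction is assumed to happen inside the loss function, as many training libraries do.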
The researchers trained a 1B-parameter HRM-Text model from scratch for about $1,500, using far fewer training tokens than conventional LLMs require. Despite the smaller budget, HRM-Text posted competitive results on key industry benchmarks, suggesting it could be a viable alternative to larger, more resource-intensive models.
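The article does not say what hardware or pricing the $1,500 covered. The back-of-envelope sketch below only shows how such a budget translates into rented compute; the $2.50 per GPU-hour rate and the 8-GPU node are placeholders chosen for illustration, not figures from Sapient, and the helper names are hypothetical.

```python
def gpu_hours_for_budget(budget_usd, usd_per_gpu_hour):
    """Convert a dollar budget into rented GPU-hours at a flat hourly price."""
    if usd_per_gpu_hour <= 0:
        raise ValueError("usd_per_gpu_hour must be positive")
    return budget_usd / usd_per_gpu_hour


def wall_clock_hours(gpu_hours, num_gpus):
    """Idealized wall-clock time, assuming the work splits perfectly across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus


# Placeholder pricing and node size, NOT figures from the article.
hours = gpu_hours_for_budget(1500, 2.50)
print(hours)                       # 600.0
print(wall_clock_hours(hours, 8))  # 75.0
```

Real runs rarely scale perfectly across GPUs, so the wall-clock figure is a lower bound under these assumptions.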

If these results hold up, foundational pretraining would no longer be limited to well-resourced institutions: organizations could afford to train their own capable reasoning models from scratch and connect them to external knowledge stores. This shift has several practical implications for enterprises.
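The article does not describe how such models would connect to knowledge stores. A common pattern is retrieval: look up relevant passages and place them in the prompt, so facts live in the store rather than in the model's weights. The sketch below assumes that pattern; `retrieve`, `build_prompt`, and the word-overlap scoring are hypothetical stand-ins for a real retriever (such as BM25 or embedding search), and the store contents are made up.

```python
def overlap_score(query, doc):
    """Count shared lowercase words: a crude stand-in for a real retriever."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query, store, k=1):
    """Return the k documents with the highest word overlap with the query."""
    return sorted(store, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]


def build_prompt(query, store, k=1):
    """Prepend retrieved passages so the model can reason over facts it never memorized."""
    context = "\n".join(retrieve(query, store, k))
    return f"Context:\n{context}\n\nInstruction: {query}"


# Made-up example documents for illustration.
store = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]
print(build_prompt("How long do refunds take?", store))
```

Keeping facts in an external store means they can be updated without retraining; the model only needs to read and reason over whatever the retriever returns.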
Guan Wang, CEO of Sapient Intelligence, emphasized the importance of this shift in a statement provided to VentureBeat. "Enterprises today face three compounding problems: training is expensive, infrastructure is heavy, and experimentation cycles are too slow," Wang said. "HRM-Text addresses these issues by providing a cost-effective, lightweight, and agile solution."
The prevailing approach to LLM training is to scrape the internet for massive datasets and run next-token prediction over trillions of tokens. In Sapient's framing, this brute-force method is costly and forces models to memorize large amounts of irrelevant data just to learn, indirectly, how to reason. Standard decoder-only models, for instance, spend compute learning to reconstruct prompts that are already known at inference time.
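The contrast can be written as two training objectives. In standard notation (our symbols, not Sapient's), pretraining on a raw-text sequence x_1, ..., x_T predicts every token, while training on an instruction x with response y_1, ..., y_M predicts only the response:

```latex
\mathcal{L}_{\text{pretrain}} = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
\qquad
\mathcal{L}_{\text{instruct}} = -\sum_{m=1}^{M} \log p_\theta\left(y_m \mid x,\, y_{<m}\right)
```

The instruction tokens still enter the model as context, so the second objective removes them from the loss rather than from the forward pass; how much compute that saves in practice depends on implementation details the article does not cover.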
Training on instruction-response pairs aligns more closely with what enterprise users actually need. Rather than learning to reproduce the exact word sequences of arbitrary internet content, the model learns to map instructions to useful responses, which the researchers argue leaves it better equipped to handle real-world tasks and give accurate, context-aware answers.
HRM-Text is a notable step toward making foundation models accessible and practical for a wider range of organizations. If the approach generalizes, it could point the way to more efficient, lower-cost training methods.
Original Sources
A $1,500 foundation model that rivals larger LLMs
https://venturebeat.com/technology/researchers-say-they-trained-a-foundation-model-from-scratch-for-about-1-500
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
This Week's Edition
15 June 2026
Related Articles

Global Researchers Compete to Shape AI's Future in Organizations
Models & Research · 3 min

MiniMax M3 Challenges GPT-5.5 and Gemini 3.1 Pro with Superior Performance at a Fraction of the Cost
Models & Research · 4 min

OpenAI’s AI Solves Complex Math Problems by Leveraging Pattern Recognition
Models & Research · 4 min
© 2026 Cedar & Bloom. All rights reserved.