
Researchers unveil LLaDA, an 8-billion-parameter diffusion language model that they report is competitive with LLaMA3 8B, challenging the assumption that strong language models must be autoregressive.
In a notable advance for large language models (LLMs), researchers from Renmin University of China and Ant Group have introduced LLaDA (Large Language Diffusion with Masking), an 8-billion-parameter diffusion model trained from scratch. The new model rivals the performance of LLaMA3 8B, challenging the dominance of autoregressive models in the field.
The team behind LLaDA posits that the intelligence of LLMs, manifested through scalability, instruction-following, in-context learning, conversational ability, and compression, stems from the core principle of generative modeling rather than from the autoregressive mechanism itself. Generative models approximate the true language distribution using maximum likelihood estimation (MLE), and LLaDA builds on that same principle.
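For reference, the MLE principle mentioned above can be written in its standard textbook form. Here $p_{\text{data}}$ denotes the true language distribution and $p_\theta$ the model; these symbols are notation assumed for illustration, not taken from the article's source. The second line shows the left-to-right factorization that autoregressive models use, which is one way to parameterize $p_\theta$ and not the only one.

```latex
\max_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log p_{\theta}(x)\bigr]
\;\Longleftrightarrow\;
\min_{\theta}\; \mathrm{KL}\bigl(p_{\text{data}}(x)\,\|\,p_{\theta}(x)\bigr)

% Autoregressive parameterization of a length-L sequence x = (x^1, ..., x^L):
p_{\theta}(x) = \prod_{i=1}^{L} p_{\theta}\bigl(x^{i} \mid x^{1}, \dots, x^{i-1}\bigr)
```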

LLaDA scales well, showing highly competitive performance trends when compared with autoregressive baselines trained on the same data. That like-for-like comparison is what makes LLaDA a credible alternative to existing models.
User: Emma decides to run 4 sprints 5 times a week. She runs 75 meters each sprint. How many total meters does she run a week?
LLaDA-Instruct: Emma runs 4 sprints each time, so she runs a total of 4 x 5 = 20 sprints a week. Each sprint is 75 meters long, so Emma runs a total of 20 x 75 = 1500 meters a week. The final result is 1500.
User: Please recommend me three famous movies
LLaDA-Instruct: Certainly! Here are three famous movies that have achieved widespread popularity and critical acclaim:
LLaDA represents a significant step forward in the development of large language models. By combining diffusion with masking, it achieves performance on par with leading autoregressive models such as LLaMA3 8B. That result suggests scalable, context-aware language generation does not have to depend on autoregression, and it makes LLaDA a useful reference point for further research.
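As a rough illustration of what "diffusion with masking" can mean in practice, the sketch below shows a forward masking step in which each token is independently replaced by a mask symbol with probability `t`. This is a toy example under stated assumptions, not the authors' implementation: the `MASK` symbol, the `forward_mask` helper, and the whitespace tokenization are hypothetical choices made for illustration. A real model would operate on token IDs and train a network to predict the masked tokens.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def forward_mask(tokens, t, rng):
    """Replace each token with MASK independently with probability t.

    t = 0.0 leaves the sequence untouched; t = 1.0 masks every token.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


# Toy whitespace "tokenization" (an assumption; real models use subword IDs).
tokens = "Emma runs four sprints five times a week".split()

rng = random.Random(0)  # fixed seed so repeated runs give the same masks
for t in (0.0, 0.5, 1.0):
    noisy = forward_mask(tokens, t, rng)
    n_masked = sum(tok == MASK for tok in noisy)
    print(f"t={t}: {n_masked}/{len(tokens)} masked -> {' '.join(noisy)}")
```

The masking level `t` plays the role of the noise level in the diffusion process: a model trained to fill in masks across all values of `t` can start from a fully masked sequence and recover text over several unmasking steps.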
Original Sources
↗ https://ml-gsai.github.io/LLaDA-demo/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
18 February 2025