
INTELLECT-1 marks a new era in AI development by successfully training a massive 10 billion parameter model across international borders, challenging traditional centralized approaches to large-scale machine learning.
Today, Prime Intellect is proud to release INTELLECT-1, the first 10 billion parameter language model trained collaboratively across five countries and three continents. This achievement represents a significant scale-up from our previous research, demonstrating that large-scale model training can be effectively decentralized and community-driven.
INTELLECT-1 is the first large-scale experiment in globally distributed training. The model was trained on up to 112 H100 GPUs, achieving an overall compute utilization of 83% across continents and 96% when using only nodes within the United States. In other words, the overhead relative to centralized training stays small, and it is smallest when all nodes sit in one country.
These results open new possibilities for community-driven training of frontier foundation models, proving that large-scale AI can be democratized.
The PRIME framework was pivotal in achieving this milestone. Key innovations include:

Using PRIME with DiLoCo and a custom int8 all-reduce, we achieved an overall 400x reduction in communication bandwidth compared to traditional data-parallel training settings while maintaining comparable performance at the 10B scale.
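As a rough illustration of where a figure like 400x could come from, the sketch below multiplies two effects: synchronizing only once every so many local steps (the DiLoCo idea) and sending 8-bit instead of 32-bit values. The sync interval of 100 steps and the fp32 baseline are assumptions chosen for illustration; the article does not state either. The helper names (`comm_reduction`, `quantize_int8`, `dequantize_int8`) are hypothetical, and the quantizer is a generic symmetric per-tensor scheme, not PRIME's custom int8 all-reduce.

```python
def comm_reduction(sync_interval: int, baseline_bits: int = 32,
                   quantized_bits: int = 8) -> float:
    """Reduction in data sent per training step, relative to data-parallel
    training that all-reduces full-precision gradients on every step.

    sync_interval is the number of local steps between synchronizations
    (an assumed value; not given in the article).
    """
    if sync_interval < 1:
        raise ValueError("sync_interval must be at least 1")
    return sync_interval * (baseline_bits / quantized_bits)


def quantize_int8(values):
    """Generic symmetric per-tensor int8 quantization: returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127] after rounding.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Assumed: sync every 100 local steps, fp32 baseline, int8 payload.
    print(comm_reduction(sync_interval=100))

    update = [0.5, -1.0, 0.25, 0.0]  # toy stand-in for a model update
    codes, scale = quantize_int8(update)
    restored = dequantize_int8(codes, scale)
    print(max(abs(a - b) for a, b in zip(update, restored)))
```

Under these assumptions the two factors multiply: fewer synchronization rounds cut how often data is sent, and quantization cuts how much is sent each time. The quantization error per element is bounded by half the scale, which is the trade-off the "comparable performance" claim rests on.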
INTELLECT-1 is based on the Llama-3 architecture with the following specifications:
The model was trained on a carefully curated 1 trillion token dataset mix:
Training completed over 42 days using a WSD learning rate schedule.
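For readers unfamiliar with the term, WSD stands for warmup-stable-decay: the learning rate ramps up, holds at a constant peak for most of training, then drops at the end. The sketch below is a minimal version of that shape. The function name `wsd_lr`, the linear warmup and linear decay, and every numeric value are assumptions for illustration; the article does not give the schedule's actual hyperparameters or decay shape.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_steps: int, decay_steps: int, min_lr: float = 0.0) -> float:
    """Warmup-stable-decay learning rate at a given step.

    Linear warmup to peak_lr, a constant plateau, then a linear decay to
    min_lr over the final decay_steps. The decay shape is an assumption.
    """
    if warmup_steps + decay_steps > total_steps:
        raise ValueError("warmup and decay phases exceed total_steps")
    if step < warmup_steps:
        # Warmup phase: ramp linearly, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start or decay_steps == 0:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: interpolate linearly from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show the three phases.
    for s in (0, 99, 500, 900, 1000):
        print(s, wsd_lr(s, total_steps=1000, peak_lr=1.0,
                        warmup_steps=100, decay_steps=200))
```

One practical property of this shape is that the plateau does not depend on the total step count, so the length of a run can be changed without re-deriving the whole schedule; only the start of the decay phase moves.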
The release of INTELLECT-1 and the PRIME framework is a game-changer for the AI community. It demonstrates that:
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
2 December 2024