
Apple's technical report describes its new AI models in detail, including a novel architecture for the on-device model and the training and optimization processes behind both models.
At WWDC25, Apple unveiled new versions of its on-device and cloud-based foundation models. The company has now released a detailed technical report, "Apple Intelligence Foundation Language Models – Tech Report 2025," which describes how these models were trained, optimized, and evaluated. The highlights below are the ones most relevant to practitioners.
The most notable aspect of Apple's on-device model is its architecture. The model, which has around 3 billion parameters, is split into two distinct blocks.
This design decision has several practical benefits, and Apple claims that the model's overall performance and output quality are preserved despite the optimizations. The split architecture is a way to balance resource constraints against computational efficiency.
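To make the memory trade-off concrete, here is a minimal sketch of how a two-block split could reduce key-value (KV) cache size. It assumes that only the first block stores its own KV cache and that the second block reuses it. Every number below (layer counts, head counts, sequence length, 16-bit values) is a hypothetical placeholder chosen for illustration, not a figure taken from this article.

```python
def kv_cache_bytes(cached_layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to hold keys and values for `cached_layers` transformer layers."""
    keys_and_values = 2  # one K tensor and one V tensor per cached layer
    return keys_and_values * cached_layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape (assumptions for illustration only).
TOTAL_LAYERS = 32    # layers in the whole model
BLOCK1_LAYERS = 20   # layers in the first block, which keeps its own KV cache
KV_HEADS = 8
HEAD_DIM = 128
SEQ_LEN = 4096

# Baseline: every layer stores its own keys and values.
baseline = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

# Split model (assumed behavior): only block 1 stores a cache; block 2 reuses it.
shared = kv_cache_bytes(BLOCK1_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

saving = 1 - shared / baseline
print(f"baseline: {baseline / 2**20:.0f} MiB, split: {shared / 2**20:.0f} MiB, saving: {saving:.1%}")
```

Under these assumptions, the saving depends only on the fraction of layers that sit in the second block, so the same function can be used to compare other hypothetical split ratios.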
Apple's foundation models are trained on a diverse range of data sources so that they can handle a variety of tasks effectively.
The pre-training process involves several stages.

Apple also describes the tools it developed to support training and deployment of these models, as well as the optimizations it applied.
Apple has provided benchmarks comparing its on-device model with external cloud-based models.
Apple has been exploring techniques for optimizing large language models (LLMs) for on-device use for some time. The company previously published a study on swapping parts of an LLM between RAM and flash storage as needed. That approach was not used in the current models, but it shows that Apple has been working on this problem for a while.
Apple's new foundation models advance both its on-device and cloud-based AI. The technical details in the report are useful for practitioners who want to understand and use these models. By splitting the local model into two blocks and applying a range of optimizations, Apple has built efficient models that can run on user devices while, by its own account, preserving output quality.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
22 July 2025