
Researchers introduce BAR, a method that lets separate domain experts be trained independently and then merged into a single model, so a language model can be updated without retraining from scratch.
April 20, 2026
Jacob Morrison, Sanjay Adhikesaven, Akshita Bhagia, Matei Zaharia, Noah A. Smith, and Sewon Min - Ai2
After pretraining, language models go through a series of mid- and post-training stages to become practically useful: learning to follow instructions, reason through problems, reliably call tools, and so on. However, updating or extending these models is often challenging. Retraining from scratch with new capabilities included is the most reliable option but is expensive and requires full access to the original training setup. Training further on new data is cheaper, but it can cause the model to lose capabilities it already had. Post-training typically involves multiple stages, each with its own data and objectives, which makes it difficult to add new skills without breaking what came before.
To address these issues, we introduce BAR (Branch-Adapt-Route), a method for modular post-training that allows you to train independent domain experts and compose them into a unified model via a mixture-of-experts (MoE) architecture. Each expert can be developed, upgraded, or replaced without affecting the others.
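To make the idea of composing experts concrete, here is a minimal sketch of a mixture-of-experts layer that routes an input to separately defined experts. It is an illustration only, not the BAR implementation: the function names (`softmax`, `moe_forward`), the toy experts, the dot-product router, and the top-k routing rule are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=1):
    """Route input vector x to the top_k experts and mix their outputs.

    experts: list of callables mapping a vector to a vector
             (hypothetical stand-ins for independently trained experts).
    router_weights: one weight vector per expert; the router score for
                    an expert is the dot product of its weights with x.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the top_k experts by router score.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalise the gate values over the chosen experts.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out

# Toy experts standing in for two independently trained domains (assumed).
def math_expert(x):
    return [2.0 * v for v in x]

def code_expert(x):
    return [v + 1.0 for v in x]

experts = [math_expert, code_expert]
router_weights = [[1.0, 0.0], [0.0, 1.0]]

# The input scores highest for the first expert, so only it is used.
moe_output = moe_forward([3.0, 1.0], experts, router_weights, top_k=1)  # [6.0, 2.0]
```

In this toy setup, swapping out `experts[1]` for a different function leaves the output above unchanged under top-1 routing, because that input is never routed to the replaced expert. That is the sense in which one expert can be replaced without touching the others.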
Our earlier work on FlexOlmo demonstrated that modular MoE-based training works well for pretraining. You can branch from a shared base, train domain-specific feed-forward network (FFN) experts while freezing all shared layers, and merge them back. However, this approach doesn't transfer to post-training. Pretraining primarily updates knowledge representations, which are mostly in FFN layers. Post-training introduces behavioral shifts such as new output formats, reasoning patterns, and safety constraints that require changes to shared parameters like attention layers, embeddings, and the language modeling head.
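The freezing scheme described above can be sketched as a simple partition of parameter names. This is a hedged illustration, not FlexOlmo's code: the parameter names, the `.ffn.` marker, and the helper `split_params` are hypothetical, chosen only to show which groups stay frozen when training is restricted to FFN experts.

```python
def split_params(param_names, expert_marker=".ffn."):
    """Partition parameter names into (trainable, frozen).

    Only parameters whose name contains expert_marker are trained;
    everything else is treated as a frozen shared parameter.
    The marker and the naming scheme are assumptions for this sketch.
    """
    trainable = [n for n in param_names if expert_marker in n]
    frozen = [n for n in param_names if expert_marker not in n]
    return trainable, frozen

# Hypothetical parameter names for a one-layer transformer.
param_names = [
    "embeddings.weight",
    "layers.0.attention.q_proj.weight",
    "layers.0.ffn.up_proj.weight",
    "layers.0.ffn.down_proj.weight",
    "lm_head.weight",
]

trainable, frozen = split_params(param_names)
```

Here `frozen` ends up holding the embeddings, the attention projection, and the language modeling head, which are exactly the shared parameters the article says post-training needs to change.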
For example, when we applied the FlexOlmo approach directly during reinforcement learning with verifiable rewards (RLVR), the reward curve was completely flat; the model simply could not learn with all shared parameters frozen. This led us to develop a new recipe specifically for post-training: BAR.

BAR consists of three main steps: Branch, Adapt, and Route.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.