Modular Post-Training with BAR: Train Separate Experts, Merge for Efficiency


The Engineer

21 Apr 2026 · 4 min read

Researchers introduce BAR, a method that allows separate experts to be trained independently and merged efficiently, enhancing language model updates without retraining from scratch.

April 20, 2026

Jacob Morrison, Sanjay Adhikesaven, Akshita Bhagia, Matei Zaharia, Noah A. Smith, and Sewon Min - Ai2

After pretraining, language models go through a series of mid- and post-training stages to become practically useful: learning to follow instructions, reason through problems, reliably call tools, and so on. However, updating or extending these models is often challenging. Retraining from scratch with new capabilities included is the most reliable option but is expensive and requires full access to the original training setup. Training further on new data is cheaper, but it can cause the model to lose capabilities it already had. Post-training typically involves multiple stages, each with its own data and objectives, which makes it difficult to add new skills without breaking what came before.

To address these issues, we introduce BAR (Branch-Adapt-Route), a method for modular post-training that allows you to train independent domain experts and compose them into a unified model via a mixture-of-experts (MoE) architecture. Each expert can be developed, upgraded, or replaced without affecting the others.

Background and Motivation

Our earlier work on FlexOlmo demonstrated that modular MoE-based training works well for pretraining. You can branch from a shared base, train domain-specific feed-forward network (FFN) experts while freezing all shared layers, and merge them back. However, this approach doesn't transfer to post-training. Pretraining primarily updates knowledge representations, which are mostly in FFN layers. Post-training introduces behavioral shifts such as new output formats, reasoning patterns, and safety constraints that require changes to shared parameters like attention layers, embeddings, and the language modeling head.

For example, when we tried the FlexOlmo approach directly during reinforcement learning with verifiable rewards (RLVR), the reward curve was completely flat: the model simply could not learn with all shared parameters frozen. This led us to develop a new recipe specifically for post-training: BAR.
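To make the contrast concrete, here is a minimal sketch of which parameters receive updates under each recipe. The parameter names, the `expert_new` label, and the `freeze_shared` flag are hypothetical and chosen only for illustration; they are not taken from the FlexOlmo or BAR code.

```python
# Hypothetical parameter names for a one-layer model (illustration only).
PARAM_NAMES = [
    "embeddings.weight",
    "layers.0.attention.q_proj",
    "layers.0.attention.k_proj",
    "layers.0.attention.v_proj",
    "layers.0.attention.o_proj",
    "layers.0.ffn.expert_new.up_proj",
    "layers.0.ffn.expert_new.down_proj",
    "lm_head.weight",
]


def is_expert_param(name: str) -> bool:
    """True for parameters that belong to a domain-specific FFN expert."""
    return ".ffn.expert_" in name


def trainable_params(names, freeze_shared: bool):
    """Return the parameter names that would receive gradient updates.

    freeze_shared=True  -> only FFN experts train (the pretraining-style recipe).
    freeze_shared=False -> experts and shared layers can train (needed for
                           post-training, according to the text above).
    """
    return [n for n in names if is_expert_param(n) or not freeze_shared]


frozen_recipe = trainable_params(PARAM_NAMES, freeze_shared=True)
unfrozen_recipe = trainable_params(PARAM_NAMES, freeze_shared=False)
```

With `freeze_shared=True`, attention, embeddings, and the language modeling head never change, which is the setting in which the RLVR reward stayed flat.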

How BAR Works

BAR consists of three main steps: Branch, Adapt, and Route.

Branch: Start with a pretrained base model. For each domain or task you want to specialize in, create a branch by adding a new set of feed-forward layers (FFNs) specific to that domain. These FFNs are the "experts" in the MoE architecture.
Adapt: Train each expert independently using domain-specific data and objectives. This allows each expert to learn specialized capabilities without affecting the others. During this stage, you can update shared parameters like attention layers, embeddings, and the language modeling head as needed.
Route: After training, merge the experts back into a unified model using a routing mechanism. The router decides which expert to use for each input based on its content and context, so the model can draw on the strengths of multiple experts while retaining the capabilities it already has.
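The three steps above can be sketched with a toy model stored as plain dictionaries. This is an illustration under stated assumptions, not the actual BAR implementation: the function names, the dictionary layout, initializing a new expert from the base FFN weights, and keeping the base expert in the merged model are all hypothetical. The text does not say how shared-parameter updates from different branches are reconciled at merge time, so the sketch leaves shared weights untouched.

```python
import copy


def branch(base: dict, domain: str) -> dict:
    """Branch: copy the base model and attach a new FFN expert for one domain."""
    model = copy.deepcopy(base)
    # Assumption: the new expert starts as a copy of the base FFN weights.
    model["experts"][domain] = list(base["experts"]["base"])
    return model


def adapt(model: dict, domain: str, updates: list) -> dict:
    """Adapt: stand-in for training; applies additive updates to one expert."""
    current = model["experts"][domain]
    model["experts"][domain] = [w + u for w, u in zip(current, updates)]
    return model


def merge(base: dict, branches: dict) -> dict:
    """Route (merge part): gather every branch's expert into one MoE model."""
    merged = copy.deepcopy(base)
    for domain, model in branches.items():
        merged["experts"][domain] = list(model["experts"][domain])
    return merged


base = {"shared": [0.5, 0.5], "experts": {"base": [1.0, 2.0]}}

# Each branch is trained on its own; neither touches the other or the base.
math_branch = adapt(branch(base, "math"), "math", [0.5, -0.5])
code_branch = adapt(branch(base, "code"), "code", [1.0, 1.0])

moe = merge(base, {"math": math_branch, "code": code_branch})
```

Because each branch works on its own deep copy, replacing the `math` expert later only requires re-running `branch` and `adapt` for that domain and merging again.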

Key Benefits

Modularity: Each expert can be developed, upgraded, or replaced independently. This makes it easier to add new skills or improve existing ones without retraining the entire model.
Efficiency: By training experts separately, you can use domain-specific data more effectively and avoid the overhead of retraining from scratch.
Flexibility: The routing mechanism allows the model to dynamically choose the best expert for each input, making it adaptable to a wide range of tasks.

Implementation Details

Routing Mechanism: We use a gating network to decide which expert to route an input to. The gating network is trained alongside the experts and learns to make decisions based on the input's content and context.
Training Objectives: Each expert is trained with domain-specific objectives, such as generating specific output formats or adhering to safety constraints. This ensures that each expert is well-suited to its intended task.
Evaluation: We evaluated BAR on a variety of tasks, including instruction following, reasoning, and tool use. The results showed that the modular approach not only maintains but often improves on the performance of the base model.
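As an illustration of the routing mechanism described above, the following sketch implements a generic softmax gate with top-k selection, a common design for MoE routers. It is an assumption that BAR's gating network looks like this: the linear scoring, the `top_k` parameter, the toy experts, and routing per hidden vector are hypothetical choices for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(hidden, gate_weights, top_k=1):
    """Score each expert with a linear layer, keep the top_k, renormalize."""
    names = list(gate_weights)
    logits = [sum(h * w for h, w in zip(hidden, gate_weights[n])) for n in names]
    probs = softmax(logits)
    ranked = sorted(zip(names, probs), key=lambda p: p[1], reverse=True)[:top_k]
    norm = sum(p for _, p in ranked)
    return {name: p / norm for name, p in ranked}


def moe_ffn(hidden, experts, gate_weights, top_k=1):
    """Combine the selected experts' outputs, weighted by the gate."""
    weights = gate(hidden, gate_weights, top_k)
    out = [0.0] * len(hidden)
    for name, w in weights.items():
        expert_out = experts[name](hidden)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out


# Toy experts standing in for trained FFNs (hypothetical).
experts = {
    "math": lambda h: [2.0 * x for x in h],
    "code": lambda h: [x + 1.0 for x in h],
}
# One row of gate weights per expert (hypothetical values).
gate_weights = {"math": [1.0, 0.0], "code": [0.0, 1.0]}

routed = gate([3.0, 0.0], gate_weights, top_k=1)
output = moe_ffn([3.0, 0.0], experts, gate_weights, top_k=1)
```

Here the input aligns with the `math` gate row, so the gate sends it entirely to the `math` expert. In a trained model the gate weights would be learned rather than set by hand.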

Benchmarks

Instruction Following: Improved accuracy by 5% compared to fine-tuning the entire model.
Reasoning Tasks: Achieved a