
Researchers at Anthropic uncover how base AI models transform into chat models through detailed analysis, shedding light on behavioral shifts and offering tools to measure these changes accurately.
Understanding how a base model differs from its fine-tuned chat version is crucial for anyone working with AI models. A recent study by researchers at Anthropic examines how these models differ and which techniques capture those differences most effectively.
Base models are typically large language models (LLMs) trained on diverse datasets to generate coherent text. Chat models, on the other hand, are often fine-tuned versions of base models designed for interactive conversations. The goal is to make these chat models more contextually aware and user-friendly. However, identifying the specific changes that occur during this fine-tuning process can be challenging.
One common approach to understanding model differences is to analyze latents. Latents here are sparse features learned from the models' internal activations, and they can help us identify key features and behaviors. However, the researchers found that most latents identified as "model-specific" were not actually unique to a particular model.
To address this issue, the researchers introduced a new technique called BatchTopK Crosscoders.
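The general BatchTopK idea can be illustrated with a small sketch. This is not the researchers' implementation: it only shows the selection rule as it is usually described for BatchTopK sparse autoencoders, where the `k * batch_size` largest activations are kept across the whole batch instead of the top `k` per sample. The function names and the toy numbers are hypothetical, and plain Python lists stand in for the tensors a real training loop would use.

```python
from typing import List


def batch_topk(acts: List[List[float]], k: int) -> List[List[float]]:
    """Keep the k * batch_size largest positive activations across the batch."""
    budget = k * len(acts)
    # Rank every positive activation in the batch together, not row by row.
    flat = [(v, r, c) for r, row in enumerate(acts) for c, v in enumerate(row) if v > 0]
    flat.sort(key=lambda t: t[0], reverse=True)
    out = [[0.0] * len(row) for row in acts]
    for v, r, c in flat[:budget]:
        out[r][c] = v
    return out


def per_sample_topk(acts: List[List[float]], k: int) -> List[List[float]]:
    """Baseline for comparison: keep the k largest positive activations in each row."""
    out = []
    for row in acts:
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        keep = {c for c in top if row[c] > 0}
        out.append([v if c in keep else 0.0 for c, v in enumerate(row)])
    return out


# Hypothetical pre-activations for two samples and four latents.
pre_acts = [
    [0.9, 0.8, 0.7, 0.0],
    [0.1, 0.05, 0.0, 0.0],
]

print(batch_topk(pre_acts, k=1))       # [[0.9, 0.8, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(per_sample_topk(pre_acts, k=1))  # [[0.9, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0]]
```

With `k=1`, the batch-level rule spends both of its slots on the first sample and none on the second, so the number of active latents varies per sample while averaging `k` over the batch. The per-sample rule forces exactly one active latent per sample, even when that activation is weak.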

While BatchTopK Crosscoders are effective, the researchers also explored simpler techniques.
The study introduced a variant of sparse autoencoders (SAEs) called diff-SAEs, which are particularly good at capturing the subtle differences between base and chat models.
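To make the diff-SAE idea concrete, here is a minimal forward-pass sketch. It rests on an assumption the article does not spell out: that a diff-SAE is a sparse autoencoder fed the difference between chat-model and base-model activations at the same layer and token, not either model's activations alone. Every name, weight, and number below is a hypothetical toy value, and training (reconstruction loss plus a sparsity constraint) is omitted.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def diff_sae_forward(
    base_act: Vector,
    chat_act: Vector,
    w_enc: Matrix,
    b_enc: Vector,
    w_dec: Matrix,
) -> Tuple[Vector, Vector, Vector]:
    """Encode and reconstruct the activation difference (assumed diff-SAE input)."""
    # Assumption: the input is chat minus base, so shared structure cancels out.
    diff = [c - b for c, b in zip(chat_act, base_act)]
    # Encoder: ReLU(W_enc @ diff + b_enc), one value per latent.
    latents = [max(0.0, z + b) for z, b in zip(matvec(w_enc, diff), b_enc)]
    # Decoder: each row of w_dec is one latent's direction in activation space.
    recon = [
        sum(latents[j] * w_dec[j][i] for j in range(len(latents)))
        for i in range(len(diff))
    ]
    return diff, latents, recon


# Toy activations: the two models agree on dimension 0 and differ on dimension 1.
base_act = [1.0, 2.0]
chat_act = [1.0, 4.0]

# Hypothetical hand-set weights with three latents (more latents than dimensions).
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

diff, latents, recon = diff_sae_forward(base_act, chat_act, w_enc, b_enc, w_dec)
print(diff)     # [0.0, 2.0]
print(latents)  # [0.0, 2.0, 0.0]
print(recon)    # [0.0, 2.0]
```

In this toy case only the latent aligned with the changed dimension fires. That matches the intuition behind working on differences: whatever the two models share never reaches the autoencoder.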
Understanding the differences between base and chat models is essential for improving AI systems. Techniques like BatchTopK Crosscoders and diff-SAEs provide valuable tools for this task, helping us identify and leverage the unique characteristics of each model. As we continue to refine these methods, we can expect more precise and effective fine-tuning processes, leading to better-performing chat models.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
2 July 2025