
The release of open-source sparse autoencoders for Llama 3.1 8B and Llama 3.3 70B enhances model interpretability, offering developers tools to explore and control language models' internal workings through Ember's new API/SDK.
Following the recent announcement of Goodfire Ember, we’re excited to release state-of-the-art, open-source sparse autoencoders (SAEs) for Llama 3.1 8B and Llama 3.3 70B. SAEs are interpreter models that help us understand how language models process and represent information internally. These models power Ember’s interpretability API/SDK and have been crucial in enabling feature discovery and programmatic control over LLM internals.
We’re releasing SAEs for:
These models build on our earlier work with Llama-3-8B, where we demonstrated the effectiveness of training an SAE on the LMSYS-Chat-1M dataset [2]. Our SAEs are designed to decompose complex neural activations into interpretable features, making it possible to understand and steer model behavior at a granular level.
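To make the idea of decomposing activations and steering along a feature concrete, here is a minimal NumPy sketch. It is not the released SAEs or the Ember SDK: the weights are random, the sizes are toy values, and the names `encode`, `decode`, and `steer` are hypothetical helpers chosen for illustration. It assumes a plain ReLU encoder and a linear decoder, which is one common SAE design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 8, 32  # toy sizes (assumption); real SAEs are far wider

# Hypothetical SAE parameters. Random here; a real SAE would load trained weights.
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)


def encode(x):
    """Map an activation vector to non-negative feature activations."""
    return np.maximum(0.0, x @ W_enc + b_enc)


def decode(f):
    """Reconstruct the activation as a weighted sum of decoder directions."""
    return f @ W_dec + b_dec


def steer(x, feature_idx, strength):
    """Nudge an activation along one feature's decoder direction."""
    return x + strength * W_dec[feature_idx]


x = rng.normal(size=d_model)          # stand-in for one LLM activation vector
features = encode(x)                  # feature activations, many of them zero
x_hat = decode(features)              # reconstruction of x from the features
top = np.argsort(features)[::-1][:3]  # indices of the three most active features
x_steered = steer(x, feature_idx=int(top[0]), strength=4.0)
```

In a trained SAE each row of `W_dec` would correspond to one learned feature, so adding a multiple of that row to the activation is one simple way to express "turn this feature up". How steering is actually applied in Ember is not specified here.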
Interpretable Features: The SAEs break down the internal representations of LLMs into meaningful components. This allows researchers and developers to identify specific patterns and behaviors the model has learned.
Programmatic Control: By steering these features, you can control the model's output more precisely than with prompting alone. For example, you can push the model to "talk like a pirate" or to sound "melancholy" across a range of prompts.
Evaluation Metrics:

Parameterization Strategy: Our starting point was the Anthropic April update [3], which describes training and parameterization choices for sparse autoencoders.
Training Data: We used the LMSYS-Chat-1M dataset, a large-scale real-world conversation dataset that captures a wide range of interactions and contexts.
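As a rough illustration of what an SAE training objective can look like, the sketch below computes a reconstruction error plus a sparsity penalty in which each feature's activation is weighted by the norm of its decoder row. Treat this as an assumption: it is one formulation discussed in this line of work, and the article does not confirm that it matches the update cited as [3] or the objective used for these SAEs. The function name `sae_loss` and the `sparsity_coeff` parameter are hypothetical.

```python
import numpy as np


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, sparsity_coeff):
    """Mean over the batch of squared reconstruction error plus a
    decoder-norm-weighted L1 penalty on the feature activations."""
    f = np.maximum(0.0, x @ W_enc + b_enc)     # (batch, n_features)
    x_hat = f @ W_dec + b_dec                  # (batch, d_model)
    recon = np.sum((x - x_hat) ** 2, axis=-1)  # per-example squared error
    dec_norms = np.linalg.norm(W_dec, axis=1)  # one norm per feature
    sparsity = np.sum(f * dec_norms, axis=-1)  # weighted L1 (f is non-negative)
    return float(np.mean(recon + sparsity_coeff * sparsity))


# Toy usage with random data (all sizes and values are illustrative assumptions).
rng = np.random.default_rng(0)
batch, d_model, n_features = 4, 8, 32
acts = rng.normal(size=(batch, d_model))
W_e = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_d = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_e, b_d = np.zeros(n_features), np.zeros(d_model)

loss_dense = sae_loss(acts, W_e, b_e, W_d, b_d, sparsity_coeff=0.0)
loss_sparse = sae_loss(acts, W_e, b_e, W_d, b_d, sparsity_coeff=1.0)
```

Raising `sparsity_coeff` trades reconstruction quality for fewer active features, which is the basic tension any SAE training setup has to balance.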
For practitioners, these SAEs offer several benefits:
To get started with these SAEs, you can:
The release of these open-source SAEs marks a significant step forward in the field of model interpretability and control. By providing researchers and developers with powerful tools to understand and steer LLMs, we aim to foster innovation and responsible AI development.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
13 January 2025