
Researchers uncover how large language models interpret complex visual features through SVG and ASCII art, revealing insights into LLMs' ability to understand images beyond basic text structure.
In a recent update from the Anthropic interpretability team, researchers Julius Tarng, Purvi Goel, and Isaac Kauvar explored how large language models (LLMs) perceive and understand visual features across different text-based modalities. This work builds on their earlier research that delved into the mechanisms LLMs use to process low-level visual properties of text, such as line breaking and table formatting. The team now turns their attention to higher-level semantic concepts encoded visually in text.
The researchers generated ASCII and SVG smiley faces using Claude, an LLM developed by Anthropic, and then examined how the model represents these depictions internally. The ASCII depictions were short text faces such as :-) (smiley face) and :-( (frowny face). The SVG depictions used markup such as <circle cx="50" cy="50" r="10" /> for an eye within a larger face structure.
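To make the two modalities concrete, here is a minimal Python sketch that builds both kinds of face as plain strings. Only the ASCII faces and the eye element `<circle cx="50" cy="50" r="10" />` come from the article. The head outline, the second eye, the mouth path, and the helper names `svg_face` and `count_circles` are assumptions made up for illustration, not the researchers' actual prompts or outputs.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# ASCII depictions quoted in the article.
ASCII_SMILEY = ":-)"
ASCII_FROWNY = ":-("


def svg_face(smiling: bool = True) -> str:
    """Return a small SVG face as a string (coordinates are illustrative)."""
    # SVG's y axis points down, so a control point below the mouth's
    # endpoints bends the curve into a smile, and one above gives a frown.
    control_y = 130 if smiling else 80
    return (
        f'<svg xmlns="{SVG_NS}" viewBox="0 0 150 150">'
        '<circle cx="75" cy="75" r="70" fill="none" stroke="black" />'  # head (assumed)
        '<circle cx="50" cy="50" r="10" />'   # eye element quoted in the article
        '<circle cx="100" cy="50" r="10" />'  # second eye (assumed mirror image)
        f'<path d="M 45 105 Q 75 {control_y} 105 105" fill="none" stroke="black" />'
        '</svg>'
    )


def count_circles(svg_text: str) -> int:
    """Parse the SVG and count its <circle> elements."""
    root = ET.fromstring(svg_text)
    return sum(1 for _ in root.iter(f"{{{SVG_NS}}}circle"))


if __name__ == "__main__":
    print(ASCII_SMILEY, ASCII_FROWNY)
    print(svg_face(smiling=True))
    print(count_circles(svg_face()))
```

Both outputs are ordinary text from the model's point of view: the ASCII face encodes the concept in three characters, while the SVG face spreads it across coordinates and element names.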
The team found that LLMs develop cross-modal features: internal features that recognize the same concept whether it is drawn in ASCII or in SVG.
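One way to read "cross-modal" operationally is that a feature responds to a concept in every modality, not just one. The Python sketch below encodes that reading. The threshold criterion, the function name `is_cross_modal`, and all the numbers are assumptions invented for illustration; they are not the team's method or measurements.

```python
from statistics import mean


def is_cross_modal(activations_by_modality: dict[str, list[float]],
                   threshold: float = 0.5) -> bool:
    """Return True if the mean activation reaches `threshold` in every modality.

    `activations_by_modality` maps a modality name (e.g. "ascii", "svg") to a
    non-empty list of activations recorded on depictions of one concept.
    """
    return all(mean(acts) >= threshold
               for acts in activations_by_modality.values())


# Made-up toy values, NOT data from the research.
toy_shared_feature = {"ascii": [0.9, 0.7], "svg": [0.8, 0.6]}
toy_ascii_only_feature = {"ascii": [0.9, 0.8], "svg": [0.0, 0.1]}

if __name__ == "__main__":
    print(is_cross_modal(toy_shared_feature))
    print(is_cross_modal(toy_ascii_only_feature))
```

A feature that clears the bar only on ASCII input would be modality-specific under this definition. Real interpretability work would use activations read from the model and a more careful criterion than a fixed threshold.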
These findings offer a view into the internal representations that LLMs use to process and generate text-based visual content.
This research from the Anthropic interpretability team sheds light on how LLMs process and understand visual content across different text-based modalities by identifying and analyzing cross-modal features. Future work in this area could explore how far these features generalize and what they enable in creative and interpretive tasks.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
27 October 2025