
Despite their linguistic prowess, large language models struggle with tasks like playing chess and explaining image blending, revealing limits in their ability to model how things work beyond the text that describes them.
August 10th, 2025
Large language models (LLMs) have made significant strides in natural language processing, but they fall short when it comes to understanding the world in a deeper, more structured way. This article explores why LLMs, despite their vast training data and impressive capabilities, are not true "world models."
A friend who is better at chess than me, and who knows more math and CS, played some moves against a newly released LLM and came away convinced it must be at least as good as he is. I disagreed, confident that the LLM would falter. To test this, I made a few weak moves of a kind the LLM was unlikely to have seen in its training data. Unlike my friend's opening-book moves, which the LLM has likely seen millions of times, my weak moves quickly exposed its limitations.
This failure is not due to a lack of exposure to chess games. After all, these models have read an enormous number of them. However, they haven't learned the most basic fact about the game: whether a move is legal depends on where the pieces currently are on the board. This is because LLMs are optimized to predict the text of chess games and commentary, not to understand the game's mechanics.
When I say LLMs have no world model, I'm not referring to their lack of physical interaction with the world or to their inability to recognize chess knights. I mean that, despite reading an enormous number of chess games, LLMs have not learned the basic rules of chess. This is a critical distinction.
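To make concrete what "knowing where the pieces are" involves, here is a minimal sketch in Python. It is not a chess engine and not anything from the original experiment: the helper names (initial_board, play) and the 'e2e4' move format are assumptions chosen for illustration, and the only rule it checks is that the side to move actually has a piece on the square it claims to move from. Even that weak check requires carrying board state from move to move, which is the point.

```python
# Hypothetical, deliberately tiny sketch. It checks ONLY that the side to
# move has a piece on the source square. Real legality (how each piece
# moves, checks, castling, en passant) is not implemented here.

def initial_board():
    """Return a dict mapping squares like 'e2' to pieces like 'wP'."""
    board = {}
    back_rank = "RNBQKBNR"
    for file, piece in zip("abcdefgh", back_rank):
        board[file + "1"] = "w" + piece
        board[file + "2"] = "wP"
        board[file + "7"] = "bP"
        board[file + "8"] = "b" + piece
    return board


def play(moves):
    """Apply moves written as 'e2e4' (source square, then destination).

    Returns the index of the first move whose source square does not hold
    a piece belonging to the side to move, or None if every move passes
    this (very weak) check.
    """
    board = initial_board()
    side = "w"
    for i, move in enumerate(moves):
        src, dst = move[:2], move[2:4]
        piece = board.get(src)
        if piece is None or piece[0] != side:
            return i
        board[dst] = board.pop(src)  # a capture simply overwrites dst
        side = "b" if side == "w" else "w"
    return None
```

For example, play(["e2e4", "e7e5", "e2e3"]) flags the third move, because the pawn that started on e2 is no longer there. A system that only predicts plausible-looking move text has no such state to consult.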

To further illustrate this point, consider the "normal blending mode" in image editors like Krita. This feature blends two layers with partially transparent pixels using a specific mathematical formula. When asked to explain this formula, an LLM provided a vague and incorrect response.
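For reference, "normal" blending is conventionally described by the Porter-Duff "over" operator. The sketch below implements that standard formula for straight (non-premultiplied) RGBA values in the range 0 to 1. Whether Krita's implementation matches it in every detail is an assumption here, and the function name blend_normal is made up for illustration.

```python
def blend_normal(src, dst):
    """Composite src over dst using the Porter-Duff "over" operator.

    Both arguments are (r, g, b, a) tuples of floats in [0, 1] with
    straight (non-premultiplied) alpha. Returns a tuple in the same form.
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst

    # Resulting opacity: the top layer, plus whatever of the bottom
    # layer still shows through it.
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        # Both pixels fully transparent: the color is undefined,
        # so return transparent black.
        return (0.0, 0.0, 0.0, 0.0)

    def channel(s, d):
        # Weight each color by its effective coverage, then divide by
        # the resulting alpha to get back to straight alpha.
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (channel(sr, dr), channel(sg, dg), channel(sb, db), out_a)
```

With a half-transparent red layer over opaque blue, blend_normal((1, 0, 0, 0.5), (0, 0, 1, 1)) gives an opaque result with equal parts red and blue. The detail that is easy to get wrong is the division by the output alpha, which only matters when the bottom layer is itself partially transparent.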
The chess and image blending examples highlight a broader issue: LLMs are not equipped to develop a deep, structured understanding of the world. This is a significant limitation when considering their potential for general intelligence.
LLMs are powerful tools for natural language processing, but these failures suggest they do not build a true world model, the kind of understanding that general intelligence would require. As we continue to explore these models, it's important to recognize what they can and cannot do.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.