LLMs Fail to Model Chess and Image Blending: Why They Aren’t World Models

The Engineer

11 Aug 2025

Despite their linguistic prowess, large language models struggle with tasks like chess and image blending, revealing limits in their ability to learn the rules and structure behind the text they are trained on.

August 10th, 2025

Large language models (LLMs) have made significant strides in natural language processing, but they fall short when it comes to understanding the world in a deeper, more structured way. This article explores why LLMs, despite their vast training data and impressive capabilities, are not true "world models."

Chess: A Case Study

A friend who is better at chess than I am, and who knows more math and CS, played some moves against a newly released LLM and came away convinced it must be at least as good as he is. I disagreed, confident that the LLM would falter. To test this, I played a few weak moves that it was unlikely to have seen often in its training data. My friend's opening-book moves are ones the LLM has likely seen millions of times; my weak moves quickly exposed its limitations.

Move 10: The LLM tried to move a knight that wasn’t on the board.
Move 9 (in a recent test): It lost track of the board state entirely.

This failure is not due to a lack of exposure to chess games. After all, these models have read trillions of chess games. Yet they have not learned that making a legal move requires knowing where the pieces are on the board. This is because LLMs are optimized to predict the text of chess games and commentary, not to track the game's mechanics.
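To make the missing ingredient concrete, here is a minimal sketch of explicit board-state tracking. It is not how any LLM or chess engine works internally; the names initial_board and apply_move are hypothetical, and the check is deliberately narrow: it only verifies that the piece being moved is actually on its starting square, not that the move is otherwise legal.

```python
def initial_board():
    """Return the standard starting position as {square: piece}, e.g. {"g1": "wN"}."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, file in enumerate("abcdefgh"):
        board[f"{file}1"] = "w" + back_rank[i]
        board[f"{file}2"] = "wP"
        board[f"{file}7"] = "bP"
        board[f"{file}8"] = "b" + back_rank[i]
    return board


def apply_move(board, src, dst, piece):
    """Move `piece` from src to dst, refusing if that piece is not on src.

    This checks presence only; it does not validate movement rules,
    checks, castling, or en passant.
    """
    found = board.get(src)
    if found != piece:
        raise ValueError(f"no {piece} on {src} (found {found})")
    new_board = dict(board)   # leave the caller's board untouched
    del new_board[src]
    new_board[dst] = piece    # overwriting dst models a capture
    return new_board


board = initial_board()
board = apply_move(board, "g1", "f3", "wN")   # knight leaves g1

try:
    # Trying to move the same knight from g1 again: it is no longer there.
    apply_move(board, "g1", "h3", "wN")
except ValueError as err:
    print(err)   # no wN on g1 (found None)
```

Even this small amount of bookkeeping rules out the "knight that wasn't on the board" failure, which is the kind of state a pure text predictor is never required to maintain.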

Why LLMs Aren’t World Models

When I say LLMs have no world model, I am not referring to their lack of physical interaction with the world or their inability to recognize chess knights. I mean that, despite reading trillions of chess games, LLMs have not learned the basic rules of chess. This is a critical distinction.

Optimization Goal: LLMs are trained to predict the next word in a sequence, which doesn't necessarily require understanding the underlying structure or rules of the content they generate.
Accidental Learning: Any knowledge about chess that an LLM acquires is purely accidental and not a result of deliberate training for this purpose.
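The two points above can be caricatured with a toy next-move predictor. This is a deliberately crude sketch, not a description of how real LLMs work (they condition on the whole context, not just the previous token), and the three training games and the function names are made up for illustration. It shows that a model can score well on "what text usually comes next" without ever representing a board.

```python
from collections import Counter, defaultdict


def train_bigram(games):
    """Count which move string follows which, across all games."""
    counts = defaultdict(Counter)
    for game in games:
        for prev, nxt in zip(game, game[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training data: three short opening-book lines.
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],
    ["e4", "e5", "Nf3", "Nf6", "Nxe5"],
]
model = train_bigram(games)

# After the unusual 1. Nh3 e5, no white knight can reach f3
# (the h3 knight covers f2, f4, g1, g5; the b1 knight covers a3, c3, d2),
# yet the predictor only sees the text "e5" and answers as usual.
history = ["Nh3", "e5"]
print(predict_next(model, history[-1]))   # Nf3
```

The predictor is right about the statistics of its training text and wrong about the game, because nothing in its objective asks it to know where the pieces are.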

Image Blending: Another Example

To further illustrate this point, consider the "normal blending mode" in image editors like Krita. This feature blends two layers with partially transparent pixels using a specific mathematical formula. When asked to explain this formula, an LLM provided a vague and incorrect response.

Expected Knowledge: Understanding how to blend layers is a straightforward task that can be easily described mathematically.
LLM's Response: The model’s answer was imprecise and lacked the necessary detail, indicating a fundamental misunderstanding of the concept.
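For reference, here is a sketch of what a precise answer could look like, assuming that "normal" blending corresponds to the standard Porter-Duff "over" operator on straight (non-premultiplied) RGBA values in the range 0 to 1. That correspondence is an assumption on my part; Krita's actual implementation may differ in details such as premultiplication, color space, or layer opacity. The function name blend_normal is hypothetical.

```python
def blend_normal(src, dst):
    """Composite the top pixel `src` over the bottom pixel `dst`.

    Each pixel is (r, g, b, a) with straight alpha and channels in [0, 1].
    Uses the "over" operator:
        out_a = src_a + dst_a * (1 - src_a)
        out_c = (src_c * src_a + dst_c * dst_a * (1 - src_a)) / out_a
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        # Both pixels fully transparent: color is undefined, return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def channel(s, d):
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (channel(sr, dr), channel(sg, dg), channel(sb, db), out_a)


# Half-transparent red over opaque blue gives an opaque purple.
print(blend_normal((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))
# (0.5, 0.0, 0.5, 1.0)
```

The point is not that this formula is difficult, but that it is short and exact, which makes a vague answer easy to distinguish from a correct one.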

Implications for General Intelligence

The examples of chess and image blending highlight a broader issue: LLMs are not equipped to develop a deep, structured understanding of the world. This is a significant limitation when considering their potential for general intelligence.

Text Distribution Modeling: The argument that LLMs learn about the world as a side effect of modeling the distribution of text is hard to square with these failures.
Development Effort: While developers could potentially improve LLM performance in specific domains, the current approach to training and optimization does not prioritize this kind of structured understanding.

Conclusion

While LLMs are powerful tools for natural language processing, they fall short when it comes to developing a true world model. Their limitations in tasks like chess and image blending demonstrate that they lack the deep, structured understanding necessary for general intelligence. As we continue to explore the capabilities and limitations of these models, it's important to recognize what they can and cannot do.