
The debate over whether chain-of-thought reasoning in large language models is genuine or just an illusion of memorization continues to spark heated discussions among researchers, with a new paper adding fuel to the fire.
If you’ve been keeping up with the latest research on chain-of-thought (CoT) reasoning in large language models (LLMs), you might have noticed a recurring debate: whether CoT is "real" reasoning. This question has become a bit of a pet peeve for me, especially after reading papers that seem more interested in philosophical musings than practical insights. One such paper, "Is Chain-of-Thought Reasoning of LLMs a Mirage?" from Arizona State University, has recently gained attention. Let's dive into what it argues and why it might not be the most compelling contribution to the field.
The core argument of the paper is that CoT reasoning in LLMs works well for in-distribution or near-in-distribution data but becomes fragile and prone to failure under moderate distribution shifts. The authors suggest that what appears to be structured reasoning might just be a mirage, emerging from memorized or interpolated patterns in the training data rather than genuine logical inference.
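To test this hypothesis, the researchers trained a small transformer model (about 600k parameters) on a corpus of non-language data transformations. A training example looks like the one below, where each [M1] marker appears to stand for one transformation step (here, advancing every letter by one position) and the text between the <think> tags is the intermediate result: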
A B C D [M1] [M1]
<think>
B C D E [M1]
</think>
C D E F
This setup allowed the researchers to easily generate and check thousands of completions for errors in reasoning.
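To make "generate and check" concrete, here is a minimal Python sketch of how samples in this layout could be produced and verified. It is not the authors' code. It assumes, based only on the example above, that [M1] means "advance every letter by one position" and that Z wraps around to A. The function names render and check are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase
OP = "[M1]"  # assumption: one [M1] marker = shift every letter forward by one


def shift(tokens, k=1):
    """Advance each single-letter token by k positions (assumed to wrap Z -> A)."""
    return [ALPHABET[(ALPHABET.index(t) + k) % 26] for t in tokens]


def render(tokens, n_ops):
    """Build the full sample text: prompt, <think> steps, final answer."""
    lines = [" ".join(tokens + [OP] * n_ops), "<think>"]
    # Each intermediate step applies one operation and drops one marker.
    for remaining in range(n_ops - 1, 0, -1):
        tokens = shift(tokens)
        lines.append(" ".join(tokens + [OP] * remaining))
    lines.append("</think>")
    lines.append(" ".join(shift(tokens)))
    return "\n".join(lines)


def check(sample):
    """Return (steps_ok, answer_ok) by re-deriving the sample from its first line."""
    lines = sample.strip().splitlines()
    prompt = lines[0].split()
    letters = [t for t in prompt if t != OP]
    expected = render(letters, prompt.count(OP)).splitlines()
    steps_ok = lines[1:-1] == expected[1:-1]
    answer_ok = lines[-1] == expected[-1]
    return steps_ok, answer_ok


sample = render(list("ABCD"), 2)
print(sample)
print(check(sample))  # (True, True)

# A completion with a wrong intermediate step but a correct final answer.
bad = sample.replace("B C D E", "B C D F")
print(check(bad))  # (False, True)
```

Scoring the reasoning steps and the final answer separately is a design choice of this sketch. It lets you count cases where the two disagree, which is the kind of reasoning error a synthetic setup like this makes cheap to detect.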
The paper draws several conclusions from its experiments:

While these findings are interesting, they raise more questions about the nature of CoT reasoning in LLMs. The authors argue that the model's apparent reasoning is largely a result of memorization and interpolation rather than genuine logical inference. However, this conclusion might be oversimplified.
Despite its contributions, I find this paper lacking in several areas:
The Arizona State University paper offers some valuable insights into the limitations of CoT reasoning in LLMs. However, it falls short by focusing on a narrow and somewhat philosophical question rather than providing practical guidance for improving model robustness. For those working with LLMs, the key takeaway is to be aware of the potential fragility of CoT under distribution shifts and to consider methods that enhance model adaptability.
Original Sources
↗ https://www.seangoedecke.com/real-reasoning/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
15 August 2025