
Richard Sutton’s “Bitter Lesson” warns against over-relying on human expertise in AI, yet the success of LLMs, which are built on human-generated data, raises the question of whether reinforcement learning is needed to close the gap.
01 Oct, 2025
I recently listened to Sutton’s appearance on the Dwarkesh Podcast, which prompted some thoughts on the state of AI research, particularly around Large Language Models (LLMs) and the "Bitter Lesson."
For those unfamiliar, Richard Sutton's essay "The Bitter Lesson" has become a foundational text in the LLM community. It argues that the most effective approaches to AI are general methods that leverage massive amounts of computation, rather than methods that build in human knowledge. Researchers now describe an approach as "bitter lesson pilled" when it keeps improving as more compute is added. LLMs, whose performance has improved remarkably as model size and training data have grown, seem to embody this principle perfectly.
However, Sutton himself isn't convinced that LLMs are truly "bitter lesson pilled." That is a twist for the community, given how heavily it has leaned on his ideas, and the podcast makes his reasoning clear.
In the podcast, Dwarkesh, representing the perspective of LLM researchers, and Sutton, a self-described "classicist," have a fascinating exchange. Sutton envisions an AI system closer to Alan Turing’s idea of a "child machine": a system that learns through dynamic interaction with its environment rather than through static pretraining on human data.
Here are some key points from Sutton’s perspective:

Sutton’s approach is fundamentally rooted in reinforcement learning (RL), where the agent learns by interacting with its environment. This contrasts sharply with the pretraining-finetuning paradigm of LLMs. He argues that even if pretraining is seen as a form of initialization, it still introduces human biases that can limit the model's potential.
One of Sutton’s favorite examples is AlphaZero vs. AlphaGo. AlphaZero, which starts from scratch and learns through self-play, ultimately outperformed AlphaGo, which was initialized from human expert games. This suggests that starting from a clean slate and learning through interaction can be more powerful than relying on pre-existing data.
Sutton’s vision draws more on the animal kingdom, where learning is continuous and driven by intrinsic motivation. He believes that if we understood how even simple animals like squirrels learn, we would be much closer to solving AI.
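The first point above, that an RL agent learns from its own interaction rather than from a fixed corpus, can be made concrete with a toy example. The sketch below uses tabular Q-learning, a textbook RL method chosen here purely for illustration (it is not something discussed in the podcast), on a hypothetical five-cell corridor. The environment, the function names `step`, `q_learning` and `greedy_policy`, and all hyperparameters are illustrative assumptions. The value table starts at zero, so everything the agent ends up knowing comes from transitions it experiences itself.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the table starts at zero (no prior data) and is
    updated only from transitions the agent experiences while acting."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Greedy choice with random tie-breaking.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            # Bootstrapped target; a terminal state contributes no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Calling `greedy_policy(q_learning())` should yield a policy that steps right from every non-terminal cell. Systems like AlphaZero replace the table with a neural network and add search and self-play, so this only illustrates the learn-from-experience loop, not their method.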
While Sutton's ideas are compelling, they also raise significant practical challenges. LLMs have demonstrated impressive performance across a wide range of tasks, and the way they scale with compute is hard to ignore. Still, the finite supply of human-generated training data and the biases it carries are real problems that need addressing. Perhaps a hybrid approach, one that leverages large datasets while incorporating continuous learning and intrinsic motivation, could be the way forward.
In any case, Sutton’s critique serves as a valuable reminder to keep questioning our assumptions and exploring different paths in AI research.
Original Sources
↗ https://karpathy.bearblog.dev/animals-vs-ghosts/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.