The Next Bitter Lesson

The first bitter lesson was that human knowledge does not scale. The next bitter lesson may be that brute force does not scale either. The future belongs to systems that learn structure from experience.

In 2019, Rich Sutton, one of the foundational figures in reinforcement learning, published a short essay called The Bitter Lesson.¹ His argument is simple and uncomfortable. Across seventy years of AI research, every approach that tried to encode human knowledge into systems, whether chess strategy, phoneme structure in language, or edge detection in vision, eventually lost to methods that simply scaled computation. Search and learning, given enough compute, beat structured rules every time. The lesson is bitter because the careful, intelligent, domain-specific work researchers invested their careers in was not just unhelpful but actively in the way. We built in what we had discovered, and that was exactly what held the systems back.

It is a good essay. Read it if you have not.

. . .

The field of AI has largely absorbed this lesson. We stopped trying to encode knowledge and turned instead to what Sutton prescribed: scale. Bigger models, more data, more parameters, more compute thrown at every problem. Rather than teaching a system the rules of chess, you give it enough compute to evaluate millions of positions. Rather than explaining grammar, you train on enough text that structure emerges. Force the problem with raw computation and let the results speak for themselves. This is the brute force method, and it works spectacularly. However, we may be in the early stages of learning our next bitter lesson.

The economic limits of brute force are starting to show. Across the industry, what began as generous flat-rate access is being replaced by hard limits and steep bills. Google now publishes explicit daily prompt caps for Gemini: free users get five prompts a day on the 2.5 Pro model; paying $250 a month gets you five hundred. Cursor and Replit both revised their pricing in 2025 to cap power users who were consuming a disproportionate share of the infrastructure. In April 2026, Anthropic blocked Claude Pro and Max subscribers from using their flat-rate plans to power third-party agent frameworks like OpenClaw. The gap between what those users paid and what their usage actually cost at API rates was reportedly more than fivefold. Anthropic’s email to affected users was plain: these tools put an outsized strain on our systems. Users now pay per token, or they don’t run agents. The compute is not, as it turns out, included.²

The limits are physical too: the general public has a growing problem with data centers using all the water and all the power. Community opposition blocked or delayed $98 billion in data center projects between March and June 2025 alone, a coalition of 230 environmental groups has called for a moratorium on new construction, and in Newton County, Georgia, a single Meta data center is projected to create a water deficit by 2030. There is, at the very least, a physical ceiling to compute.

Hardware improvements and algorithmic efficiency gains will extend the runway. But the direction is established, and history suggests the response to that kind of pressure tends to be a new paradigm.

. . .

Here is where it gets interesting to push on Sutton’s argument. He claims that the actual contents of minds are irredeemably complex, and that we should stop trying to find simple ways to think about them. Fair enough. But there is an embedded assumption worth examining. It is surely true that the complexities of minds are beyond our reach. But does that mean they are beyond reach entirely? We have been vain enough to think we could encode intelligence, and when we failed, our hubris led us to a familiar conclusion: if we can’t do it, it can’t be done. The bitter lesson has always arrived as a correction to exactly this kind of thinking. Our limitations are not universal limits. They are just ours.

Sutton himself points to experiential learning as the path forward. He is not arguing against structure. He is arguing against our structure. The next bitter lesson may arrive when AI systems, pressed against the limits of brute force and the rising cost of computation, are forced to abstract from experience into reusable internal structure rather than recomputing everything from scratch. The move would be from general to structured, but the structures would not be ours.

. . .

There is already early evidence that something like this is beginning to happen.

Mechanistic interpretability, the effort to reverse-engineer what neural networks are actually computing internally, has found structure nobody designed. Anthropic’s attribution graph research found that during multi-hop reasoning tasks, models form intermediate representations before arriving at conclusions, working through a chain of associated concepts rather than jumping from input to output. When generating poetry, the model selects rhyming candidates before composing lines. In medical reasoning tasks, models generate internal diagnoses that guide the follow-up questions they produce.³ Researchers studying transformers trained on modular arithmetic found that the models had independently developed Fourier representations to solve the task. Nobody specified this. The models arrived at a known mathematical structure purely under the pressure of training.⁴ Protein language models trained on amino acid sequences developed internal features corresponding to known biological structures, including motifs absent from their training annotations but confirmed in independent databases. The models found patterns in molecular data that human biologists had not yet catalogued.⁵
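
To make the grokking result concrete: the algorithm those researchers reverse-engineered amounts to representing numbers as sines and cosines at a few learned frequencies and letting trigonometric interference pick out the correct residue. The sketch below is a hand-written illustration of that mechanism, not anything extracted from a model, and the particular frequencies are arbitrary stand-ins for the ones a trained network converges on.

```python
# Hand-built illustration of the Fourier mechanism Nanda et al. describe
# for computing (a + b) mod p. Not model code; the "key frequencies" here
# are placeholders for the ones a trained network actually learns.
import numpy as np

p = 113                      # modulus used in the grokking experiments
key_freqs = [1, 5, 17, 42]   # illustrative frequencies, chosen arbitrarily

def logits(a, b):
    # Score every candidate answer c by summing cos(w_k * (a + b - c)).
    # All cosines hit 1 simultaneously only when c == (a + b) mod p, so
    # constructive interference singles out the correct residue.
    c = np.arange(p)
    scores = np.zeros(p)
    for k in key_freqs:
        w = 2 * np.pi * k / p
        scores += np.cos(w * (a + b - c))
    return scores

# Pure trigonometry recovers modular addition for every input pair.
assert all(int(np.argmax(logits(a, b))) == (a + b) % p
           for a in range(p) for b in range(p))
```

In this toy version, four frequencies stand in for the whole 113-by-113 addition table: a compressed, reusable structure of exactly the kind the rest of this argument turns on.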

This is not the full picture. Mechanistic interpretability is still early, and its findings are easier to describe than to truly understand. But what the evidence points to is not a return to hand-coded rules. These systems are developing their own internal representations: compressed, learned, and in many cases more nuanced than any structure a human researcher could have specified. Attribution graphs reveal computational pathways; the grokking results reveal learned algorithms. The claim here is not that neural networks are rediscovering symbolic AI. The claim is that the pressure of training is producing emergent internal organization that serves the same functional role that handwritten heuristics once aspired to, except that it is discovered rather than designed, and it operates at a level of complexity we may not be able to fully articulate.

Once that kind of structure is learned, it does not need to be recomputed from scratch each time. The result is a dramatic reduction in compute requirements.

The human brain, running on about 20 watts, is the obvious comparison. It does not simulate the physics of a staircase before descending one. It applies experience-derived compressions that produce good-enough answers quickly, reserving deeper computation for the genuinely novel.

The symbolic AI researchers of the 1970s and 80s had the right intuition about what efficiency looks like. Their error was in the source. They tried to write the structure themselves. The better path is systems that learn it, at a level of complexity and in a form we may not be able to follow.

. . .

The deepest lesson running beneath all the technical ones is simpler and harder than any of them: you must always be willing to kill your darlings. Not just your favourite architectures, but your conceptual ones. The belief that you understand what intelligence is. The belief that you know where the ceiling is. The belief that the approach that carried you this far is the one that carries you further. Adaptability is the only real strength, and being wrong is the most essential part of the human experience. We will be wrong again; the only question is how long it takes us to notice.



References

1. Sutton, R. The Bitter Lesson. March 13, 2019. incompleteideas.net/IncIdeas/BitterLesson.html

2. TNW: Anthropic blocks OpenClaw from Claude subscriptions in cost crackdown. April 2026. thenextweb.com/news/anthropic-openclaw-claude-subscription-ban-cost — PYMNTS: Claude Users Hit a New Reality of AI Rationing. March 2026. pymnts.com/artificial-intelligence-2/2026/ai-usage-limits-are-becoming-the-new-reality-for-consumers

3. Anthropic Interpretability Team. On the Biology of a Large Language Model. February 2025. transformer-circuits.pub/2025/attribution-graphs/biology.html

4. Nanda, N. et al. Progress measures for grokking via mechanistic interpretability. 2023. arxiv.org/abs/2301.05217

5. Simon, E. & Zou, J. InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders. bioRxiv, November 2024. Published in Nature Methods, 2025. arxiv.org/abs/2412.12101 — DOI: 10.1101/2024.11.14.623630

