
Seeing Like a Language Model

by Dan Shipper in Chain of Thought
2025-10-03 · 34 min read


Midjourney/Every illustration.

Last week I wrote that we’d be publishing a few excerpts from a book I’m writing about the worldview I’ve developed by writing, coding, and living with AI. Here’s the first piece, about the differences between the old (pre-GPT-3) worldview and the new.—Dan Shipper


When I say the word “intelligent,” you probably think of being rational. But language models show us that this equation of intelligence with rationality is wrong.

To us in the West, smarts are about being able to explicitly lay out what you know and why you know it. For us, the root of intelligence is logic and reason to ward off superstition and groupthink; it is clear and concise definitions to eradicate vague and wooly-headed thinking; it is formal theories that explain the hidden laws of the world around us—simple, falsifiable, and parsimonious yet general enough to tie together everything in the universe from atoms to asteroids.

Our romantic picture of ourselves as “rational animals,” as Aristotle said, has produced everything in the modern world—rockets, trains, medicines, computers, smartphones.

But this picture is incomplete. It contains a huge blind spot: It neglects the fundamental importance of intuition in all intelligent behavior—intuition that is by nature ineffable; that is to say, not fully describable by rational thought or formal systems.

How to build a thinking machine

The best way to understand what I’m talking about is to imagine trying to build a thinking machine. How would you do it?

Let’s start with a task, something easy and basic that humans do every day. Maybe something like scheduling an appointment.

Let’s say we’re a busy executive who gets the following appointment request:


New request

From: Mona Leibnis

Hey,

I’m available Monday at 3 p.m., Tuesday at 4 p.m., Friday at 6 p.m.

When can you meet?


We want to build a machine to intelligently schedule an appointment. How would we go about it?

We’d probably start by giving our machine a couple of rules to follow:

  1. First, check available time slots on my calendar.
  2. Then, compare my open slots to the open slots on the invitee’s calendar.
  3. If you find one, add the appointment to the calendar.
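
Those three rules translate almost directly into code. Here’s a minimal sketch in Python, with hypothetical calendars and invented time slots standing in for real data:

```python
from datetime import datetime

# Hypothetical calendars: each is a set of open time slots (invented for illustration).
my_open_slots = {
    datetime(2025, 10, 6, 15, 0),   # Monday 3 p.m.
    datetime(2025, 10, 10, 18, 0),  # Friday 6 p.m.
}
monas_open_slots = {
    datetime(2025, 10, 6, 15, 0),   # Monday 3 p.m.
    datetime(2025, 10, 7, 16, 0),   # Tuesday 4 p.m.
    datetime(2025, 10, 10, 18, 0),  # Friday 6 p.m.
}

calendar = []  # booked appointments

def schedule(mine, theirs, invitee):
    # Rule 1: check my available time slots.
    # Rule 2: compare my open slots to the invitee's open slots.
    matches = sorted(mine & theirs)
    # Rule 3: if there's a match, add the appointment to the calendar.
    if matches:
        slot = matches[0]
        calendar.append((invitee, slot))
        return slot
    return None

print(schedule(my_open_slots, monas_open_slots, "Mona Leibnis"))
# -> 2025-10-06 15:00:00
```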

That all seems pretty reasonable, and as the sketch shows, you could write a computer program to follow those rules. But there’s a problem: The rules we’ve specified so far can’t handle urgency or importance.

For example, consider a case where you desperately want to meet with someone and you’d be willing to move another appointment in order to make the time work. Now we have to introduce a new rule:

  4. If it is urgent that I meet with the invitee, you can reschedule a less urgent appointment in order to make the meeting happen sooner.

But this rule is incomplete because it introduces the concept of urgency without defining it. How do we know what’s urgent? Well, there must be some rules for that too. So we need to delineate them.

In order to measure urgency, we have to have some conception of the different people in your life—who your clients and potential clients are, and which clients are important or not.

Now things are starting to get hairy. In order to determine the relative importance of clients, we have to know about your business aims—and about which clients are likely to close, and which clients are likely to pay a lot of money, and which clients are likely to stay on for a long time. And don’t forget—which clients were introduced by an important friend whom you need to impress, so while they may not be directly responsible for a lot of revenue, they’re still a priority.
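
To see how quickly this spirals, imagine bolting the urgency rule onto our sketch. Every concept it invokes demands a definition, and every definition demands more. All of the functions, weights, and thresholds below are invented for illustration; that’s the point: there is no principled place to get them from.

```python
def is_urgent(invitee):
    # "Urgent" isn't a primitive. It depends on who the invitee is.
    return client_importance(invitee) > 0.8

def client_importance(client):
    # And importance depends on our business aims...
    return (0.4 * likelihood_of_closing(client)
            + 0.3 * revenue_potential(client)
            + 0.2 * retention_potential(client)
            + 0.1 * introduced_by_important_friend(client))

# ...and each of these stubs conceals a whole sub-model we'd still have to build.
def likelihood_of_closing(client):
    return 0.5  # really: a model of the entire sales pipeline

def revenue_potential(client):
    return 0.5  # really: a model of budgets, pricing, and deal size

def retention_potential(client):
    return 0.5  # really: a model of client satisfaction over years

def introduced_by_important_friend(client):
    return 1.0  # really: a model of our social graph, and of "important"

print(is_urgent("Mona Leibnis"))  # False, by these made-up weights
```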

This is only a taste of the rules we’d have to define in order to build an adequate automatic scheduling system. And that’s just for dealing with calendars!

The problem that we’re finding is that it’s very hard to lay everything out explicitly—because everything is interconnected. To paraphrase the late astronomer Carl Sagan: “If you wish to schedule a meeting from scratch, you must first define the universe.”

The old, Western worldview

This approach—which seemed most natural to us—is the exact one that the first generation of artificial intelligence researchers took to try to build AI.

In what is widely considered the founding document of the field of artificial intelligence, a 1955 proposal for a summer research project, John McCarthy and his colleagues outlined their vision for how to build intelligence:

“It may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view, forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others.”

In other words, human thinking is just arranging words according to rules, and making a new idea requires adding a new word and rules for how it links to others. This approach, called symbolic AI, aimed to create intelligent machines by encoding human knowledge into formal rules and symbols—just like we did in our scheduling example.

But it failed miserably, for the same reasons ours did: Trying to formalize even basic thinking in explicit rules turned into an endless task, one that would demand more rules and more computing power than the lifetime of the universe could supply. Yet it remained the dominant paradigm in AI research for 30-odd years.

Why is this approach so appealing to us? Why was it the first approach AI researchers took? And why, even today, does it seem like such an intuitive picture of intelligence?

The answer is that this way of thinking about the world and our own minds is woven into the fabric of our culture. It is a form of thinking invented by the Ancient Greek heretic Socrates and reborn in the Enlightenment through the work of Descartes, Galileo, and Newton.

It comes from an underlying set of assumptions so basic that they are practically invisible:

  1. If we think logically and clearly enough about anything in the world, we can figure it out (like how you can solve any puzzle if you stare at it long enough).
  2. Once we figure something out, we should be able to write it down in a way that is so clear and explicit that anyone else can understand and test it for themselves (like a good recipe that anyone can follow).
  3. If something is really true, it should be true no matter where you are or who you are (gravity works the same whether you're in Tokyo or Toronto).
  4. Real truth works like math and can't contradict itself (A is A and B is B, and something can’t be both A and B at the same time).
  5. Finally, the truth is always simple, and if it’s not simple, it means it’s not yet the truth.

If you believe that the world is composed of atoms rather than spirit—it is because of this style of thinking. If you’ve ever washed your hands so that you don’t get sick—it’s because of this style of thinking. If you believe in the existence of inalienable human rights—it’s because of this style of thinking. If you’ve ever laughed when the new person you’re dating asks, unabashedly, for your exact birth time, or believed in an efficient market or that humans follow their incentives or that a seizure is a sign of a brain disorder rather than possession by the devil, or woken up to see the sun rising and felt wonder at the many thousands of miles that the Earth traveled through space as you slept—it’s because of this style of thinking.

The drive for simple, general, abstract, universal theories has been a resounding success. It’s responsible for our progress out of darkness and superstition and into modernity. It has brought about the Western world. But it also fails when it becomes totalizing—when we allow our search for these truths to blind us to the real complexity of the world, and to alternative ways of seeing it.

What we can say about the world is not all that we know about it. We know many things that we can’t say explicitly. And the drive to say everything explicitly is doomed to failure.

Seeing like a language model

But for the first time, we have a tool—neural networks—that allows us to know and work with what can’t be defined or said explicitly. Neural nets and language models don’t work by explicit rules. Instead, they absorb tacit patterns in language—statistical regularities across billions of words—and, in doing so, recreate the hidden web of associations that makes our world hang together. Out of those patterns emerges not just syntax but sense. What they give us is a way to capture the tacit, the intuitive, the unsaid parts of intelligence that our old, rule-bound worldview could never reach.
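
To make the contrast concrete, here’s the same scheduling task handed to a language model instead of a rule book. This is a minimal sketch assuming the OpenAI Python client; the model name, prompt, and calendar details are illustrative, and any capable model would do:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice of model
    messages=[
        {
            "role": "system",
            "content": (
                "You schedule meetings for a busy executive. Weigh urgency, "
                "client importance, and relationships; explain your reasoning."
            ),
        },
        {
            "role": "user",
            "content": (
                "Mona Leibnis is a potential client, introduced by a friend "
                "I need to impress. She can meet Monday at 3 p.m., Tuesday "
                "at 4 p.m., or Friday at 6 p.m. My Monday 3 p.m. is free; "
                "Friday 6 p.m. is blocked by a low-stakes internal check-in. "
                "Which slot should I take, and should I move anything?"
            ),
        },
    ],
)
print(response.choices[0].message.content)
```

Nothing in that prompt defines “urgent” or “important.” The model leans on the web of associations it absorbed from billions of words—exactly the tacit knowledge our rules could never enumerate.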

Large language models allow us to see anew how we think and learn: not through the rigid application of universal laws, but through the recognition of patterns across vast landscapes of experience. When a chef seasons a dish, when a mother knows her child is lying, when a trader senses a market shift—these aren’t moments of mechanical calculation, but of pattern recognition so subtle and multifaceted that they resist explicit definition. Language models, in their very architecture, embody this more nuanced way of knowing. They remind us that intelligence isn't about following rules, but about dancing with context.

If we learn to see like a language model, it will lead us to an entirely new worldview—the successor to Western rationality. What follows is a sketch of this new worldview: what it is, how it contrasts with the old, and the transformations it enables.

