What It Means to Think
When GPT-4 produces an answer that appears insightful, or Claude works through a complex problem with apparent care, a natural question arises: is there something it’s like to be that model doing that reasoning? Is there actual thinking happening, or is this an extraordinarily sophisticated pattern-matching process that resembles thought without instantiating it? The honest answer is that nobody knows, including the people who built these systems.
What we can describe is the mechanism — and that mechanism is genuinely strange. Understanding it doesn’t answer the philosophical questions, but it does make the capabilities and limitations of these systems considerably less mysterious.
The Basic Architecture: Attention and Prediction
Large language models are built on the transformer architecture, introduced in Google’s 2017 “Attention Is All You Need” paper. The core innovation was the attention mechanism: a way for the model to weigh how relevant each word in an input is to every other word, dynamically and in parallel. This allows the model to track relationships across long distances in text — understanding that a pronoun in a long sentence refers to something mentioned paragraphs earlier.
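The core computation is simpler than it sounds. A minimal sketch of scaled dot-product attention, the operation introduced in that paper, written in plain NumPy (the toy vectors here are random stand-ins; in a real transformer, queries, keys, and values come from learned projections of token embeddings):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, per "Attention Is All You Need".

    Q, K, V: (seq_len, d_k) arrays, one row per token. Returns a new
    (seq_len, d_k) representation in which each token's output is a
    weighted mix of every token's value vector.
    """
    d_k = Q.shape[-1]
    # Relevance score of every token (key) to every token (query).
    scores = Q @ K.T / np.sqrt(d_k)              # (seq_len, seq_len)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (seq_len, d_k)

# Three toy "token" vectors of dimension 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

Because every token attends to every other token in one matrix multiplication, relationships across long distances cost no more to compute than adjacent ones — which is what lets a model link a pronoun to its referent paragraphs earlier.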
Training involves showing the model enormous amounts of text and asking it to predict what comes next. Through billions of iterations, the model’s billions of parameters are adjusted to make better predictions. The result, counterintuitively, is a system that seems to understand language, answer questions, write code, and reason through problems — none of which it was explicitly trained to do.
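The objective itself fits in a few lines. Here is a deliberately tiny stand-in — a count-based bigram predictor over a made-up corpus — that shares the same goal as a real language model: make the predicted distribution over the next token match the training data. Real models replace the counts with billions of learned parameters and condition on far more context:

```python
from collections import Counter, defaultdict

# Toy training corpus.
corpus = "the cat sat on the mat and the cat slept".split()

# Count which token follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(token):
    """Probability of each next token, estimated from the corpus."""
    following = counts[token]
    total = sum(following.values())
    return {t: c / total for t, c in following.items()}

print(next_token_distribution("the"))  # {'cat': 0.666..., 'mat': 0.333...}
```

Nothing in this objective mentions meaning, facts, or reasoning — only prediction. The surprise of modern LLMs is how much apparent understanding falls out of scaling up exactly this goal.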
Emergent Capabilities: The Surprising Part
One of the most striking discoveries in large language model research is the phenomenon of emergent capabilities: abilities that appear suddenly at certain scales, without having been explicitly trained. Models below a certain size cannot do multi-step arithmetic. Above a threshold, they can. The same pattern has been observed with analogical reasoning, translation, and various forms of problem-solving.
This is puzzling and somewhat alarming. It means that developers cannot always predict what capabilities a more powerful model will have — they discover them after training. It also means that understanding why these capabilities emerge is an open research problem. The models are not programmed to reason; reasoning appears to be a consequence of scale and the structure of language itself.
What LLMs Are Actually Bad At
Despite impressive capabilities, large language models have characteristic failure modes that illuminate how they work. They hallucinate — generating plausible-sounding but factually incorrect information — because they are fundamentally text predictors, not truth-finders. A model trained to produce coherent text will produce coherent text, whether or not it happens to be accurate.
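To see why hallucination is baked in, consider a sketch of the sampling step that turns a model’s scores into text. The logits below are invented for illustration (they are not from any real model): the prompt is imagined as “The capital of Australia is”, and the point is that sampling assigns real probability mass to fluent but wrong continuations, with no truth check anywhere in the loop:

```python
import numpy as np

vocab = ["Canberra", "Sydney", "Melbourne"]
logits = np.array([2.0, 1.6, 0.5])  # made-up scores for illustration

def sample_next(logits, temperature=1.0, seed=0):
    """Sample a token index from softmax(logits / temperature)."""
    rng = np.random.default_rng(seed)
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p)), p

idx, p = sample_next(logits)
print(vocab[idx], p.round(3))
# Canberra is correct, but "Sydney" carries ~35% probability here:
# a purely statistical sampler will sometimes emit it, fluently.
```

Lowering the temperature concentrates probability on the top token but never eliminates the underlying issue: the distribution reflects what is plausible in the training text, not what is true.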
They struggle with precise counting, consistent logical reasoning across very long contexts, and tasks that require maintaining state across many steps. They are sensitive to prompt framing in ways that a system with genuine understanding should not be — the same question phrased differently can get opposite answers. These failures suggest that whatever is happening inside these models, it is not the same as human cognition.
The Interpretability Problem
One of the deepest challenges in AI safety research is interpretability: figuring out what is actually happening inside a large language model when it produces an output. The weights of a model like GPT-4 represent billions of numbers distributed across hundreds of layers. Researchers can observe inputs and outputs, but the internal computation is largely opaque — not by design but by complexity.
Recent work in mechanistic interpretability — the attempt to reverse-engineer what specific circuits inside models are doing — has made progress on small, well-understood tasks. But the gap between those findings and understanding what a frontier model is doing when it writes a poem or solves a math problem remains vast. We have built systems we cannot fully understand, which is itself a fact worth taking seriously.
The Right Mental Model
The most useful framing for working with large language models might be: treat the output as probabilistic rather than authoritative; verify claims that matter; use the model’s capabilities while understanding its failure modes; and resist the temptation to anthropomorphise. The systems are genuinely impressive, and they are also genuinely different from human minds in ways that matter for how we should use and regulate them. Whether they “think” in any philosophically meaningful sense remains open. What’s clear is that they do something — something powerful, something strange, and something we’re still working to understand.
Sources
- Vaswani, A., et al. (2017). Attention Is All You Need. arXiv:1706.03762.
- Wei, J., et al. (2022). Emergent Abilities of Large Language Models. arXiv:2206.07682.
- Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots. FAccT 2021.