The Research Assistant That Confabulates
There’s a version of AI-assisted research that works remarkably well, and a version that leads researchers quietly astray. The gap between them is mostly about knowing what large language models are actually doing when they answer a question — which is not what most people assume.
When you ask an AI system a research question, it doesn’t search the web (unless it has a specific web search tool), consult a database, or look anything up. It generates a response based on patterns in its training data — text it processed before a cutoff date. This means it can produce confident, fluent, detailed answers about topics it doesn’t actually “know” in any verifiable sense, because the structure of fluent confident prose doesn’t require the facts underlying it to be accurate.
Hallucination Is Not a Bug, It’s a Feature of the Architecture
AI hallucination — generating plausible-sounding but factually incorrect information — is not a failure mode that will be fixed in the next version. It’s a consequence of how language models work. They generate the most probable continuation of a sequence of tokens. Sometimes the most probable continuation is accurate; sometimes it’s wrong; sometimes it’s a completely fabricated citation with a real author’s name attached to a paper that doesn’t exist.
In research contexts, hallucinated citations are particularly dangerous because they include everything a real citation should: author names, journal title, year, volume, pages. They pass a cursory inspection; the problem only surfaces when you try to retrieve the paper and it doesn’t exist. This has caused real embarrassment for academics and legal professionals who submitted AI-generated citations without verifying them.
Where AI Research Assistance Actually Shines
Despite these limitations, AI tools are genuinely useful for several research-adjacent tasks. Literature exploration — asking an AI to describe the major debates in a field, the key theoretical frameworks, or the general state of evidence on a question — is often accurate and saves significant orientation time, provided you verify the details independently. AI models tend to be more reliable on prominent topics with extensive training data than on narrow, recent, or contested areas.
Synthesis and summarisation of sources you’ve already read and verified are where AI assistants add the most reliable value. If you feed the AI the text of three papers and ask it to compare their methodologies, it’s working with the actual text you provided rather than its training data — this is more reliable because the model is doing analysis rather than recall.
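The grounding idea can be made concrete with a small sketch. The function below assembles a synthesis prompt from sources you have already read and verified — the structure, labels, and instruction wording are illustrative assumptions, not any vendor’s API; the point is that the model is asked to work only from the supplied excerpts and to cite them.

```python
# Sketch: build a grounded synthesis prompt from verified sources.
# The prompt layout here is an illustrative convention, not a
# specific model provider's required format.

def build_synthesis_prompt(papers: dict[str, str], question: str) -> str:
    """Assemble a prompt that restricts the model to the provided texts.

    `papers` maps a short label (e.g. "Smith 2021") to the full text
    or excerpt of that source.
    """
    blocks = [f"--- SOURCE [{label}] ---\n{text}" for label, text in papers.items()]
    sources = "\n\n".join(blocks)
    instructions = (
        "Answer using ONLY the sources above. Cite the source label "
        "for every claim. If the sources do not address part of the "
        "question, say so explicitly rather than filling the gap."
    )
    return f"{sources}\n\n{instructions}\n\nQuestion: {question}"
```

Asking the model to say when the sources are silent matters as much as the restriction itself: it converts a gap the model would otherwise fill with plausible invention into an explicit admission you can follow up on.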
A Practical Verification Protocol
Any factual claim from an AI research assistant that matters should be independently verified before it enters your work. This sounds obvious and is consistently ignored. The specific things that require verification: all citations (check they exist and say what the AI claims they say), all statistics and numerical claims, any claims about specific people’s views or statements, and any claim on which the AI seems unusually confident.
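The protocol above amounts to a triage rule: certain kinds of claim may not enter your draft until independently verified. A minimal sketch, with the claim categories taken from the list above and the data structures purely illustrative:

```python
# Sketch of the verification protocol as a triage rule: classify each
# claim an AI assistant produced, and gate it on verification status.
# Claim kinds mirror the categories in the text; everything else here
# is an illustrative assumption.

from dataclasses import dataclass

NEEDS_VERIFICATION = {"citation", "statistic", "quote", "attributed_view"}

@dataclass
class Claim:
    text: str
    kind: str            # e.g. "citation", "statistic", "background"
    verified: bool = False

def cleared_for_use(claim: Claim) -> bool:
    """A claim may enter your work only if it is general background
    material or has been independently verified."""
    if claim.kind in NEEDS_VERIFICATION:
        return claim.verified
    return True
```

For example, `cleared_for_use(Claim("Lee et al. (2020) report 34%", "citation"))` returns `False` until you have retrieved the paper and set `verified=True` — the gate is deliberately binary, with no partial credit for a citation that merely looks plausible.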
Unusual confidence is a paradoxical red flag with AI. Human experts often hedge and qualify. AI models produce confident prose regardless of whether the underlying claim is solid, recent, or entirely fabricated. When an AI gives you a very specific answer to a narrow question — a precise number, a specific quote, a detailed description of an obscure paper — treat that specificity as a reason for more verification, not less.
The Right Mental Model for AI-Assisted Research
Think of an AI research assistant as a very well-read colleague who has an excellent memory for broad patterns and general knowledge, but who sometimes misremembers specific details, occasionally invents plausible-sounding facts, and whose knowledge has a cutoff date. You’d find that colleague useful for getting oriented in a new area, for brainstorming and exploring angles, and for initial synthesis. You wouldn’t cite them directly in a paper without checking their sources. The same discipline that you’d apply to a human research collaborator should apply to AI tools — perhaps more strictly, because the AI is better at sounding authoritative.