
What’s an AI hallucination? Why does AI hallucinate? What are the consequences of these errors?

In their book Rebooting AI, Gary Marcus and Ernest Davis explore the phenomenon of AI hallucinations. They explain why these errors occur and discuss the potential risks in critical situations such as airport control towers or self-driving cars.

Keep reading to learn about the challenges of debugging artificial neural networks.

Why AI Hallucinates

Suppose an airport computer trained to identify approaching aircraft mistakes a flight of geese for a Boeing 747. In AI development, this kind of mismatch error is referred to as a “hallucination,” and under the wrong circumstances—such as in an airport control tower—the disruptions an AI hallucination might cause range from costly to catastrophic.

Why does AI hallucinate? Davis and Marcus explain that neural networks are trained on large amounts of data. When this strategy is used to the exclusion of every other programming tool, the resulting system depends on statistical correlation rather than logic and reason, and that dependence is hard to correct. Because of this, neural networks can’t be debugged the way human-written software can, and they’re easily fooled when presented with data that don’t match what they were trained on.

When neural networks are trained solely on input data rather than programmed by hand, it’s impossible to say exactly why the system produces a particular result from any given input. Marcus and Davis write that when AI hallucinations occur, engineers can’t pinpoint where in the neural network’s maze of computations the error takes place. This makes traditional debugging impossible, so software engineers have to “retrain” that specific error out of the system, such as by giving the airport computer’s AI thousands of photos of birds in flight that are clearly labeled as “not airplanes.” Davis and Marcus argue that this solution does nothing to fix the systemic issues that cause hallucinations in the first place.
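
To make the retraining fix concrete, here’s a minimal toy sketch (not the authors’ method) of patching a classifier by adding labeled counterexamples. The two-dimensional “features,” the clusters, and the scikit-learn model are all stand-ins for a real image pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for extracted image features: airplanes vs. everything else.
airplanes = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(500, 2))
background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))

X = np.vstack([airplanes, background])
y = np.array([1] * 500 + [0] * 500)  # 1 = "airplane", 0 = "not airplane"

model = LogisticRegression().fit(X, y)

# Geese produce features that happen to sit near the airplane cluster,
# so the classifier is likely to "hallucinate" an airplane.
geese = rng.normal(loc=[4.0, 4.5], scale=0.5, size=(200, 2))
print("Before patch:", model.predict(geese[:1]))

# The "retraining" fix: add goose-like examples, clearly labeled 0, and re-fit.
X_patched = np.vstack([X, geese])
y_patched = np.concatenate([y, np.zeros(200, dtype=int)])
model = LogisticRegression().fit(X_patched, y_patched)
print("After patch:", model.predict(geese[:1]))  # more likely to be 0 now
```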

Hallucinations and Large Language Models

Since Rebooting AI’s publication, AI hallucinations have become more widely known thanks to the public launch of ChatGPT and similar data-driven “chatbots.” These AIs, known as Large Language Models (LLMs), use large amounts of human-written content to generate original text by calculating which words and phrases are statistically most likely to follow other words and phrases. In a sense, LLMs are similar to the autocomplete feature on texting apps and word processors, with more versatility and on a grander scale. Unfortunately, they’re prone to hallucinations such as self-contradictions, falsehoods, and random nonsense.
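
As a rough illustration of the autocomplete analogy, the toy model below predicts the next word purely from how often words follow one another in a tiny made-up corpus. Real LLMs use neural networks trained on vastly more text, but the underlying idea of predicting statistically likely continuations is similar.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models train on billions of words.
corpus = (
    "the plane landed safely . the plane taxied to the gate . "
    "the geese landed on the runway ."
).split()

# Count which word follows each word.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word that most often followed `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))     # 'plane' -- the most frequent follower of 'the'
print(predict_next("landed"))  # whichever follower the counts favor
```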

Beyond manually fact-checking everything an LLM writes, there are several methods for guiding an LLM so that it catches and minimizes its own hallucinations. One of these is prompt engineering, in which you guide the LLM’s output by giving it clear and specific instructions; breaking long projects into smaller, simpler tasks; and providing the AI with factual information and sources you want it to reference. Other approaches include chain-of-thought prompting, in which you ask the LLM to explain its reasoning, and few-shot prompting, in which you provide the LLM with examples of how you’d like its output to be structured.
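
The sketch below shows what few-shot and chain-of-thought prompts might look like in practice. The call_llm function is a hypothetical placeholder for whatever chat-completion client you use, and the prompt wording is just one possible way to structure these patterns.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to your LLM provider and return its reply."""
    raise NotImplementedError("Wire this up to your chat-completion client of choice.")

# Few-shot prompting: show the model examples of the output structure you want.
few_shot_prompt = """Classify each support ticket as BUG, BILLING, or OTHER.

Ticket: "I was charged twice this month."
Label: BILLING

Ticket: "The export button crashes the app."
Label: BUG

Ticket: "The app freezes when I upload a PDF."
Label:"""

# Chain-of-thought prompting: ask the model to explain its reasoning first,
# which makes hallucinated leaps easier to spot when you review the answer.
chain_of_thought_prompt = (
    "Using only the source text below, answer the question. "
    "Explain your reasoning step by step, then give a final answer. "
    "If the source text doesn't contain the answer, say so.\n\n"
    "Source text: <paste the factual material you want referenced>\n"
    "Question: <your question>"
)
```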

Hallucinations and Big Data

AI hallucinations aren’t hard to produce, as anyone who’s used ChatGPT can attest. In many cases, AIs hallucinate when presented with information in an unusual context, unlike anything in the system’s training data. Consider the popular YouTube video of a cat dressed as a shark riding a Roomba. Bizarre as the image is, a human has no difficulty identifying what they’re looking at, whereas an AI tasked with the same assignment would offer a completely wrong answer. Davis and Marcus argue that this matters when pattern recognition is used in critical situations, such as in self-driving cars. If the AI scanning the road ahead sees an unusual object in its path, the system could hallucinate with disastrous results.
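
If you’d like to see how a standard image classifier handles out-of-distribution input, a quick probe like the one below prints a pretrained model’s top guesses for any image you give it. The file name is hypothetical and no particular (mis)prediction is guaranteed; the point is simply that the model has to pick from the categories it was trained on, however strange the picture.

```python
import torch
from PIL import Image
from torchvision import models

# Load a pretrained ImageNet classifier and its matching preprocessing.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

# Hypothetical file: any unusual, out-of-distribution photo will do.
image = Image.open("cat_in_shark_costume_on_roomba.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probabilities = model(batch).softmax(dim=1)[0]

top = probabilities.topk(5)
labels = weights.meta["categories"]
for score, index in zip(top.values, top.indices):
    print(f"{labels[index]}: {score:.1%}")
```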

(Shortform note: In the field of image recognition, research is ongoing, especially on fine-tuning AI “vision” so that it can correctly identify objects. Since, as Davis and Marcus point out, the process by which neural networks handle any given piece of information is opaque, some researchers are working to determine exactly how neural networks interpret visual data and how that process differs from the way humans see. Possible methods for improving computer vision include making AI’s inner workings more transparent, combining artificial models with real-world inputs, and developing ways to process images using less data than modern AIs require.)

Hallucinations illustrate a difference between human and machine cognition—we can make decisions based on minimal information, whereas machine learning requires huge datasets to function. Marcus and Davis point out that, if AI is to interact with the real world’s infinite variables and possibilities, there isn’t a big enough dataset in existence to train an AI for every situation. Since AIs don’t understand what their data mean, only how those data correlate, AIs will perpetuate and amplify human biases that are buried in their input information. There’s a further danger that AI will magnify its own hallucinations as erroneous computer-generated information becomes part of the global set of data used to train future AI.

Bad Data In, Bad Data Out

The issue that Marcus and Davis bring up about AIs propagating other AIs’ hallucinations is commonly known as model collapse. This danger grows as more AI-generated content enters the world’s marketplace of ideas, and therefore the data pool that other AIs draw from. The danger isn’t theoretical, either: media outlets that rely on AI for content have already been caught generating bogus news, which then enters the stream of data feeding other AI content generators.
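
As a deliberately simplified illustration of that feedback loop, the snippet below fits a basic statistical model to data, then trains each new “generation” only on samples produced by the previous generation’s model. Run it a few times: the fitted mean and spread typically wander away from the original data’s values instead of staying anchored to them, which is the core mechanism behind model collapse.

```python
import numpy as np

rng = np.random.default_rng(7)

# Generation 0: "human-made" data with mean 0.0 and spread 1.0.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(31):
    mu, sigma = data.mean(), data.std()
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Each new generation learns only from the previous generation's output.
    data = rng.normal(loc=mu, scale=sigma, size=200)
```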

The problem with relying on purely human content isn’t only that much of it contains human bias, but also that bias is hard to recognize. In Biased, psychologist Jennifer Eberhardt explains that most forms of prejudice are unconscious—if we’re not aware of their existence in ourselves, how can we possibly hope to prevent them from filtering into our digital creations? To make things more difficult, bias is, at its core, a form of classification, which, as Davis and Marcus point out, is exactly what narrow AI is getting very good at. Parroting and projecting human bias is therefore second nature to data-driven AI.

One solution might be to develop AI that can classify data and draw conclusions from few or no training examples, cutting out human bias and other AIs’ hallucinations altogether. Researchers are currently developing techniques, known as “less than one”-shot and “zero-shot” learning, to teach AI to categorize images and text that aren’t present anywhere in its training data. As of early 2024, the zero-shot approach is still under development but is showing the potential to eliminate some of the problems inherent in big data AI training techniques.
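
For a sense of what zero-shot classification looks like in practice, here’s a brief sketch using the Hugging Face transformers library’s zero-shot pipeline. The model choice, the example sentence, and the candidate labels are all illustrative, and the model weights are downloaded on first run.

```python
from transformers import pipeline

# A natural-language-inference model repurposed to score arbitrary labels
# it was never explicitly trained to predict.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "A flock of geese is crossing the runway near gate 12.",
    candidate_labels=["wildlife hazard", "aircraft arrival", "weather report"],
)

# Labels come back sorted from most to least likely.
print(result["labels"][0], round(result["scores"][0], 3))
```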

