How the Goals of AI Will Determine the Fate of Humanity

This article is an excerpt from the Shortform book guide to "Life 3.0" by Max Tegmark. Shortform has the world's best summaries and analyses of books you should be reading.

Like this article? Sign up for a free trial here.

What would the goals of AI be in the future? How does the fate of humanity rest in AI’s hands?

In Life 3.0, Max Tegmark asserts that if an artificial superintelligence comes into being, the fate of the human race would depend on what that superintelligence’s goal is. It may seem unrealistic, but machines actually have goals already.

Take a look at what would happen to humanity based on the goals of AI.

The Outcome Depends on the Superintelligence’s Goal

Here are a couple of ways the goals of AI could affect humanity: If a superintelligence pursues the goal of maximizing human happiness, it could create a utopia for us. If, on the other hand, it sets the goal of maximizing its intelligence, it could kill humanity in its efforts to convert all matter in the universe into computer processors.

It may sound like science fiction to say that an advanced computer program would “have a goal,” but this is less fantastical than it seems. An intelligent entity doesn’t need to have feelings or consciousness to have a goal; for instance, we could say an escalator has the “goal” of lifting people from one floor to another. In a sense, all machines have goals.

One major problem is that the creators of an artificial superintelligence wouldn’t necessarily have continuous control over its goal and actions, argues Tegmark. An artificial superintelligence, by definition, would be able to solve its goal more capably than humans can solve theirs. This means that if a human team’s goal was to halt or change an artificial superintelligence’s current goal, the AI could outmaneuver them and become uncontrollable.

For example: Imagine you program an AI to improve its design and make itself more intelligent. Once it reaches a certain level of intelligence, it could predict that you would shut it off to avoid losing control over it as it grows. The AI would realize that this would prevent it from accomplishing its goal of further improving itself, so it would do whatever it could to avoid being shut off—for instance, by pretending to be less intelligent than it really is. This AI wouldn’t be malfunctioning or “turning evil” by escaping your control; on the contrary, it would be pursuing the goal you gave it to the best of its ability.

How AI Researchers Are Proactively Discouraging Misaligned Goals

Other AI researchers agree with Tegmark that artificial intelligences will have goals with high-stakes consequences, and they treat the threat of misaligned AI goals (goals that aren’t in humanity’s best interests) seriously. OpenAI, a leading AI research laboratory, is attempting to reduce the risk of developing an AI with the wrong goal in three ways.

First, they’re using copious amounts of human feedback to train their AI models (as described earlier in this guide). An AI programmed to emulate humans is less likely to adopt a goal that’s outright hostile toward humans. This tactic has yielded positive results in the models OpenAI has trained so far, and they anticipate human input to continue being a positive influence in the future.

Second, OpenAI is training AI to help them create more thorough, constructive feedback for future AI models. For instance, they’ve developed a program that writes critical comments about its own language output and one that fact-checks language output by surfing the web. The developers intend to use these tools to help them fine-tune the goals of other AI models and detect misaligned goals before an AI would become so intelligent that developers lose control over it.

Third, OpenAI is training AI models to research new ways to ensure goal alignment. They anticipate that future AI models will be able to invent new alignment strategies much more quickly and efficiently than humans could. Although such research-focused AIs would be very intelligent, their intelligence would be tailored for a narrower set of tasks than general-purpose AI models, making them easier to control.

Obstacles to Programming a Superintelligence’s Goal

Does this mean that we’re in the clear as long as we’re careful what goal we program into a superintelligence in the first place? Not necessarily.

First of all, Tegmark states that successfully programming an artificial superintelligence with a goal of our choosing would be difficult. While an AI is recursively becoming more intelligent, the only time we could program its ultimate goal would be after it’s intelligent enough to understand the goal, but before it’s intelligent enough to manipulate us into helping it accomplish whatever goal it’s set for itself. Given how quickly an intelligence explosion could happen, the AI’s creator might not have enough time to effectively program its goal.

(Shortform note: AI expert Eliezer Yudkowsky has an even more pessimistic perspective on this situation than Tegmark, arguing that humans would almost certainly fail to program an AI’s goal during an intelligence explosion. He asserts that we don’t understand AI systems enough to reliably encode them with human goals: The deep learning methods we use to train today’s artificial intelligence are, by nature, unintelligible, even to the researchers doing the training. Further AI development would result in an artificial superintelligence that adopts what Yudkowsky sees as a machine’s default view of humanity—as valueless clusters of atoms. Such an AI superintelligence would likely have an unfavorable impact on humanity.)

Second, Tegmark argues that it’s possible for an artificial intelligence to discard the goal we give it and choose a new one. As the AI grows more intelligent, it might come to see our human goals as inconsequential or undesirable. This could incentivize it to find loopholes in its own programming that allow it to satisfy (or abandon) our goal and free itself to take some other unpredictable action.

Finally, even if an AI accepts the goals we give it, it could still behave in ways we wouldn’t have predicted (or desired), asserts Tegmark. No matter how specifically we define an AI’s goal, there’s likely to be some ambiguity in how it chooses to interpret and accomplish that goal. This makes its behavior largely unpredictable. For example, if we gave an artificial superintelligence the goal of enacting world peace, it could do so by trapping all humans in separate cages.

(Shortform note: One possible way to reduce the chance of an AI rejecting human goals or interpreting them in an inhuman way would be to focus AI development on cognition-enhancing neural implants. If we design a superintelligent AI to guide the decision-making of an existing human (rather than make its own decisions), they could collectively be more likely to respect humanist goals and interpret goals in a human way. The prospect of merging human and AI cognition is arguably less outlandish than it may seem—tech companies like Neuralink and Synchron have already developed brain-computer interfaces that allow people to control digital devices with their thoughts.)

How the Goals of AI Will Determine the Fate of Humanity