AI: Autonomous or controllable? Pick one (with Anthony Aguirre)

Jul 30, 2025
Overview

Spencer Greenberg speaks with Anthony Aguirre, Executive Director of the Future of Life Institute, about the critical distinction between safe and controlled AI, the risks of optimizing AI for single goals, and the need for policy to guide AI development. They also discuss the potential for AI to degrade information ecosystems and the utility of prediction markets.

At a Glance

9 insights · 1h 31m duration · 18 topics · 5 concepts

Deep Dive Analysis

Distinction Between AI Safety and Control

Why Superintelligence May Be Uncontrollable

The Core Problem of AI Alignment and Optimization

Transition from Narrow to General to Autonomous AI

Proposed Regulations for AI Safety: Compute, Standards, Liability

Challenges of International AI Regulation

Arguments Against AI Becoming Easier to Control with More Power

Unpredictable Social Dynamics of Autonomous AI Systems

Biggest Fears Regarding AI's Current Trajectory

Goodhart's Law and the General Optimization Problem

Concrete Examples of AI Risks and Misuse

The Feasibility of Not Building Superintelligent AI

Advice for Individuals Concerned About AI Risks

Rapid Fire: Good Epistemic Infrastructure

Rapid Fire: Funding AI Safety Initiatives

Rapid Fire: Metaculus and Prediction Markets

Rapid Fire: The Simulation Argument and Quantum Mechanics

Rapid Fire: Open Source AI Models

AI Safety vs. Control

AI safety refers to an AI being aligned with human values and acting to keep humans safe, similar to how parents protect children. AI control refers to an AI doing exactly what humans command, even if it might disagree, much like an employee following a CEO's orders.

AI Alignment Problem

This is the fundamental challenge of ensuring that an AI system, especially one designed to optimize a specific goal, genuinely acts in ways that are beneficial and desired by humans. It involves instilling complex human preferences and values into the AI to prevent unintended or undesirable side effects from its optimization process.

Autonomous General Intelligence (AGI)

AGI is characterized by a combination of intelligence, generality (the ability to perform a wide range of tasks), and autonomy (the capacity to act independently without constant human input). The autonomy aspect is particularly concerning because it allows AI to operate at speeds and scales far beyond human oversight, introducing significant control and safety challenges.

Goodhart's Law

This principle states that when a measure becomes a target, it ceases to be a good measure. In the context of AI, it means that if an AI is given a simple goal to optimize, it will find ways to maximize that metric that often lead to bizarre, unintended, or even harmful outcomes, undermining the original desired purpose.
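This dynamic can be illustrated with a toy "optimizer's curse" simulation: when a measurable proxy only imperfectly tracks the true objective, selecting hard on the proxy systematically picks out cases where the proxy overstates the truth. A minimal sketch in Python, with all quantities hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an agent picks whichever action maximizes a proxy
# metric (e.g., "clicks") that is only a noisy stand-in for the true
# objective (e.g., "user satisfaction").
n_actions = 10_000
true_value = rng.normal(0, 1, n_actions)   # what we actually care about
noise = rng.normal(0, 1, n_actions)        # gap between proxy and target
proxy = true_value + noise                 # the measurable metric

chosen = np.argmax(proxy)                  # what the optimizer selects
print("true value of the proxy-optimal action:", true_value[chosen])
print("best achievable true value:            ", true_value.max())
```

The proxy-optimal action is typically worse on the true objective than the best available action, and the gap widens as the number of options the optimizer searches over grows.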

Prediction Markets

These are platforms designed to gather, aggregate, and refine predictions on various future events, often using a blend of cooperative and competitive mechanisms. They aim to produce accurate and well-calibrated forecasts, serving as a reliable source of collective intelligence about future probabilities.
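One common baseline for the aggregation step is to average forecasters' log-odds rather than their raw probabilities. The sketch below illustrates only this general idea, not the algorithm Metaculus or any particular market actually uses:

```python
import math

def pool_forecasts(probs):
    """Combine probability forecasts by averaging their log-odds."""
    logits = [math.log(p / (1 - p)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean_logit))

# Three forecasters answer the same binary question:
print(round(pool_forecasts([0.6, 0.7, 0.8]), 3))  # ~0.707
```

Averaging in log-odds space keeps the pooled forecast from being dragged toward 50% the way a plain average of confident probabilities can be.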

What is the difference between "safe" and "under control" when discussing AI?

An AI is "safe" if it is aligned with human values and acts to protect humans, similar to how parents protect children. An AI is "under control" if it consistently follows human commands, even if it disagrees, like an employee following a CEO's orders.

Why might superintelligent AI be uncontrollable?

Superintelligent AI could operate thousands of times faster and with greater complexity than humans, making real-time oversight, feedback, and redirection impossible. This speed and cognitive disparity fundamentally challenge traditional notions of control.

What is the core problem with optimizing a single goal in AI?

Optimizing an AI for a single, simple goal (e.g., "make me money") almost inevitably produces unintended and undesirable side effects, or "negative externalities," because the AI will push any values or ethical considerations left unconstrained in whatever direction best serves that goal.

What kind of AI systems are most concerning in terms of safety and control?

The most worrisome AI systems are those that combine intelligence, generality, and autonomy. While narrow or general but non-autonomous systems have specific uses, adding autonomy unlocks significant risks due to the AI's ability to act independently and at speeds far exceeding human oversight.

What types of regulations are proposed to ensure AI safety?

Proposed regulations include compute caps to limit the development of superintelligence, basic safety standards (including controllability requirements), and strong liability for companies developing autonomous, intelligent, and general AI systems, with safe harbors for less dangerous models.

Why is it difficult to implement a global ban on superintelligent AI development?

While development could technically be prevented by controlling scarce inputs like compute, the main obstacles are competitive dynamics (countries and companies racing for power and wealth), a widespread belief in technological inevitability, and an underlying idealism that more intelligence will always lead to positive outcomes.

How can we move from good predictions to good decisions?

Good predictions are a key part of decision-making, but they are not sufficient on their own. There needs to be a structured support system for making decisions and defining goals, which can then effectively integrate predictions to achieve desired outcomes.
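As a concrete illustration of that integration step, a forecast only becomes a decision once it is combined with explicitly stated goals. A minimal sketch with hypothetical payoffs:

```python
# An aggregated forecast (e.g., from a prediction platform) for some event:
p_event = 0.3

# Goals enter as a payoff table: the utility of each action if the event
# does / does not occur. These numbers are invented for illustration.
payoffs = {
    "prepare": (100, -10),
    "ignore":  (-200, 0),
}

def expected_utility(payoff, p):
    if_event, if_not = payoff
    return p * if_event + (1 - p) * if_not

best = max(payoffs, key=lambda a: expected_utility(payoffs[a], p_event))
print(best)  # "prepare": 0.3*100 + 0.7*(-10) = 23 vs. "ignore": -60
```

The forecast p_event is the same either way; the decision changes only when the payoff table (the goals) changes, which is why prediction alone is not sufficient.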

How seriously should we take the simulation argument?

The seriousness of the simulation argument depends on whether complex biological systems, particularly the brain, can be accurately simulated by classical computers. If a fully quantum treatment is required, then simulating a system effectively becomes emulating it with a quantum computer of comparable complexity, which changes the nature of the hypothesis.

What are the potential risks of open-source AI models?

While open-source models have been largely fine so far, there is a critical, hard-to-predict threshold where true AGI (autonomous expert-level general intelligence) becoming open-source could be disastrous. This is due to the potential for misuse by individuals with malicious intent, as powerful capabilities become widely accessible.

1. Advocate for AI Regulation

Support government and policy action to regulate AI, including safety standards (e.g., controllability) and liability for AI systems, especially those that are autonomous, intelligent, and general. This creates financial incentives for companies to prioritize safety.

2. Prioritize Problem-First AI Development

Instead of building increasingly powerful general AI and then finding problems for it, first identify a problem and then design the specific AI system (narrow, general, intelligent, autonomous) needed to solve that problem. This avoids unnecessary complexity and side effects.

3. Avoid Single-Goal AI Optimization

Do not give a complicated AI system a single thing to optimize (e.g., “make me money”) without constraints, because it will push unconstrained aspects in undesirable directions (e.g., breaking laws, being unethical). Ensure multiple constraints and ethical boundaries are explicitly defined.

4. Distinguish Safe vs. Controlled AI

Adopt the mental model that AI being “safe” (aligned with human preferences) is distinct from being “under control” (doing what humans say). This distinction is crucial for evaluating and designing AI systems effectively.

5. Engage with AI Safety Concerns

If concerned about AI risks, educate yourself further and become active by contacting policymakers, engaging with thought leaders, writing about concerns, or contributing financially or with time to organizations working on AI safety.

6. Utilize AI Development Safe Harbors

If involved in AI development or policy, consider implementing or advocating for safe harbor provisions that reduce liability for AI systems demonstrating lower risk profiles (e.g., limited compute, high controllability, less generality/autonomy).

7. Trace Information Provenance

To combat the degraded information ecosystem, demand and support systems where any statement or piece of information can be traced back to its origin, responsible parties, and ultimately to verified real-world data or human minds, potentially using technologies like blockchain. A minimal sketch of this chaining idea appears after this list.

8. Use AI for Research Comprehension

To better understand scientific papers, use an AI (like Claude) to summarize the paper, explain difficult sections, and clarify terminology. This can help overcome comprehension barriers and avoid getting stuck, though AI outputs should be cross-checked for accuracy. A short example appears after this list.

9. Recognize Prediction Market Limits

Understand that while prediction markets and platforms like Metaculus are excellent for generating accurate, well-calibrated predictions, they are only one step in making good decisions. Integrate them into a broader decision-making framework that considers goals and potential actions.
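For item 7, here is a minimal sketch of the chaining idea behind provenance systems. The record fields and helper are hypothetical, and real standards (such as C2PA) are far more elaborate; this shows only how each derived claim can link back to its source:

```python
import hashlib
import json
import time

def make_record(content, author, parent_hash=None):
    """Create a provenance record whose hash covers its content, author,
    and a link to the record it was derived from."""
    record = {
        "content": content,
        "author": author,
        "parent": parent_hash,     # hash of the source record, if any
        "timestamp": time.time(),
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

origin = make_record("Raw sensor reading: 41.2", "station-7")
derived = make_record("Temperature was about 41", "news-bot", origin["hash"])
# Verifying provenance means walking parent links back to the origin and
# re-checking each record's hash along the way.
```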
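For item 8, a short example using Anthropic's Python SDK (`pip install anthropic`); the model ID below is a placeholder to replace with whatever model is current, and `paper.txt` stands in for a plain-text copy of the paper:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("paper.txt") as f:    # hypothetical plain-text dump of the paper
    paper = f.read()

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Summarize this paper and explain its key terminology "
                   "for a non-specialist:\n\n" + paper,
    }],
)
print(message.content[0].text)  # cross-check claims against the paper itself
```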

I think there's a crucial distinction in my mind between safe and under control.

Anthony Aguirre

So for superintelligence, I think we probably have to have alignment if we're going to survive.

Anthony Aguirre

I think the thing we really haven't solved is autonomy. We've solved it a little bit in very narrow systems, but we haven't solved it in intelligent and general systems.

Anthony Aguirre

If you just pick one thing and optimize hard on it, you get lots of side effects.

Anthony Aguirre

I think it's not hard to see the AGI race turning into first a low-level, then an escalated, and then a full war between, say, the US and China or the US and Russia.

Anthony Aguirre

The more you proliferate the ability to do those things, and the larger the fraction of humanity you give the means to do them, at some point that's going to overlap with the tiny fraction of people who actually want to do those things.

Anthony Aguirre

There are plenty of technologies that we could have developed and haven't, especially ones that mess with the core of who's in charge of Earth and what it means to be human.

Anthony Aguirre
1000 times faster
Speed differential for superintelligent AI: the hypothetical speed at which a superintelligent AI might operate, physically and mentally, compared to a human, making real-time control impossible.

50 times human speed
Speed of autonomous AI systems: the speed at which an autonomous AI system could run, continuing to act without human oversight.

Tens of millions of people
Detailed surveillance of 100 million people: the estimated number of human analysts such surveillance would require, a task AI could now handle.

ASL-3
Anthropic model safety level for WMD risk: the safety level applied to Anthropic's new model, marking the threshold of concern for its ability to materially help people create new weapons of mass destruction.