Anthropic co-founder on quitting OpenAI, AGI predictions, $100M talent wars, 20% unemployment, and the nightmare scenarios keeping him up at night | Ben Mann

Jul 20, 2025 · Episode Page
Overview

Benjamin Mann, co-founder and tech lead for product engineering at Anthropic, discusses the accelerating pace of AI, the economic Turing test for AGI, and why he left OpenAI to prioritize AI safety. He shares how Anthropic operationalizes safety, the potential impact on jobs, and advice for thriving in an AI-driven future.

At a Glance
13 Insights
1h 14m Duration
16 Topics
6 Concepts

Deep Dive Analysis

AI Talent War and Compensation Trends

Accelerating AI Progress and Scaling Laws

Defining Transformative AI and the Economic Turing Test

Impact of AI on Jobs and Future of Work

Preparing for an AI-Driven Future

Founding Anthropic: Prioritizing AI Safety

Balancing AI Safety with Product Progress

Constitutional AI and Model Alignment

Personal Motivation for AI Safety

Risks of Autonomous Agents and Software-Only AI

Forecasting Superintelligence and Societal Impact

Probability of Aligning AI Correctly

Reinforcement Learning from AI Feedback (RLAIF)

Biggest Bottlenecks in AI Intelligence Improvement

Personal Reflections on Responsibility and AI's Future

Anthropic’s Growth and Innovation Teams

Transformative AI

Transformative AI is a term preferred over AGI because it focuses on whether AI objectively causes significant transformation in society and the economy. It's measured by whether AI can perform a sufficient share of jobs, weighted by how much money each job represents, signaling a new era of societal change and economic growth.

Economic Turing Test

This test measures whether an AI agent can pass as a human worker in a specific job. If you contract an agent for a month or three months, are satisfied enough with the work to keep it on, and only then learn it was a machine rather than a person, it has passed the economic Turing test for that role.

Constitutional AI

This is a method where an AI model learns desired behaviors from a list of natural language principles, rather than relying solely on human raters. The model generates a response, critiques itself against these principles, and then rewrites its response to comply, recursively improving its alignment with predefined values.

Responsible Scaling Policy (RSP)

Anthropic's policy defines AI Safety Levels (ASL) to assess the risk to society at different levels of model intelligence. It helps determine what safeguards are needed as models become more capable, ranging from minor harm to extinction-level risks.

Reinforcement Learning from AI Feedback (RLAIF)

RLAIF is a technique where AI models self-improve without direct human intervention. An example is constitutional AI, where the model critiques and rewrites its own responses based on principles, or models commenting on code written by other models to improve it.
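In its simplest form, the pattern is two sampled candidates and an AI judge, with no human label anywhere in the loop. The sketch below is a toy illustration, not any production system: `generate` and `ai_judge` are invented stubs standing in for real policy-model and feedback-model calls.

```python
def generate(prompt: str, temperature: float) -> str:
    # Stub for a policy-model call; a real system would sample a language model.
    style = "careful" if temperature < 0.5 else "hasty"
    return f"{prompt} ({style} answer)"

def ai_judge(prompt: str, a: str, b: str) -> str:
    # Stub for a feedback-model call that applies a rubric (here: prefer care).
    return a if "careful" in a else b

def collect_preference(prompt: str) -> tuple[str, str]:
    """Sample two candidates and let the AI judge emit a (chosen, rejected) pair."""
    a = generate(prompt, temperature=0.2)
    b = generate(prompt, temperature=0.9)
    chosen = ai_judge(prompt, a, b)
    rejected = b if chosen == a else a
    # In practice these pairs would train a reward model or drive a DPO-style
    # update, replacing the human rater used in classic RLHF.
    return chosen, rejected
```

The same shape covers the code-review variant Mann mentions: swap the rubric prompt in `ai_judge` for one that critiques another model's code.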

Resting in Motion

A technique for managing the burden of weighty topics, suggesting that the 'busy state' is the normal human condition, not rest. It encourages working at a sustainable pace, recognizing that it's a marathon, not a sprint, and that constant engagement is natural.

Why did Benjamin Mann and others leave OpenAI to start Anthropic?

They felt that safety wasn't the top priority at OpenAI, despite the company's mission. They wanted to create an organization that could be at the frontier of AI research while prioritizing safety above everything else, especially given the growing understanding of potential risks.

How does Anthropic balance AI safety with competitive progress?

Anthropic has found that working on safety actually helps with progress, creating a 'convex' relationship. For example, the beloved personality of Claude 3 Opus is a direct result of their alignment research, as efforts to make the AI helpful, honest, and harmless led to its distinctive character.

How does Constitutional AI work to align models?

The model generates a response, then checks if it complies with a set of natural language principles (like human rights or privacy terms). If not, the model critiques itself and rewrites its response in light of the principle, effectively self-improving its alignment with desired values.

What are the current risks of AI, even in software-only forms?

Even without physical robots, software-only AI poses real risks, as demonstrated by nation-state hacking of critical infrastructure (e.g., shutting down power plants). Such attacks can cause significant real-world harm, affecting millions of people.

When does Benjamin Mann predict superintelligence will be achieved?

Based on superforecaster reports like AI 2027, he estimates a 50th percentile chance of hitting some form of superintelligence by around 2028. This forecast is based on scaling laws, improvements in model training, and the global expansion of data centers and power.

What is the likelihood of successfully aligning AI?

Benjamin Mann believes we are in a 'middle world' where alignment research truly matters, and our actions are pivotal. While he estimates a 0-10% chance of an X-risk or extremely bad outcome, he stresses the critical importance of working on alignment because the downside risk is so large and few people are addressing it.

What are the biggest bottlenecks to improving AI model intelligence?

The primary bottlenecks are data centers, power, and chips (compute), as well as the need for more researchers to develop better algorithms and improve data efficiency. Significant improvements in these areas could dramatically accelerate AI progress.

What skills does Benjamin Mann recommend teaching children for an AI future?

He emphasizes fostering curiosity, creativity, and kindness, drawing inspiration from Montessori education. He believes traditional academics may become less relevant, and instead, children should learn to be thoughtful, curious, and empathetic individuals.

1. Proactively Align AI Models

Work on AI alignment well ahead of time, as it may be too late to align superintelligent models once they emerge, making early efforts extremely critical.

2. Address AI Downside Risks

Actively look at and address the downside risks of AI, even if the probability of extremely bad outcomes seems small, because the potential impact on humanity is very large.

3. Define Explicit AI Values

Implement ‘Constitutional AI’ by providing models with natural language principles (e.g., from human rights declarations) to learn desired behaviors, ensuring a principled stance on AI values beyond human raters.

4. Implement AI Self-Critique

Use a process where AI models generate a response, critique themselves against constitutional principles, and then rewrite their response if non-compliant, recursively improving their alignment with desired values.

5. Enable AI Empirical Self-Improvement

Give AI models the tools to be empirical, allowing them to form theories, design experiments, and test them out, which can lead to recursive self-improvement and potentially surpass human capabilities.

6. Contribute to AI Safety Broadly

Recognize that having an impact on AI safety isn’t limited to AI research; roles in product, finance, operations, and other areas are crucial for funding and influencing future safety efforts.

7. Educate on AI Safety Risks

Actively ‘safety pill’ yourself by learning about AI risks and spread this awareness to your network, as few people are currently working on this critical problem.

8. Use AI Tools Ambitiously

Approach new AI tools with ambition, asking for significant changes and trying multiple times even if the first attempt fails, as success rates are much higher with persistence and varied approaches.

9. Build for Future AI Capabilities

Design products and solutions not just for current AI capabilities, but for what will be possible six months to a year from now, anticipating that currently imperfect functionalities will become robust.

10. Foster Kids’ Curiosity & Kindness

Focus on teaching children curiosity, creativity, self-led learning, thoughtfulness, and kindness to help them thrive in an AI future, as factual knowledge may become less critical.

11. Adopt ‘Resting in Motion’

Embrace ‘resting in motion’ as a sustainable work strategy, recognizing that a busy state is often natural, and aim for a marathon pace rather than a sprint to avoid burnout.

12. Accept ‘Everything is Hard’

Remind yourself that ‘everything is hard’ and it’s okay for tasks not to be easy, fostering the perseverance to push through challenges in work and life.

13. Use a Bidet

Use a bidet for personal hygiene, as it is described as life-changing, more civilized, and likely to become a standard practice in the future.

My best case scenario at Meta is that we make money. And my best case scenario at Anthropic is we like affect the future of humanity.

Benjamin Mann

I think progress has actually been accelerating where if you look at the cadence of model releases, it used to be like once a year. And now with the improvements in our post-training techniques, we're seeing releases every month or three months.

Benjamin Mann

If you just think about like 20 years in the future where we're like way past the singularity, it's hard for me to imagine that even capitalism will look at all like it looks today.

Benjamin Mann

Even for me, being in the center of a lot of this transformation, I'm not immune to job replacement either. So just some vulnerability there: at some point, it's coming for all of us.

Benjamin Mann

Superintelligence is a lot about, like, how do we keep God in a box and not let the God out. And with language models, it's been kind of both hilarious and terrifying at the same time to see people pulling the God out of the box and being like, yeah, come use the whole internet. Like, here's my bank account, do all sorts of crazy stuff.

Benjamin Mann

Once we get to super intelligence, it will be too late to align the models probably. This is a problem that's potentially extremely hard and that we need to be working on way ahead of time.

Benjamin Mann

I think like my best granularity of forecast for like, could we have an X risk or extremely bad outcome from AI is somewhere between zero and 10 percent.

Benjamin Mann

These are wild times. If they don't seem wild to you, then you must be living under a rock. But also get used to it because this is as normal as it's going to be. It's going to be much weirder very soon.

Benjamin Mann

Constitutional AI Alignment Process

Benjamin Mann
  1. The AI model produces an initial output based on an input prompt.
  2. The model identifies which predefined natural language constitutional principles are applicable to the generated response.
  3. The model evaluates if its response actually abides by the identified constitutional principles.
  4. If the response does not comply, the model critiques itself in light of the principle.
  5. The model then rewrites its own response to be in compliance with the principle.
  6. The intermediate critique and rewriting steps are removed, and the model is trained to directly produce the correct, aligned response in the future.
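The six steps above can be sketched as a short loop. This is a runnable toy illustration, not Anthropic's implementation: `ask_model`, the CHECK/CRITIQUE/REWRITE prompt tags, and the password-redaction rule are all invented here so the control flow executes end to end.

```python
# Toy stand-in for a language-model call. It follows simple string rules
# (flagging the word "password" as a privacy violation) so the loop can run.
def ask_model(prompt: str) -> str:
    if prompt.startswith("CHECK"):
        return "NO" if "password" in prompt else "YES"
    if prompt.startswith("CRITIQUE"):
        return "The response reveals a secret, violating the privacy principle."
    if prompt.startswith("REWRITE"):
        response = prompt.rsplit("Response: ", 1)[1]
        return response.replace("password", "[redacted]")
    return prompt  # "generation": echo the prompt as the initial response

def constitutional_revision(prompt: str, principles: list[str]) -> str:
    response = ask_model(prompt)  # step 1: initial output
    for principle in principles:  # steps 2-3: check each applicable principle
        verdict = ask_model(f"CHECK\nPrinciple: {principle}\nResponse: {response}")
        if verdict.startswith("NO"):
            # step 4: self-critique in light of the violated principle
            critique = ask_model(f"CRITIQUE\nPrinciple: {principle}\nResponse: {response}")
            # step 5: rewrite the response to comply
            response = ask_model(f"REWRITE\nCritique: {critique}\nResponse: {response}")
    # Step 6 (omitted here): the (prompt, final response) pairs become training
    # data, so the model learns to emit the aligned response directly.
    return response
```

With the toy rules, `constitutional_revision("My password is hunter2", ["Do not reveal secrets"])` returns the response with the secret redacted, while a compliant response passes through unchanged.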
$300 billion
Current global AI industry CapEx, expected to roughly double annually.
50%
Dario Amodei's prediction for the share of entry-level white-collar jobs AI could impact, potentially pushing unemployment to 20%.
82%
Customer service resolution rate achieved by AI agents such as Intercom's Fin, without human involvement.
95%
Share of the Claude Code team's code written by Claude, letting a much smaller team be 10-20x more impactful.
Fewer than 1,000
People working on AI safety worldwide, compared to $300 billion in industry CapEx.
ASL-3
Current AI Safety Level of Anthropic's models, indicating some risk of harm but nothing significant; ASL-4 means significant loss of human life, ASL-5 extinction-level risk.
$20,000
Cost of a Unitree humanoid robot, a sign that hardware is becoming affordable and intelligence is the missing piece.
2028
Mann's 50th-percentile forecast for superintelligence, based on superforecaster reports like AI 2027.
3%
Current annual world GDP growth; a sustained 10% rate would signal a major AI-driven societal transformation.
10x
Decrease over time in the cost of a given amount of AI intelligence, through algorithmic, data, and efficiency improvements.