Anthropic co-founder on quitting OpenAI, AGI predictions, $100M talent wars, 20% unemployment, and the nightmare scenarios keeping him up at night | Ben Mann
Benjamin Mann, co-founder and tech lead for product engineering at Anthropic, discusses the accelerating pace of AI, the economic Turing test for AGI, and why he left OpenAI to prioritize AI safety. He shares how Anthropic operationalizes safety, the potential impact on jobs, and advice for thriving in an AI-driven future.
Deep Dive Analysis
16 Topic Outline
AI Talent War and Compensation Trends
Accelerating AI Progress and Scaling Laws
Defining Transformative AI and the Economic Turing Test
Impact of AI on Jobs and Future of Work
Preparing for an AI-Driven Future
Founding Anthropic: Prioritizing AI Safety
Balancing AI Safety with Product Progress
Constitutional AI and Model Alignment
Personal Motivation for AI Safety
Risks of Autonomous Agents and Software-Only AI
Forecasting Superintelligence and Societal Impact
Probability of Aligning AI Correctly
Reinforcement Learning from AI Feedback (RLAIF)
Biggest Bottlenecks in AI Intelligence Improvement
Personal Reflections on Responsibility and AI's Future
Anthropic’s Growth and Innovation Teams
6 Key Concepts
Transformative AI
Transformative AI is a term preferred over AGI, focusing on whether AI objectively causes significant transformation in society and the economy. It's measured by its ability to perform a sufficient number of money-weighted jobs, indicating a new era of societal change and economic growth.
Economic Turing Test
This test measures whether an AI agent can pass as a human worker in a specific job. If you contract an agent for a month or three months and only later discover it was a machine rather than a person, it has passed the economic Turing test for that role.
Constitutional AI
This is a method where an AI model learns desired behaviors from a list of natural language principles, rather than relying solely on human raters. The model generates a response, critiques itself against these principles, and then rewrites its response to comply, recursively improving its alignment with predefined values.
Responsible Scaling Policy (RSP)
Anthropic's policy defines AI Safety Levels (ASL) to assess the risk to society at different levels of model intelligence. It helps determine what safeguards are needed as models become more capable, ranging from minor harm to extinction-level risks.
Reinforcement Learning from AI Feedback (RLAIF)
RLAIF is a technique where AI models self-improve without direct human intervention. An example is constitutional AI, where the model critiques and rewrites its own responses based on principles, or models commenting on code written by other models to improve it.
Resting in Motion
A technique for managing the burden of weighty topics, suggesting that the 'busy state' is the normal human condition, not rest. It encourages working at a sustainable pace, recognizing that it's a marathon, not a sprint, and that constant engagement is natural.
8 Questions Answered
Why did Benjamin Mann leave OpenAI to co-found Anthropic?
They felt that safety wasn't the top priority at OpenAI, despite the company's mission. They wanted to create an organization that could be at the frontier of AI research while prioritizing safety above everything else, especially given the growing understanding of potential risks.
Does prioritizing safety slow down product progress?
Anthropic has found that working on safety actually helps with progress, creating a 'convex' relationship. For example, the beloved personality of Claude 3 Opus is a direct result of their alignment research: efforts to make the AI helpful, honest, and harmless led to its distinctive character.
How does constitutional AI work in practice?
The model generates a response, then checks whether it complies with a set of natural language principles (such as human rights or privacy terms). If not, the model critiques itself and rewrites its response in light of the principle, effectively self-improving its alignment with desired values.
Can software-only AI cause real-world harm?
Even without physical robots, software-only AI poses real risks, as demonstrated by nation-state hacking of critical infrastructure (e.g., shutting down power plants). Such attacks can cause significant real-world harm, affecting millions of people.
When does he expect superintelligence to arrive?
Based on superforecaster reports like AI 2027, his median estimate is that some form of superintelligence arrives around 2028. This forecast rests on scaling laws, improvements in model training, and the global expansion of data centers and power.
How likely is an existential risk from AI?
Benjamin Mann believes we are in a 'middle world' where alignment research truly matters and our actions are pivotal. While he estimates a 0-10% chance of an X-risk or extremely bad outcome, he stresses the critical importance of working on alignment because the downside risk is so large and few people are addressing it.
What are the biggest bottlenecks to improving AI intelligence?
The primary bottlenecks are compute (data centers, power, and chips), along with the need for more researchers to develop better algorithms and improve data efficiency. Significant gains in these areas could dramatically accelerate AI progress.
How should parents prepare children for an AI future?
He emphasizes fostering curiosity, creativity, and kindness, drawing inspiration from Montessori education. He believes traditional academics may become less relevant; instead, children should learn to be thoughtful, curious, and empathetic individuals.
13 Actionable Insights
1. Proactively Align AI Models
Work on AI alignment well ahead of time, as it may be too late to align superintelligent models once they emerge, making early efforts extremely critical.
2. Address AI Downside Risks
Actively look at and address the downside risks of AI, even if the probability of extremely bad outcomes seems small, because the potential impact on humanity is very large.
3. Define Explicit AI Values
Implement ‘Constitutional AI’ by providing models with natural language principles (e.g., from human rights declarations) to learn desired behaviors, ensuring a principled stance on AI values beyond human raters.
4. Implement AI Self-Critique
Use a process where AI models generate a response, critique themselves against constitutional principles, and then rewrite their response if non-compliant, recursively improving their alignment with desired values.
5. Enable AI Empirical Self-Improvement
Give AI models the tools to be empirical, allowing them to form theories, design experiments, and test them out, which can lead to recursive self-improvement and potentially surpass human capabilities.
6. Contribute to AI Safety Broadly
Recognize that having an impact on AI safety isn’t limited to AI research; roles in product, finance, operations, and other areas are crucial for funding and influencing future safety efforts.
7. Educate on AI Safety Risks
Actively ‘safety pill’ yourself by learning about AI risks and spread this awareness to your network, as few people are currently working on this critical problem.
8. Use AI Tools Ambitiously
Approach new AI tools with ambition, asking for significant changes and trying multiple times even if the first attempt fails, as success rates are much higher with persistence and varied approaches.
9. Build for Future AI Capabilities
Design products and solutions not just for current AI capabilities, but for what will be possible six months to a year from now, anticipating that currently imperfect functionalities will become robust.
10. Foster Kids’ Curiosity & Kindness
Focus on teaching children curiosity, creativity, self-led learning, thoughtfulness, and kindness to help them thrive in an AI future, as factual knowledge may become less critical.
11. Adopt ‘Resting in Motion’
Embrace ‘resting in motion’ as a sustainable work strategy, recognizing that a busy state is often natural, and aim for a marathon pace rather than a sprint to avoid burnout.
12. Accept ‘Everything is Hard’
Remind yourself that ’everything is hard’ and it’s okay for tasks to not be easy, fostering perseverance to push through challenges in work and life.
13. Use a Bidet
Use a bidet for personal hygiene, as it is described as life-changing, more civilized, and likely to become a standard practice in the future.
8 Key Quotes
My best case scenario at Meta is that we make money. And my best case scenario at Anthropic is we affect the future of humanity.
Benjamin Mann
I think progress has actually been accelerating. If you look at the cadence of model releases, it used to be once a year. And now with the improvements in our post-training techniques, we're seeing releases every month or three months.
Benjamin Mann
If you just think about 20 years in the future, where we're way past the singularity, it's hard for me to imagine that even capitalism will look at all like it looks today.
Benjamin Mann
Even for me, being in the center of a lot of this transformation, I'm not immune to job replacement either. So just some vulnerability there: at some point, it's coming for all of us.
Benjamin Mann
Superintelligence is a lot about how do we keep God in a box and not let the God out. And with language models, it's been both hilarious and terrifying at the same time to see people pulling the God out of the box and saying, yeah, come use the whole internet. Here's my bank account, do all sorts of crazy stuff.
Benjamin Mann
Once we get to superintelligence, it will be too late to align the models, probably. This is a problem that's potentially extremely hard and that we need to be working on way ahead of time.
Benjamin Mann
I think my best granularity of forecast for whether we could have an X-risk or extremely bad outcome from AI is somewhere between zero and 10 percent.
Benjamin Mann
These are wild times. If they don't seem wild to you, then you must be living under a rock. But also get used to it because this is as normal as it's going to be. It's going to be much weirder very soon.
Benjamin Mann
1 Protocol
Constitutional AI Alignment Process
- The AI model produces an initial output based on an input prompt.
- The model identifies which predefined natural language constitutional principles are applicable to the generated response.
- The model evaluates if its response actually abides by the identified constitutional principles.
- If the response does not comply, the model critiques itself in light of the principle.
- The model then rewrites its own response to be in compliance with the principle.
- The intermediate critique and rewriting steps are removed, and the model is trained to directly produce the correct, aligned response in the future.
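The steps above can be sketched as a simple loop. This is an illustrative Python sketch of the described process, not Anthropic's implementation: `generate`, `complies`, and `revise` are hypothetical stubs standing in for calls to a language model, included only so the control flow runs as-is.

```python
# Sketch of the constitutional AI alignment loop described above.
# The three helpers below are hypothetical stubs; in a real system,
# each would be a call to a language model.

PRINCIPLES = [
    "Respect user privacy",
    "Avoid harmful instructions",
]

def generate(prompt):
    # Stub: a real system would sample an initial response from the model.
    return f"draft answer to: {prompt}"

def complies(response, principle):
    # Stub: a real system would ask the model whether the response
    # abides by the principle. Here we pretend drafts are non-compliant.
    return "draft" not in response

def revise(response, principle):
    # Stub: a real system would ask the model to critique its response
    # in light of the principle, then rewrite it to comply.
    return response.replace("draft", "revised")

def constitutional_pass(prompt):
    """One round of generate -> check -> critique -> rewrite.

    The final (prompt, response) pairs become training data, so the
    model learns to emit the aligned response directly, without the
    intermediate critique and rewriting steps.
    """
    response = generate(prompt)
    for principle in PRINCIPLES:
        if not complies(response, principle):
            response = revise(response, principle)
    return response

print(constitutional_pass("How do I reset my password?"))
```

The key design point is the last step: the intermediate critiques are discarded, and only the compliant final responses are kept as supervision, which is what lets the model internalize the principles rather than reapply them at inference time.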