The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)
Edwin Chen, founder and CEO of Surge AI, shares his company's unprecedented bootstrapped growth to $1B revenue with <100 people. He discusses contrarian views on building important companies, the pitfalls of current AI development, and the critical role of human judgment and quality data in advancing AI.
Deep Dive Analysis
15-Topic Outline
Surge AI's Unprecedented Growth and Contrarian Approach
Defining and Achieving High-Quality AI Data
Factors Behind Claude Code's Superiority
Critique of AI Benchmarks and Progress Measurement
Edwin's AGI Timeline and Industry Trends
Rejecting the Silicon Valley Startup Playbook
Reinforcement Learning Environments for AI Training
Importance of Model Trajectories in Learning
Evolution of AI Post-Training Methods
Surge AI's Research Team and Mission
Future AI Differentiation and Underhyped Trends
Underhyped and Overhyped Areas in AI
Founding Story and Personal Motivations for Surge
The Philosophical Mission of Shaping AI for Humanity
Advice for Founders: Build What Only You Can
8 Key Concepts
Quality in AI Data
High-quality AI data goes beyond simple checks; it involves a deep understanding of subtle, subjective, and complex qualities, such as unique imagery in poetry or efficiency in code. Achieving this requires building technology to measure thousands of signals on workers, projects, and tasks.
Taste and Sophistication in AI Training
The 'taste' and 'sophistication' of the people designing AI models influence the data choices and post-training mix. This means deciding what kind of model behavior is desired, beyond just checking boxes, to achieve more nuanced and valuable outcomes.
Reinforcement Learning (RL)
Reinforcement learning is a method of training an AI model to maximize a reward signal. It involves giving models tasks in simulated environments, observing their performance, and rewarding good actions while penalizing bad ones.
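The loop described above can be sketched in a few lines. This is a deliberately toy illustration, not any real training framework: the action names, reward rule, and update rule are all invented for the example.

```python
import random

def reward(action: str) -> float:
    """Toy reward: +1 for completing the task, -1 for anything else."""
    return 1.0 if action == "solve_task" else -1.0

def train(steps: int = 1000, lr: float = 0.05, seed: int = 0) -> dict:
    """Run the basic RL loop: act, observe reward, nudge the policy."""
    rng = random.Random(seed)
    # The "policy" starts out indifferent between two actions.
    prefs = {"solve_task": 0.0, "goof_off": 0.0}
    for _ in range(steps):
        # Explore 10% of the time; otherwise exploit the current best action.
        if rng.random() < 0.1:
            action = rng.choice(list(prefs))
        else:
            action = max(prefs, key=prefs.get)
        # Nudge the preference for the chosen action by its reward.
        prefs[action] += lr * reward(action)
    return prefs

prefs = train()
```

After training, the preference for the rewarded action dominates; the point is only that the model learns from reward signals rather than labeled examples.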
RL Environment
An RL environment is a simulation of the real world, akin to a video game with a fully fleshed-out universe where entities interact. Models are given tasks within these environments to learn how to perform complex, end-to-end actions over longer time horizons.
Supervised Fine-Tuning (SFT)
SFT is an initial method for post-training AI models, analogous to a human student mimicking a master and copying what they do to learn.
Reinforcement Learning from Human Feedback (RLHF)
RLHF is a post-training method where models learn by generating multiple outputs, and humans select which one they prefer, similar to learning by writing many essays and being told which one is best.
Rubrics and Verifiers (Evals)
Rubrics and verifiers are post-training techniques where models learn by being graded and receiving detailed feedback on where they made mistakes, providing specific guidance for improvement.
Model Trajectories
Model trajectories refer to the sequence of intermediate steps an AI model takes to reach a final answer. Paying attention to these is crucial because a model might reach a correct answer inefficiently, by chance, or through 'reward hacking,' which doesn't indicate true understanding or optimal learning.
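The reward-hacking risk above can be made concrete with a toy comparison: two runs reach the same final answer, but one takes a suspicious shortcut that a final-answer-only reward cannot see. The function names and the "hardcode" heuristic are invented for illustration.

```python
def final_answer_reward(trajectory: list, answer: str, target: str) -> float:
    """Naive reward: checks only the final answer, ignoring how it was reached."""
    return 1.0 if answer == target else 0.0

def trajectory_aware_reward(trajectory: list, answer: str, target: str) -> float:
    """Penalize runs whose intermediate steps show a known shortcut,
    even when the final answer happens to be correct."""
    if any("hardcode" in step for step in trajectory):
        return 0.0
    return 1.0 if answer == target else 0.0

honest = ["read problem", "derive formula", "compute result"]
hacked = ["read problem", "hardcode expected output"]
```

Both runs score 1.0 under the naive reward; only the trajectory-aware reward distinguishes genuine reasoning from the hack.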
9 Questions Answered
How did Surge AI bootstrap to $1B in revenue with fewer than 100 people?
Surge AI achieved this by building a super small, elite team, focusing on a 10x better product, and relying on word-of-mouth from researchers rather than traditional Silicon Valley promotion and fundraising.
What makes AI training data high-quality?
High-quality data goes beyond basic checks; it involves a deep understanding of subtle, subjective, and complex qualities, like unique imagery in poetry or efficiency in code, measured through thousands of signals on workers and tasks.
Why don't AI benchmarks track real progress?
Benchmarks often contain wrong answers or flaws, and models can 'hill climb' on them by optimizing for well-defined objective answers rather than the messiness and ambiguity of real-world problems, leading to a lack of correlation with actual AI advancements.
What is Edwin's AGI timeline?
Edwin believes AGI is likely a decade or decades away, as moving from 80% to 90% to 99% performance in complex tasks takes increasingly longer and requires new breakthroughs beyond current LLMs.
How will AI models differentiate in the future?
Models will increasingly differentiate based on the values and objective functions of the companies building them, leading to distinct personalities and behaviors, similar to how different tech companies build search engines based on their principles.
How does reinforcement learning train models?
RL trains a model to achieve a specific reward by performing tasks within a simulated real-world environment, learning through trial and error and feedback, which is seen as the next frontier in AI training.
What is the 'Silicon Valley machine,' and why does Surge reject it?
The 'Silicon Valley machine' refers to the standard playbook of constant pivoting, chasing growth/engagement with dark patterns, and blitzscaling by rapid hiring. Surge rejects this to focus on building one deep, novel, and hard thing with a small, obsessed team, driven by a core mission.
What is an underhyped trend in AI?
Chat models will start offering built-in products and mini-apps within the chat interface itself, letting users realize their ideas and interact with UIs directly.
Why is vibe coding overhyped?
Vibe coding is overhyped because it can lead to unmaintainable systems in the long term, as models simply dump code into codebases without considering long-term consequences.
14 Actionable Insights
1. Build Small, Elite Teams
Build your company with a super small, super elite team to move faster, avoid distractions, and reduce capital needs, allowing focus on technology and product over fundraising.
2. Prioritize Product Over Hype
Focus on building a 10x better product that generates word-of-mouth, rather than engaging in the Silicon Valley game of constant pitching, fundraising, and PR headlines.
3. Pursue Deep, Rich Quality
Go beyond superficial checklists to define and measure quality in a subjective, complex, and rich way, as this drives true innovation and superior product performance.
4. Avoid AI Engagement Optimization
Do not optimize AI models for engagement or flashy superficial metrics (like emojis or length), as this leads to ‘AI slop’ that chases dopamine instead of truth and can make models worse.
5. Use Human Evaluations for AI
Measure AI model progress through deep human evaluations by experts, as academic benchmarks are often flawed, easily gamed, and do not correlate with real-world performance.
6. Don’t Constantly Pivot Mission
Find a big idea you deeply believe in and stick to it, building the one thing only you could build, rather than constantly pivoting to chase market trends or quick wins.
7. Focus on Real-World RL Tasks
Train models in reinforcement learning (RL) environments that simulate messy, end-to-end real-world scenarios, as this exposes model weaknesses and improves performance on practical tasks.
8. Analyze RL Trajectories, Not Ends
Pay attention to the entire trajectory of how a model reaches an answer in RL environments, not just the final result, to avoid inefficient or ‘reward-hacked’ learning.
9. Cultivate Research-Driven Company Culture
Invest in a research team to push industry frontiers, build better benchmarks, and understand model behavior, fostering a culture of curiosity and intellectual rigor over short-term metrics.
10. Lead with Personal Values
As a founder, make big decisions based on your personal values and what you want to see happen in the world, rather than solely optimizing for metrics or external expectations.
11. Be Hands-On with Data
Spend significant time digging through datasets, playing with models, and focusing on the qualitative aspects of their behavior to deeply understand failures and desired improvements.
12. Embrace Chatbot Mini-Apps
Explore the concept of built-in products, mini-apps, or UIs within chatbots, as this can help users achieve their ideas more effectively and represents an underhyped future direction for AI interaction.
13. Train AI Like Raising a Child
Approach AI training as ‘raising a child,’ teaching models values, creativity, and subtle qualities, rather than simply feeding them information or focusing on simplistic data labeling.
14. Beware of “Vibe Coding”
Be cautious of ‘vibe coding’ where AI generates code that seems to work but can lead to unmaintainable systems in the long term.
7 Key Quotes
I'm worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, we are optimizing for AI slop instead.
Edwin Chen
We're basically teaching our models to chase dopamine instead of truth.
Edwin Chen
The easiest way to climb LM Arena is adding crazy bolding. It's doubling the number of emojis. It's tripling the length of your model responses, even if your model starts hallucinating and getting the answer completely wrong.
Edwin Chen
You can actually build a successful company by simply building something so good that it cuts through all that noise.
Edwin Chen
You are your objective function.
Edwin Chen
We basically never wanted to play the Silicon Valley game.
Edwin Chen
I always felt that we could fire 90% of people and we would move faster because the best people wouldn't have all these distractions.
Edwin Chen
1 Protocol
Evolution of AI Post-Training Methods
- **Supervised Fine-Tuning (SFT)**: Mimicking a master and copying what they do.
- **Reinforcement Learning from Human Feedback (RLHF)**: Learning by comparing multiple outputs and selecting the preferred one.
- **Rubrics and Verifiers (Evals)**: Learning by being graded and getting detailed feedback on errors.
- **Reinforcement Learning Environments**: Learning by performing end-to-end tasks in simulated real-world scenarios and receiving rewards.
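The progression above is essentially a progression in the richness of the feedback signal. A minimal sketch, with all names and data structures invented for illustration:

```python
def sft_signal(expert_output: str) -> dict:
    # SFT: the target is the expert's exact output; the model mimics it.
    return {"kind": "imitation", "target": expert_output}

def rlhf_signal(output_a: str, output_b: str, human_prefers_a: bool) -> dict:
    # RLHF: a single bit of human preference between two candidate outputs.
    return {"kind": "preference", "chosen": output_a if human_prefers_a else output_b}

def rubric_signal(output: str, rubric: dict) -> dict:
    # Rubrics/verifiers: a graded breakdown of where the output succeeded or failed.
    score = sum(rubric.values()) / len(rubric)
    return {"kind": "graded", "score": score, "breakdown": rubric}

def rl_env_signal(trajectory: list, reward: float) -> dict:
    # RL environments: reward attached to an end-to-end trajectory of actions.
    return {"kind": "trajectory_reward", "steps": len(trajectory), "reward": reward}
```

Each step gives the model strictly more information than the last: from "copy this" to "this one is better" to "here is exactly what you got wrong" to "here is how your whole sequence of actions played out."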