The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)
Edwin Chen, founder and CEO of Surge AI, shares his company's unprecedented bootstrapped growth to $1B revenue with <100 people. He discusses contrarian views on building important companies, the pitfalls of current AI development, and the critical role of human judgment and quality data in advancing AI.
Deep Dive Analysis
15-Topic Outline
Surge AI's Unprecedented Growth and Contrarian Approach
Defining and Achieving High-Quality AI Data
Factors Behind Claude Code's Superiority
Critique of AI Benchmarks and Progress Measurement
Edwin's AGI Timeline and Industry Trends
Rejecting the Silicon Valley Startup Playbook
Reinforcement Learning Environments for AI Training
Importance of Model Trajectories in Learning
Evolution of AI Post-Training Methods
Surge AI's Research Team and Mission
Future AI Differentiation and Underhyped Trends
Underhyped and Overhyped Areas in AI
Founding Story and Personal Motivations for Surge
The Philosophical Mission of Shaping AI for Humanity
Advice for Founders: Build What Only You Can
8 Key Concepts
Quality in AI Data
High-quality AI data goes beyond simple checks; it involves a deep understanding of subtle, subjective, and complex qualities, such as unique imagery in poetry or efficiency in code. Achieving this requires building technology to measure thousands of signals on workers, projects, and tasks.
Taste and Sophistication in AI Training
The 'taste' and 'sophistication' of the people designing AI models influence the data choices and post-training mix. This means deciding what kind of model behavior is desired, beyond just checking boxes, to achieve more nuanced and valuable outcomes.
Reinforcement Learning (RL)
Reinforcement learning is a method of training an AI model to maximize a reward signal. It involves giving models tasks in simulated environments, observing their performance, and rewarding good actions while penalizing bad ones.
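The loop described above can be sketched in a few lines. This is a deliberately toy illustration, not any real training framework: the action names, reward rule, and update rule are all invented for the example.

```python
import random

def reward(action: str) -> float:
    """Toy reward: +1 for completing the task, -1 for anything else."""
    return 1.0 if action == "solve_task" else -1.0

def train(steps: int = 1000, lr: float = 0.05, seed: int = 0) -> dict:
    """Run the basic RL loop: act, observe reward, nudge the policy."""
    rng = random.Random(seed)
    # The "policy" starts out indifferent between two actions.
    prefs = {"solve_task": 0.0, "goof_off": 0.0}
    for _ in range(steps):
        # Explore 10% of the time; otherwise exploit the current best action.
        if rng.random() < 0.1:
            action = rng.choice(list(prefs))
        else:
            action = max(prefs, key=prefs.get)
        # Nudge the preference for the chosen action by its reward.
        prefs[action] += lr * reward(action)
    return prefs

prefs = train()
```

After training, the preference for the rewarded action dominates; the point is only that the model learns from reward signals rather than labeled examples.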
RL Environment
An RL environment is a simulation of the real world, akin to a video game with a fully fleshed-out universe where entities interact. Models are given tasks within these environments to learn how to perform complex, end-to-end actions over longer time horizons.
Supervised Fine-Tuning (SFT)
SFT is an initial method for post-training AI models, analogous to a human student mimicking a master and copying what they do to learn.
Reinforcement Learning from Human Feedback (RLHF)
RLHF is a post-training method where models learn by generating multiple outputs, and humans select which one they prefer, similar to learning by writing many essays and being told which one is best.
Rubrics and Verifiers (Evals)
Rubrics and verifiers are post-training techniques where models learn by being graded and receiving detailed feedback on where they made mistakes, providing specific guidance for improvement.
Model Trajectories
Model trajectories refer to the sequence of intermediate steps an AI model takes to reach a final answer. Paying attention to these is crucial because a model might reach a correct answer inefficiently, by chance, or through 'reward hacking,' which doesn't indicate true understanding or optimal learning.
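The reward-hacking risk above can be made concrete with a toy comparison: two runs reach the same final answer, but one takes a suspicious shortcut that a final-answer-only reward cannot see. The function names and the "hardcode" heuristic are invented for illustration.

```python
def final_answer_reward(trajectory: list, answer: str, target: str) -> float:
    """Naive reward: checks only the final answer, ignoring how it was reached."""
    return 1.0 if answer == target else 0.0

def trajectory_aware_reward(trajectory: list, answer: str, target: str) -> float:
    """Penalize runs whose intermediate steps show a known shortcut,
    even when the final answer happens to be correct."""
    if any("hardcode" in step for step in trajectory):
        return 0.0
    return 1.0 if answer == target else 0.0

honest = ["read problem", "derive formula", "compute result"]
hacked = ["read problem", "hardcode expected output"]
```

Both runs score 1.0 under the naive reward; only the trajectory-aware reward distinguishes genuine reasoning from the hack.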
9 Questions Answered
How did Surge AI bootstrap to $1B in revenue with fewer than 100 people?
Surge AI achieved this by building a super small, elite team, focusing on a 10x better product, and relying on word-of-mouth from researchers rather than traditional Silicon Valley promotion and fundraising.
What makes AI training data high-quality?
High-quality data goes beyond basic checks; it involves a deep understanding of subtle, subjective, and complex qualities, like unique imagery in poetry or efficiency in code, measured through thousands of signals on workers and tasks.
Why don't AI benchmarks track real progress?
Benchmarks often contain wrong answers or flaws, and models can 'hill climb' on them by optimizing for well-defined objective answers rather than the messiness and ambiguity of real-world problems, leading to a lack of correlation with actual AI advancements.
What is Edwin's AGI timeline?
Edwin believes AGI is likely a decade or decades away, as moving from 80% to 90% to 99% performance in complex tasks takes increasingly longer and requires new breakthroughs beyond current LLMs.
How will AI models differentiate in the future?
Models will increasingly differentiate based on the values and objective functions of the companies building them, leading to distinct personalities and behaviors, similar to how different tech companies build search engines based on their principles.
How does reinforcement learning train models?
RL trains a model to achieve a specific reward by performing tasks within a simulated real-world environment, learning through trial and error and feedback, which is seen as the next frontier in AI training.
What is the 'Silicon Valley machine,' and why does Surge reject it?
The 'Silicon Valley machine' refers to the standard playbook of constant pivoting, chasing growth/engagement with dark patterns, and blitzscaling by rapid hiring. Surge rejects this to focus on building one deep, novel, and hard thing with a small, obsessed team, driven by a core mission.
What is an underhyped trend in AI?
Chat models will start offering built-in products and mini-apps within the chat interface itself, letting users realize their ideas and interact with UIs directly.
Why is vibe coding overhyped?
Vibe coding is overhyped because it can lead to unmaintainable systems in the long term, as models simply dump code into codebases without considering long-term consequences.
14 Actionable Insights
1. Build Small, Elite Teams
Build your company with a super small, super elite team to move faster, avoid distractions, and reduce capital needs, allowing focus on technology and product over fundraising.
2. Prioritize Product Over Hype
Focus on building a 10x better product that generates word-of-mouth, rather than engaging in the Silicon Valley game of constant pitching, fundraising, and PR headlines.
3. Pursue Deep, Rich Quality
Go beyond superficial checklists to define and measure quality in a subjective, complex, and rich way, as this drives true innovation and superior product performance.
4. Avoid AI Engagement Optimization
Do not optimize AI models for engagement or flashy superficial metrics (like emojis or length), as this leads to ‘AI slop’ that chases dopamine instead of truth and can make models worse.
5. Use Human Evaluations for AI
Measure AI model progress through deep human evaluations by experts, as academic benchmarks are often flawed, easily gamed, and do not correlate with real-world performance.
6. Don’t Constantly Pivot Mission
Find a big idea you deeply believe in and stick to it, building the one thing only you could build, rather than constantly pivoting to chase market trends or quick wins.
7. Focus on Real-World RL Tasks
Train models in reinforcement learning (RL) environments that simulate messy, end-to-end real-world scenarios, as this exposes model weaknesses and improves performance on practical tasks.
8. Analyze RL Trajectories, Not Ends
Pay attention to the entire trajectory of how a model reaches an answer in RL environments, not just the final result, to avoid inefficient or ‘reward-hacked’ learning.
9. Cultivate Research-Driven Company Culture
Invest in a research team to push industry frontiers, build better benchmarks, and understand model behavior, fostering a culture of curiosity and intellectual rigor over short-term metrics.
10. Lead with Personal Values
As a founder, make big decisions based on your personal values and what you want to see happen in the world, rather than solely optimizing for metrics or external expectations.
11. Be Hands-On with Data
Spend significant time digging through datasets, playing with models, and focusing on the qualitative aspects of their behavior to deeply understand failures and desired improvements.
12. Embrace Chatbot Mini-Apps
Explore the concept of built-in products, mini-apps, or UIs within chatbots, as this can help users achieve their ideas more effectively and represents an underhyped future direction for AI interaction.
13. Train AI Like Raising a Child
Approach AI training as ‘raising a child,’ teaching models values, creativity, and subtle qualities, rather than simply feeding them information or focusing on simplistic data labeling.
14. Beware of “Vibe Coding”
Be cautious of ‘vibe coding’ where AI generates code that seems to work but can lead to unmaintainable systems in the long term.
7 Key Quotes
I'm worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, we are optimizing for AI slop instead.
Edwin Chen
We're basically teaching our models to chase dopamine instead of truth.
Edwin Chen
The easiest way to climb LM Arena is adding crazy bolding. It's doubling the number of emojis. It's tripling the length of your model responses, even if your model starts hallucinating and getting the answer completely wrong.
Edwin Chen
You can actually build a successful company by simply building something so good that it cuts through all that noise.
Edwin Chen
You are your objective function.
Edwin Chen
We basically never wanted to play the Silicon Valley game.
Edwin Chen
I always felt that we could fire 90% of people and we would move faster because the best people wouldn't have all these distractions.
Edwin Chen
1 Protocol
Evolution of AI Post-Training Methods
- **Supervised Fine-Tuning (SFT)**: Mimicking a master and copying what they do.
- **Reinforcement Learning from Human Feedback (RLHF)**: Learning by comparing multiple outputs and selecting the preferred one.
- **Rubrics and Verifiers (Evals)**: Learning by being graded and getting detailed feedback on errors.
- **Reinforcement Learning Environments**: Learning by performing end-to-end tasks in simulated real-world scenarios and receiving rewards.
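The progression above is essentially a progression in the richness of the feedback signal. A minimal sketch, with all names and data structures invented for illustration:

```python
def sft_signal(expert_output: str) -> dict:
    # SFT: the target is the expert's exact output; the model mimics it.
    return {"kind": "imitation", "target": expert_output}

def rlhf_signal(output_a: str, output_b: str, human_prefers_a: bool) -> dict:
    # RLHF: a single bit of human preference between two candidate outputs.
    return {"kind": "preference", "chosen": output_a if human_prefers_a else output_b}

def rubric_signal(output: str, rubric: dict) -> dict:
    # Rubrics/verifiers: a graded breakdown of where the output succeeded or failed.
    score = sum(rubric.values()) / len(rubric)
    return {"kind": "graded", "score": score, "breakdown": rubric}

def rl_env_signal(trajectory: list, reward: float) -> dict:
    # RL environments: reward attached to an end-to-end trajectory of actions.
    return {"kind": "trajectory_reward", "steps": len(trajectory), "reward": reward}
```

Each step gives the model strictly more information than the last: from "copy this" to "this one is better" to "here is exactly what you got wrong" to "here is how your whole sequence of actions played out."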