Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google, and Amazon
Aishwarya Naresh Reganti (OpenAI, Google) and Kiriti Badam (Alexa, Microsoft) discuss building successful AI products, emphasizing non-determinism and the agency-control trade-off. They advocate a problem-first, step-by-step approach with continuous calibration, hands-on leadership, and an empowering culture to build reliable AI systems.
Deep Dive Analysis
15-Topic Outline
Current State of AI Product Development Challenges
Fundamental Differences Between AI and Traditional Software
The Agency-Control Trade-off in AI Systems
Strategies for Building AI Products Incrementally
Risks of Prompt Injection and Jailbreaking
Key Patterns for Successful AI Product Development
The Role of Leaders and Culture in AI Adoption
Understanding Evals vs. Production Monitoring
OpenAI Codex Team's Evaluation Approach
The Continuous Calibration, Continuous Development (CC/CD) Framework
Multi-Agent Systems: Misunderstood vs. Effective Patterns
Overhyped and Underhyped AI Concepts
Vision for the Future of AI Products (2026)
Essential Skills for AI Product Builders
Pain is the New Moat Concept
8 Key Concepts
Non-determinism in AI Products
Unlike traditional software with well-mapped workflows, AI products operate with non-deterministic APIs (LLMs). This means user input can be highly varied (natural language), and the AI's output is probabilistic, making both input and output behavior unpredictable.
Agency-Control Trade-off
This refers to the inherent balance between granting AI systems more decision-making capabilities (agency) and the corresponding relinquishment of human control. As AI gains more agency, humans lose some control, necessitating trust and reliability in the AI's performance.
Problem-First Approach
A methodology for building AI products that prioritizes understanding and breaking down the core problem to be solved, rather than getting sidetracked by the complexities or advanced capabilities of AI solutions.
Evals (Evaluation Metrics)
These are dimensions or data sets built from trusted product thinking and knowledge to ensure an AI agent performs well on specific, known problems. They help test if the system meets expected behavior for critical scenarios.
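The idea above can be sketched as a tiny eval harness: a set of trusted input/output cases and a score for how many the agent gets right. This is a minimal illustration, not the speakers' implementation; the names (`EvalCase`, `run_evals`) and the stub "agent" are invented for the example.

```python
# Minimal sketch of an eval harness. EvalCase/run_evals are illustrative names.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str       # known input the product must handle well
    expected: str     # trusted reference output from product knowledge

def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of trusted eval cases the agent passes."""
    passed = sum(1 for c in cases if agent(c.prompt).strip() == c.expected)
    return passed / len(cases)

# Toy usage: a stub "agent" that upper-cases its input stands in for an LLM call.
cases = [EvalCase("hello", "HELLO"), EvalCase("hi", "HI"), EvalCase("bye", "BYE!")]
score = run_evals(str.upper, cases)
print(score)  # 2 of 3 cases pass
```

In practice the exact-match check would be replaced by a task-appropriate grader (semantic similarity, an LLM judge, unit tests for code), but the shape stays the same: trusted cases in, a pass rate out.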
Production Monitoring
This involves deploying an AI application and continuously tracking key metrics, including implicit and explicit customer feedback, to understand real-world usage patterns and identify unexpected behaviors or failure modes that were not anticipated during development.
Semantic Diffusion
A concept where a technical term, like 'evals' or 'agents,' initially has a specific meaning but becomes widely used with various, often butchered, definitions, leading to confusion and a loss of its original precise meaning.
Continuous Calibration, Continuous Development (CC/CD) Framework
An iterative software development lifecycle for AI products that combines continuous development (scoping, data curation, app setup, eval metrics, deployment) with continuous calibration (analyzing behavior, spotting error patterns, applying fixes, designing new eval metrics) to build trust and improve systems over time.
Pain is the New Moat
This concept suggests that in rapidly evolving fields like AI, a company's competitive advantage (moat) comes not from being first or having flashy features, but from the persistent effort and iterative learning through the 'pain' of understanding complex problems and building robust solutions that continuously improve.
11 Questions Answered
Why are AI products fundamentally different from traditional software?
AI products differ due to non-determinism in both user input and LLM output, and the inherent agency-control trade-off when handing decision-making over to AI systems.
Why should teams start small when building AI products?
Starting small forces a problem-first approach, helps manage complexity, and lets teams build confidence and calibrate AI behavior under high human control before increasing autonomy.
What are the most common pitfalls in AI product development?
Common pitfalls include focusing on solution complexity over the problem, jumping to fully autonomous agents too soon, and underestimating the importance of reliability and customer trust.
What characterizes successful AI product development?
Successful AI product development is characterized by hands-on, vulnerable leaders; an empowering culture that augments human workflows; and technical teams obsessed with understanding workflows and iterating quickly to build feedback flywheels.
Are evals alone sufficient to ensure reliability?
No. A balanced approach combining evals (for known error patterns) with production monitoring (for catching emerging, unexpected patterns and implicit user feedback) is crucial for comprehensive reliability.
How does the OpenAI Codex team evaluate its coding agents?
The Codex team balances evals for core functionality with extreme care in understanding customer usage through implicit and explicit feedback, including social media monitoring, because coding agents are highly customizable.
How can companies systematically build trust in their AI systems?
Companies can use the Continuous Calibration, Continuous Development (CC/CD) framework: iterative development with evaluation metrics and continuous calibration based on observed user behavior, starting with low agency and high human control.
Are multi-agent systems overhyped?
Largely, yes. The notion of simply dividing responsibilities among agents and expecting them to coordinate via peer-to-peer gossip protocols is often misunderstood and overhyped given current model capabilities.
What is underhyped right now?
Coding agents. Their potential for optimizing processes and creating significant value across companies is still not fully realized or widely adopted, despite the chatter about them on platforms like Twitter and Reddit.
What will AI products look like by 2026?
By 2026, AI is expected to feature more proactive "background agents" that deeply understand workflows and context, anticipating user needs and solving problems proactively, alongside advances in multimodal experiences for richer, more human-like interactions.
What skills matter most for AI product builders?
Beyond execution mechanics, critical skills include strong design, judgment, and taste, along with agency, ownership, and persistence to rethink experiences and focus on high-impact work as AI automates busy work.
15 Actionable Insights
1. Build AI Products Step-by-Step
Start with high human control and low AI agency, gradually increasing AI autonomy as confidence grows. This approach manages complexity, builds trust, and ensures focus on the core problem.
2. Implement Continuous Calibration, Continuous Development (CC/CD)
Adopt an iterative AI product lifecycle that continuously develops capabilities and calibrates behavior. This framework helps manage non-determinism and builds trust by integrating feedback loops from deployment.
3. Focus on Problem-First Approach
Prioritize deeply understanding the problem you’re solving before getting lost in the complexities of AI solutions. Starting with lower autonomy forces this problem-first mindset.
4. Leaders: Be Hands-On with AI
Dedicate time to hands-on learning and staying updated with AI advancements to rebuild intuitions. This top-down engagement is vital for guiding company decisions and fostering trust in AI technology.
5. Embrace Vulnerability & Learning
Be comfortable admitting your intuitions might be wrong and actively seek to learn from everyone. This fosters a culture of humility and continuous adaptation in the rapidly evolving AI field.
6. Build Flywheels for Improvement
Focus on establishing feedback loops and systems that allow your AI products to continuously learn and improve over time, rather than just aiming to be the first to market with an agent.
7. Foster an Empowering AI Culture
Cultivate a company culture that views AI as an augmentation tool to empower employees and enhance productivity, rather than a threat. This encourages collaboration from subject matter experts.
8. Obsess Over Workflow Understanding
Deeply understand your existing workflows to identify which parts are ripe for AI augmentation versus those needing human intervention. This ensures the right AI tools are chosen for the specific problem.
9. Combine Evals and Production Monitoring
Utilize both evaluation datasets (evals) for known errors and production monitoring (implicit/explicit signals) to discover emerging, unexpected patterns and user behavior. This creates a robust feedback loop.
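One simple way to picture the production-monitoring half of this loop: count implicit and explicit feedback events and surface the ones frequent enough to deserve a new eval case. This is a hedged sketch; `record_feedback`, the event names, and the frequency threshold are all assumptions for illustration.

```python
# Sketch of production monitoring feeding the eval loop. All names are illustrative.
from collections import Counter

production_log: Counter = Counter()

def record_feedback(event: str) -> None:
    """Track implicit/explicit signals (e.g. 'draft_edited', 'thumbs_down')."""
    production_log[event] += 1

def emerging_patterns(min_count: int = 3) -> list[str]:
    """Surface patterns frequent enough to become new eval cases."""
    return [event for event, n in production_log.items() if n >= min_count]

for _ in range(4):
    record_feedback("draft_edited")   # implicit: a human rewrote the AI draft
record_feedback("thumbs_down")        # explicit user feedback
print(emerging_patterns())            # ['draft_edited']
```

The output of `emerging_patterns` is exactly the kind of unexpected behavior evals alone would miss: nobody anticipated it, but production usage made it visible.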
10. Prioritize AI Reliability
Recognize that reliability is paramount for enterprises deploying AI. Start with low-autonomy, human-controlled systems to build trust and minimize risks before exposing users to more autonomous AI.
11. Be Skeptical of “One-Click Agents”
Be wary of solutions promising instant, significant ROI from “one-click agents.” Building robust AI solutions, especially with messy enterprise data, requires substantial time (4-6 months) to establish learning pipelines.
12. Cultivate Persistence (“Pain is the New Moat”)
Embrace the “pain” of learning, implementing, and iterating through multiple approaches to solve problems in the new AI landscape. This persistence and gained knowledge become a significant moat for success.
13. Develop Design, Judgment, and Taste
As AI makes implementation cheaper, focus on developing strong design skills, judgment, and taste in product building. Prioritize solving real pain points over merely building quickly with new tools.
14. Anticipate Proactive/Background Agents
Look towards a future where AI agents proactively understand workflows, anticipate needs, and present solutions or insights, such as fixing tickets or suggesting refactors.
15. Invest in Multimodal AI Experiences
Recognize the potential of multimodal AI (combining language, vision, etc.) to create richer, more human-like interactions and unlock insights from messy, unstructured data like handwritten documents.
8 Key Quotes
Most people tend to ignore the non-determinism. You don't know how the user might behave with your product and you also don't know how the LLM might respond to that.
Aishwarya Naresh Reganti
Every time you hand over decision-making capabilities to agentic systems, you're kind of relinquishing some amount of control on your end.
Aishwarya Naresh Reganti
Leaders have to get back to being hands-on. You must be comfortable with the fact that your intuitions might not be right and you probably are the dumbest person in the room and you want to learn from everyone.
Kiriti Badam
It's not about being the first company to have an agent among your competitors. It's about have you built the right flywheels in place so that you can improve over time?
Aishwarya Naresh Reganti
Pain is the new moat.
Kiriti Badam
If the unexamined life is not worth living, was the unlived life worth examining?
Paul Kalanithi (quoted by Aishwarya Naresh Reganti)
They said it couldn't be done, but the fool didn't know it, so he did it anyway.
Aishwarya Naresh Reganti (quoting her dad)
You can only connect the dots looking backwards.
Steve Jobs (quoted by Kiriti Badam)
2 Protocols
Continuous Calibration, Continuous Development (CC/CD) Framework
Aishwarya Naresh Reganti and Kiriti Badam
- **Continuous Development Loop:** Scope the capability and curate data by defining expected inputs and outputs, ensuring team alignment.
- Set up the AI application.
- Design appropriate evaluation metrics to focus on key dimensions of performance.
- Deploy the application and run the defined evaluation metrics.
- **Continuous Calibration Loop:** Analyze observed behavior and spot unexpected error patterns.
- Apply fixes for identified issues.
- Design newer evaluation metrics for emerging patterns, if necessary.
- Iterate through these loops, starting with lower agency and higher human control, and gradually increasing agency while lowering control over time.
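The two loops above can be sketched as a toy state machine: development sets up the app with eval metrics, and each calibration pass either adds evals for newly observed error patterns or, on a clean pass, raises agency. This is a minimal illustration under invented names (`develop`, `calibrate`), not the speakers' implementation.

```python
# Hedged sketch of CC/CD iterations using plain dicts. All names are illustrative.

def develop(scope: str, eval_metrics: list[str]) -> dict:
    """Continuous development: scope the capability, set up the app, attach evals."""
    return {"scope": scope, "evals": list(eval_metrics), "agency": "low"}

def calibrate(deployment: dict, observed_errors: list[str]) -> dict:
    """Continuous calibration: spot error patterns, apply fixes, add new evals."""
    for error in observed_errors:
        deployment["evals"].append(f"eval for: {error}")  # design newer metrics
    if not observed_errors:                               # clean pass: confidence grows
        deployment["agency"] = "higher"                   # raise agency over time
    return deployment

app = develop("route support tickets", ["routing accuracy"])
app = calibrate(app, ["misroutes billing tickets"])  # first loop: fix + new eval
app = calibrate(app, [])                             # clean loop: raise agency
print(app["agency"], len(app["evals"]))
```

The key property the sketch preserves is the ratchet: agency only increases after a calibration pass with no unexpected error patterns, mirroring "start with lower agency and higher human control."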
Progressive Agency and Control for Customer Support AI Agent
Aishwarya Naresh Reganti
- **V1: Routing Agent (High Control, Low Agency):** Build an agent that classifies and routes customer support tickets to the correct department. Humans retain high control to correct misroutings and identify underlying data quality issues.
- **V2: Co-pilot (Medium Control, Medium Agency):** Once routing is reliable, enable the agent to provide suggestions and draft responses based on standard operating procedures. Human agents review and edit these drafts, with their modifications logged as implicit feedback for system improvement.
- **V3: End-to-End Resolution Assistant (Low Control, High Agency):** After gaining confidence in the agent's suggestions, allow it to autonomously draft and resolve tickets, minimizing human intervention for routine issues.
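The V1→V3 progression can be condensed into a single gate that decides how much the agent may do based on accumulated confidence. This is a hedged sketch: the keyword routing rule, the confidence thresholds, and `handle_ticket` itself are invented for illustration.

```python
# Sketch of progressive agency: one gate, three levels. Thresholds are assumptions.

def handle_ticket(ticket: str, confidence: float) -> str:
    route = "billing" if "invoice" in ticket else "support"    # V1: classify & route
    if confidence < 0.7:
        return f"route to {route}; human handles reply"        # high control
    draft = f"draft reply for {route} ticket"                  # V2: co-pilot draft
    if confidence < 0.9:
        return f"{draft}; human reviews and edits"             # medium control
    return f"{draft}; sent automatically"                      # V3: autonomous

print(handle_ticket("invoice overcharge", 0.5))   # V1 behavior
print(handle_ticket("invoice overcharge", 0.95))  # V3 behavior
```

In a real system `confidence` would come from the calibration loop (eval pass rates, edit frequency on drafts) rather than being passed in by hand, which is what ties this protocol back to CC/CD.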