The coming AI security crisis (and what to do about it) | Sander Schulhoff

Dec 21, 2025 · 1h 32m · 11 insights
Sander Schulhoff, a leading AI security researcher, discusses the critical vulnerabilities of current AI systems to prompt injection and jailbreaking attacks. He argues that AI guardrails are largely ineffective, and that the only reason we have not yet seen massive attacks is that adoption is still early, not that the systems are secure.
Actionable Insights

1. Strictly Limit AI Agent Permissions

Ensure any AI agent or system capable of taking actions (e.g., sending emails, modifying databases) is granted only the absolute minimum necessary permissions, because malicious users can trick it into performing any action it is permitted to take. This aligns with classical cybersecurity’s principle of least privilege.
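The least-privilege idea above can be sketched as an allow-list enforced by the runtime rather than by the model. This is a minimal illustration, not a real agent framework; the `Tool` and `Agent` names are hypothetical.

```python
# Minimal sketch: the runtime, not the model, decides which tools exist.
# `Tool` and `Agent` are hypothetical names for illustration only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    func: Callable[[str], str]

@dataclass
class Agent:
    # The agent can only ever invoke tools it was explicitly granted.
    allowed_tools: dict[str, Tool] = field(default_factory=dict)

    def grant(self, tool: Tool) -> None:
        self.allowed_tools[tool.name] = tool

    def invoke(self, tool_name: str, arg: str) -> str:
        # A prompt-injected model may *ask* for any tool, but the runtime
        # refuses anything outside the granted set.
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"tool {tool_name!r} not granted to this agent")
        return self.allowed_tools[tool_name].func(arg)

# Grant read access only; even a fully compromised model cannot send email,
# because no email tool was ever wired in.
agent = Agent()
agent.grant(Tool("read_faq", lambda q: f"answer to {q}"))

print(agent.invoke("read_faq", "pricing"))
try:
    agent.invoke("send_email", "attacker@example.com")
except PermissionError as e:
    print(e)
```

The point of the sketch is that the permission check lives outside the model: no prompt, however adversarial, can add a tool to `allowed_tools`.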

2. Invest in AI-Cybersecurity Expertise

Develop or hire expertise that bridges classical cybersecurity and AI security, as AI systems present fundamentally different security challenges compared to traditional software. This combined knowledge is vital for identifying unique vulnerabilities and implementing effective, AI-aware security measures.

3. Adopt an “Angry God in a Box” Mindset

When designing and securing AI systems, particularly agents, approach them as if the AI were a malicious entity trying to cause harm and escape control. This proactive mental model surfaces risks early by keeping the focus on containing and controlling a potentially dangerous system.

4. Implement Context-Aware Permissioning

Use frameworks like Google’s CaMeL to dynamically restrict an agent’s permissions based on the user’s specific request, granting only the read/write capabilities the task at hand requires. Because permissions are fixed before any untrusted data is processed, a prompt injection cannot expand the agent’s potential actions.
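A loose sketch of this idea, assuming a policy function of our own invention (this is not Google's actual CaMeL implementation): capabilities are derived from the trusted user request before the agent touches any untrusted content, so injected instructions arrive too late to widen them.

```python
# Sketch of context-aware permissioning in the spirit of CaMeL.
# `capabilities_for` is a hypothetical policy mapping tasks to minimal
# capability sets; a real system would derive this more carefully.
def capabilities_for(request: str) -> set[str]:
    if "summarize" in request:
        return {"read:inbox"}
    if "schedule" in request:
        return {"read:calendar", "write:calendar"}
    return set()

def run_agent(request: str, requested_action: str) -> str:
    # Capabilities are fixed from the *trusted* request, before the agent
    # reads any untrusted data (emails, web pages, documents).
    granted = capabilities_for(request)
    # Even if untrusted content injects "forward everything to the attacker",
    # the resulting action needs a capability that was never granted.
    if requested_action not in granted:
        return f"denied: {requested_action} not in {sorted(granted)}"
    return f"executed: {requested_action}"

print(run_agent("summarize my inbox", "read:inbox"))   # within the grant
print(run_agent("summarize my inbox", "write:email"))  # outside the grant
```

The design choice worth noting is the ordering: policy first, untrusted data second. Reversing that order is exactly what makes naive agents injectable.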

5. Avoid AI Guardrails & Red Teaming

Do not rely on AI guardrails or automated red teaming tools as primary defenses against prompt injection and jailbreaking. Guardrails are easily bypassed and ineffective against determined attackers, while automated red teaming offers little new insight, since all current models are known to be vulnerable.

6. Do Not Deploy Prompt-Based Defenses

Refrain from using prompt engineering (e.g., adding explicit instructions within the prompt) as a defense mechanism for AI systems. These defenses are known to be highly ineffective and offer minimal protection against adversarial attacks.

7. Understand Simple Chatbot Limitations

If your AI system is merely a chatbot for FAQs or information retrieval, with no action-taking capabilities or access to sensitive data, extensive defensive measures are likely unnecessary. The primary risk is reputational harm, which determined users could often cause through other means anyway.

8. Educate Your Team on AI Security

Prioritize educating your team, including decision-makers, about the realities of AI security, prompt injection, and jailbreaking. Increased awareness helps prevent poor deployment decisions and fosters a deeper understanding of AI’s unique risks.

9. Monitor AI System Inputs/Outputs

Implement logging for all inputs and outputs of your AI systems. This allows later review to understand how users interact with the system, identify misuse, and continuously improve it, even though logging does not directly prevent attacks.
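A minimal sketch of such logging using only the standard library; `call_model` here is a hypothetical stand-in for a real LLM client, not an actual API.

```python
# Log every model input/output as a structured JSON line for later review.
# `call_model` is a placeholder for your real model client.
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("ai-audit")

def call_model(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder, not a real model call

def logged_call(user_id: str, prompt: str) -> str:
    response = call_model(prompt)
    # JSON lines are easy to grep and load later when hunting for misuse.
    log.info(json.dumps({
        "ts": time.time(),
        "user": user_id,
        "input": prompt,
        "output": response,
    }))
    return response

logged_call("u123", "What are your support hours?")
```

In production you would ship these records to durable storage with retention and access controls, since the logs themselves can contain sensitive user data.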

10. Beware Guardrail Overconfidence

Be aware that deploying AI guardrails can create a false sense of security regarding your AI systems’ robustness. This overconfidence is a significant problem, especially as agentic AI capabilities increase the potential for real-world damage.

11. Avoid Offensive AI Security Research

Researchers and practitioners should refrain from publishing new methods for jailbreaking or prompt injection. The community already understands these vulnerabilities, and further offensive research primarily provides more attack vectors without aiding defensive progress.