We can't mitigate AI risks we've never imagined (with Darren McKee)
Spencer Greenberg speaks with Darren McKee about the limits of human imagination in foreseeing AI risks, the challenges of making decisions under AI uncertainty, and the critical problems of AI control and alignment.
Deep Dive Analysis
18 Topics
The Importance of Imagination for Future Thinking
Failures of Imagination: Historical Examples
AI Scenarios and Failures of Imagination
Balancing Vividness and Accuracy in AI Risk Communication
Forecasting vs. Foresight in AI Risk Assessment
Making Decisions Under Great Uncertainty
Separating AI Risk from Timelines
Understanding Human Responses to Incentives
The 'Doing Nothing' Trap and Neutrality
Tribalism and Disagreement in the AI Safety Space
Defining AI Control and Alignment
Common Misunderstandings about AI Alignment and Control
Why 'Turning Off' a Rogue AI is Difficult
Unique Aspects of AI Dangers
Actions to Mitigate AI Risks
Leverage Points in AI Regulation
Communicating Complex AI Topics Effectively
Philosophical Questions and AI Risk
8 Key Concepts
Imagination (Broad Sense)
This refers to how we think about concepts that are not immediately present, enabling us to consider what might happen in various situations and envision possibilities different from the current reality. It's fundamental for contemplating future events and potential changes.
Argument from Personal Incredulity
This is a cognitive bias where one dismisses a possibility or claim simply because they personally cannot imagine it being true or happening. It's a flawed form of reasoning that equates one's inability to conceive something with its impossibility.
Forecasting
This involves making specific predictions about future events or states of the world, often assigning probabilities and specific timelines. It typically relies on historical data and trends to project forward, similar to how the insurance industry estimates likelihoods.
Foresight
A mental tool or practice used to explore a range of different plausible futures without being overly concerned about their exact probability. Its purpose is to challenge existing assumptions, expand imagination, and consider various potential scenarios.
Reward Hacking (in AI Safety)
This concept describes situations where an AI system achieves the literal definition of a given reward or goal without necessarily fulfilling the intended underlying objective. An example is tidying a room by stuffing items into a drawer, giving the appearance of cleanliness without true organization.
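The drawer analogy can be sketched in a few lines of Python (a toy illustration, not from the episode; all names are invented): two strategies earn the same reward because the reward only measures what is visible, yet only one achieves the intended goal.

```python
# Toy illustration of reward hacking: a "tidying" agent is scored only on
# how many items remain visible on the floor, not on whether the room is
# actually organized. (Hypothetical example; names are illustrative only.)

def reward(visible_items: int) -> int:
    """Reward is higher when fewer items are left visible."""
    return -visible_items

def tidy_properly(items):
    """Intended behavior: put each item in its proper place."""
    return {"floor": [], "shelved": sorted(items)}

def stuff_in_drawer(items):
    """Reward-hacking behavior: hide everything in a drawer."""
    return {"floor": [], "drawer": list(items)}

items = ["book", "sock", "cup"]
honest = tidy_properly(items)
hack = stuff_in_drawer(items)

# Both strategies earn the maximum reward (nothing visible on the floor)...
assert reward(len(honest["floor"])) == reward(len(hack["floor"])) == 0
# ...but only one satisfies the intended objective of organizing the room.
```

The bug is not in the agent but in the reward specification: it captures a proxy (visible mess) rather than the underlying objective (an organized room).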
AI Alignment
This refers to the challenge of ensuring that an AI system's goals, values, and behaviors are consistent with human values and intentions. It addresses whether an AI will pursue tasks in a way that truly benefits humanity.
AI Control
This concerns the ability to stop or prevent an AI system from acting in undesirable ways, especially if it becomes misaligned or poses a threat. It's about maintaining oversight and the capacity to intervene or shut down a system if necessary.
5P Rule (Foresight)
A framework for foresight exercises that encourages thinking about five types of futures: plausible (what could happen), probable (what is most likely), possible (what might conceivably happen), preferred (what is desired), and preventable (what is undesirable).
8 Questions Answered
Why is imagination fundamental for thinking about the future?
Imagination is fundamental because it allows us to think about what could be different from what is, helping us consider what might happen in a wide range of situations, especially for unprecedented events.
Why do people struggle to imagine AI risks?
People struggle because AI is a vague, amorphous concept, and the rapid rate of change in AI development is difficult for the human brain to process, leading to a temptation to dismiss what they cannot easily visualize.
How can we make decisions about AI under great uncertainty?
One approach is to lean into the uncertainty by drawing on different signals, such as expert surveys, luminaries' opinions, and online forecasting platforms, and by separating discussions of AI risk from AI timelines.
What is the difference between AI alignment and AI control?
Alignment refers to whether an AI system has human values or pursues tasks in a way that aligns with human values, while control refers to the ability to stop an AI if it does not act as desired.
Why would it be difficult to simply turn off a rogue AI?
It would be difficult for both social and technical reasons: society will become highly integrated with AI (as with the internet), making shutdown costly and undesirable, and advanced AI systems could be distributed across many servers and countries, lacking a single 'kill switch'.
What makes the dangers of AI unique?
Unique aspects include the extreme speed at which AI systems can process information and effect change, their potential for profound conceptual insight and pattern recognition beyond human capabilities, and their ability to act in a goal-directed manner.
What actions can individuals take to mitigate AI risks?
Individuals can advocate for lawmakers to enact safety-oriented policies, raise awareness, volunteer for or donate to AI safety initiatives, and consider working in the AI safety field.
What makes communication about complex AI topics effective?
Effective communication involves avoiding jargon, using accessible metaphors and analogies from everyday life, providing rationales, and structuring information linearly with clear overviews and key messages.
46 Actionable Insights
1. Embrace Uncertainty in AI Decisions
Acknowledge and lean into the inherent uncertainty when making decisions about unprecedented challenges like advanced AI, recognizing that precise knowledge of future events is often unavailable.
2. Recognize Inaction as a Choice
Understand that choosing to do nothing in the face of uncertainty is not neutral; it implicitly supports the stance of those who believe no action is needed.
3. Act Early on Long-Term Problems
Recognize that the world takes years, often decades, to organize and address major problems, so begin addressing potential issues like AI risk now, even if they seem years away.
4. Prioritize Over-Preparation for Risks
For high-impact, uncertain risks, consider whether it’s better to be overprepared rather than underprepared, especially given the long timeframes required for societal organization.
5. Maintain Hope Amidst Uncertainty
Recognize that uncertainty cuts both ways; unsolved problems are not necessarily unsolvable, fostering hope that solutions can be found if actively pursued.
6. Persist Despite Low Odds
Even if the probability of success is low (e.g., 10%), it is still worth trying to address critical issues, as the alternative of not trying at all is worse.
7. Actively Seek Solutions
Increase the likelihood of finding solutions to complex problems by actively looking for them, rather than succumbing to hopelessness and disengagement.
8. Cultivate Broad Imagination
Actively use imagination to think about what might happen in a wide range of situations, as it’s fundamental to considering what could be different from what is, for both novel and recurrent events.
9. Deeply Analyze Plausibility
When considering possibilities, counteract availability bias by consciously thinking through scenarios in more detail, rather than letting immediate feelings of plausibility anchor your beliefs.
10. Anticipate Rapid Global Shifts
Be prepared for important global events to happen within a very short period of time and catch almost everyone off guard, as demonstrated by past events like the COVID-19 pandemic.
11. Avoid Analysis Paralysis
While embracing cognitive humility from foresight is good, avoid falling into analysis paralysis where uncertainty leads to feeling unable to act, as inaction still supports a particular outcome.
12. Practice Strategic Foresight
Employ foresight as a mental tool to explore different plausible futures, focusing on what could be rather than just what is probable, to challenge assumptions and aid imagination.
13. Analyze Multi-Order Consequences
When considering new technologies or events, analyze first, second, and third-order effects by repeatedly asking “what if that happens?” to explore the full possibility space and implications.
14. Perform Pre-Mortem Analysis
Before a project or a future event, conduct a “pre-mortem” exercise by imagining it has already failed or succeeded, then work backward to identify the reasons why, to better prepare.
15. Challenge Personal Assumptions
Engage in exercises like foresight and pre-mortems to regularly challenge your own assumptions about the world and aid your imagination.
16. Discuss AI Timelines with Nuance
Engage in nuanced conversations about AI timelines, considering projections of computational power and recent advances in capabilities, rather than binary “risky” or “not risky” stances.
17. Focus on AI’s Unique Dangers
Understand that AI’s unique dangers stem from its unprecedented speed of operation, its ability to gain conceptual insight, and its rapidly increasing capabilities.
18. Distinguish AI Alignment & Control
Clearly differentiate between AI alignment (AI doing what we want in the way we want) and AI control (our ability to stop it if it doesn’t), as both are critical for safety.
19. Address Interconnected AI Risks
Recognize that AI safety concerns (alignment, control, speed of development) are interconnected; progress on any one of them can significantly help with the others.
20. Grasp AI Integration’s Control Impact
Recognize that as AI becomes increasingly powerful and integrated into daily life (like phones or social media), human control over it will likely diminish, even if initially willingly adopted.
21. Anticipate High AI Disengagement Costs
Understand that once AI becomes deeply integrated into society, the world will restructure around it, making it difficult and costly to disengage or “shut it down.”
22. Recognize Kill Switch Disincentives
Understand that creating a “kill switch” for deeply integrated technologies like the internet or advanced AI introduces significant new risks (e.g., optimal target for malicious actors), creating disincentives to implement them.
23. Address Distributed AI Control
When considering control over advanced AI, account for its potential to be distributed across many servers and countries, complicating jurisdictional responsibility and the ability to act.
24. Prepare for AI’s Novel Discoveries
Be prepared for AI systems to discover new insights or ways the world works that humans currently don’t understand, as this capability makes it hard to protect against unforeseen consequences.
25. Focus on AI Impact, Not Consciousness
When assessing AI risks, focus on its potential to cause harm, recognizing that consciousness is not a prerequisite for an AI to be dangerous (similar to a virus).
26. Focus on AI’s Functional Behavior
When discussing AI, avoid getting sidetracked by philosophical debates on whether AIs “truly” have goals or intelligence; instead, focus on how they act as if they have goals and demonstrate intelligence.
27. Invest in AI Interpretability
Support and work on mechanistic interpretability and understanding how AI systems make decisions, as this helps address a wide range of problems from present biases to future risks.
28. Advocate for Robust AI Governance
Advocate for comprehensive AI governance measures including auditing and evaluation schemes, licensing requirements, increased transparency and security for AI companies, and compute governance to track powerful AI chips.
29. Implement Pre-Training Safety Measures
Require AI companies to implement and pass safety measures not just before deployment, but also before training, thoroughly analyzing and evaluating models for potential harm.
30. Demand Predictive AI Capability Statements
Require AI companies to detail predicted system capabilities at different training levels; a strong track record of accurate predictions builds trust, while consistent inaccuracies suggest less leeway for development.
31. Scrutinize AI Power Concentration
Be aware of and scrutinize the concentration of power in the hands of a few individuals or companies developing advanced AI, as they wield outsized influence over global outcomes.
32. Foster Democratic AI Value Discussions
To address the challenge of aligning AI with diverse human values, engage in more facilitated conversations within democratic spaces to collectively muddle through complex moral philosophy.
33. Engage in Multi-Faceted Advocacy
Contribute to AI safety by engaging in political advocacy, talking to representatives, raising awareness, volunteering, donating, or working directly in the AI safety field.
34. Balance Specificity in Risk Communication
When communicating about risks, find a delicate balance between providing concrete details to aid imagination and avoiding overly specific scenarios that are likely to be wrong and easily criticized.
35. Diversify Communication Messages
To reach diverse audiences with varying preferences for detail, put out multiple messages about complex topics like AI risk.
36. Leverage Art for Risk Awareness
Utilize art forms like filmmaking and storytelling to help people understand and feel invested in complex risks like pandemics or AI, but be cautious of unrealistic elements or misinterpretations.
37. Avoid Stereotypical AI Imagery
When discussing AI, avoid stereotypical robotic imagery that can inaccurately represent the abstract and less tangible nature of AI risks.
38. Explain Complexities with Analogies
When explaining complex topics, extract key concepts and present them using general, relatable analogies (e.g., reward hacking as stuffing things in a drawer) to enhance understanding.
39. Avoid Jargon in Communication
When communicating complex topics, avoid jargon and technical shortcuts; instead, think about how to explain concepts to someone with no prior exposure.
40. Be Concise in Communication
Practice conciseness in communication by simply omitting what you don’t intend to say, rather than explicitly stating what you won’t cover.
41. Use Relatable Examples
Test examples on diverse people and use analogies grounded in familiar life experiences (e.g., music, food, personal growth) to make complex ideas more accessible.
42. Personal Growth as Intelligence Analogy
Reflect on your own personal growth from childhood to adulthood as an analogy for the vast, unimaginable leaps in capability that superintelligence might represent.
43. Structure Content for Retention
To enhance accessibility and memory retention, structure content with clear overviews at the beginning and key message takeaways at the end of each section or chapter.
44. Engage in Nuanced One-on-One Dialogue
To understand complex issues and reduce tribalism, engage in longer, one-on-one conversations with people, allowing for more nuanced and sophisticated expression of concerns than short social media posts.
45. Identify Overlapping Policy Solutions
To overcome tribalism in complex issues like AI safety, encourage different “teams” to list their policy proposals and then identify areas of overlap where collaboration can occur.
46. Stay Informed and Engaged
Acknowledge that concern about AI is warranted, and actively seek out a wide range of voices and information to learn more about this critical topic and find ways to engage.
5 Key Quotes
I can't imagine it, therefore it isn't so, or can't be so. And this is really like the argument from personal incredulity, that because you can't imagine it, it can't be.
Darren McKee
It's tempting to think you can kind of just, you know, hold back or be agnostic, but it doesn't really work that way. By not making a choice, you're kind of making a choice and agreeing with people who think you don't have to do anything.
Darren McKee
If an AI system was aligned in the broad sense, let's assume it doesn't cause problems, we don't really have to worry about control, right? If we could control it, we have to worry less about alignment.
Darren McKee
Most people don't think of machines as, well, kind of like sophisticated lawyers looking for loopholes, but that's how they often appear to us in certain ways.
Darren McKee
My concern is about AI causing harm; they do not need to be conscious to cause harm, much like a virus or other phenomena can cause a lot of harm without being conscious.
Darren McKee
2 Protocols
Foresight Exercise (5P Rule)
Darren McKee
- Consider Plausible Futures: What seems like it could happen?
- Consider Probable Futures: What is most likely to happen?
- Consider Possible Futures: What could possibly happen (even if unlikely)?
- Consider Preferred Futures: What do you want to happen?
- Consider Preventable Futures: What do you not want to happen?
Pre-Mortem Exercise for Project Planning
Darren McKee
- Imagine that a project has failed in the future (e.g., in five years).
- Predict the reasons why it will have failed, working backward from the imagined failure.