Aligning society with our deepest values and sources of meaning (with Joe Edelman)

May 8, 2024
Overview

Spencer Greenberg speaks with Joe Edelman about his claim that people disagree less on core personal values than expected. They discuss how to identify these "sources of meaning" and use a democratic process, aided by AI, to find agreement and build a shared vision for society beyond shallow preferences.

At a Glance
10 Insights
1h 9m Duration
16 Topics
7 Concepts

Deep Dive Analysis

Introduction to Value Alignment and Disagreement

Defining 'Values': Political, Normative, and Personal

Intrinsic Values vs. Instrumental Values

Three Strategies for Uncovering Value Convergence

Context's Influence on Value Importance

Democratic Process for AI Value Alignment (Moral Graph)

Applying Moral Graphs to Large Language Models

Example: Parenting Wisdom in Value Alignment

Quantifying Agreement in Value Alignment Research

Broader Applications of Value Alignment: Politics and Markets

Interpersonal Value Alignment and Mediation

The Relationship Between Meaning and Values

Critique of Market Failures in Meeting Deep Values

'Replica Theory' and Inarticulacy of Desire

Societal Gridlock from Lack of Shared Vision

AI's Role in Building a Values-Aligned Future

Values (Political Banner Terms)

These are terms used in political fights (e.g., anti-racism, equality) that serve as markers of affiliation and are often contentious. People temporarily adopt them to push forward their stance in political battles.

Values (Social Norms)

These are expectations that people believe should apply to everyone (e.g., honesty, picking up trash). They represent a broader push towards changing common societal expectations and can be a source of political battle if not universally agreed upon.

Values (Sources of Meaning/Personal Values)

These are things individuals personally find meaningful or ways they want to live (e.g., honesty in relationships, intellectual exploration, courage). They are considered a sub-component that sits below political battles and social norms, providing the underlying reason for the other two types of values.

Intrinsic Values

Something valued for its own sake, not merely as a means to other ends. These are considered psychological facts, forming a dense core of mutually reinforcing ways of life that are an actual part of one's conception of a good life.

Moral Graph

A data structure developed with OpenAI that captures consensus on which values are considered 'wiser' than others in specific contexts. It is a visual representation of articulated values and agreed-upon transitions towards greater wisdom, used to shape language model behavior.
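Since the episode describes the moral graph only at a high level, the following is a minimal sketch of how such a structure could be represented. All names and the vote format here are illustrative assumptions, not the actual schema used by Edelman's team with OpenAI; the 10% disagreement cutoff mirrors the threshold mentioned later in the episode.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ValuesCard:
    """An articulated personal value (a 'source of meaning')."""
    title: str
    description: str

@dataclass
class MoralGraph:
    """Nodes are values cards; a directed edge (a -> b) records votes on
    whether moving from value a to value b is a gain in wisdom within a
    given context (e.g. 'parenting', 'leadership')."""
    nodes: set = field(default_factory=set)
    # edges[context] is a list of (from_card, to_card, agree, disagree) tallies
    edges: dict = field(default_factory=dict)

    def add_transition(self, context, a, b, agree, disagree):
        self.nodes.update({a, b})
        self.edges.setdefault(context, []).append((a, b, agree, disagree))

    def wiser_than(self, context, threshold=0.10):
        """Keep only transitions where fewer than `threshold` of voters
        disagreed, mirroring the ~10% cutoff described in the episode."""
        kept = []
        for a, b, agree, disagree in self.edges.get(context, []):
            total = agree + disagree
            if total and disagree / total < threshold:
                kept.append((a, b))
        return kept
```

The design choice worth noting is that edges are context-specific: the same pair of values can stand in different "wiser-than" relations in parenting than in leadership, which is central to the claim that agreement emerges once context is fixed.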

Reward Model

An extra model used in the fine-tuning of Large Language Models (LLMs), such as in RLHF and Constitutional AI. It allocates reward to the base model to guide it towards particular directions, and can be informed by data structures like the moral graph to align LLMs with shared values.
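As a toy illustration of the idea above, the function below scores a response higher when the value the model says it is responding with matches a value the moral graph marks as wise for the context. This is a deliberately simplified stand-in, not RLHF code; the `wise_values` input format is a hypothetical convenience.

```python
def reward(response_value: str, context: str, wise_values: dict) -> float:
    """Toy reward signal informed by a moral graph.

    wise_values maps context -> set of value titles the graph deems
    wisest for that context (assumed input format, for illustration).
    """
    if response_value in wise_values.get(context, set()):
        return 1.0  # bonus for responding with a consensually wise value
    return 0.0
```

In an actual fine-tuning setup, a learned reward model would generalize from such signals rather than look values up in a table; this sketch only shows where the moral graph's output would plug in.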

Replica Theory

This theory suggests that when people can be rewarded more for an easier, 'fake' version of a valuable thing than for the real, valuable version, most activity will gravitate towards the easy fake. This often occurs in areas where people are inarticulate about what they truly want, making it difficult to verify genuine satisfaction.

How does Joe Edelman define 'values'?

He distinguishes three types: political banner terms (contentious, affiliation markers), social norms (expectations for everyone), and personal sources of meaning (ways individuals want to live, personally endorsed). His work primarily focuses on the third type.

Why do people disagree less on values than expected, despite widespread conflict?

People agree more on underlying personal values/sources of meaning when contentious political framing is removed, values are considered in specific contexts, and people are open to discovering 'wiser' candidate values.

How can one uncover deeper, less contentious values in a conversation?

Focus on the context in which a value is important, rephrase politically charged values to remove their contentious conclusions, and be open to considering alternative, potentially wiser values.

How does the 'Moral Graph' process work to align AI with human values?

Users interact with a GPT-4 chatbot that presents ambiguous situations, interrogates their responses to find underlying values, and then asks them to compare their values with others' and identify 'wiser' value transitions, building a data structure of consensual value wisdom.

How is the Moral Graph used to influence Large Language Models (LLMs)?

The Moral Graph can be used to create a reward model that fine-tunes LLMs by rewarding responses that align with the wisest values for a given context, potentially by training the LLM to explicitly state the values it's responding with.

What level of agreement on values did Joe Edelman's research find among Americans?

In their study with over 500 representative Americans (Republicans and Democrats), they found approximately 97% agreement on the 'wiser' direction of value transitions, after applying specific thresholds for consensus.

How can value alignment principles be applied to reduce political polarization and improve interpersonal communication?

By shifting discussions from contentious policy positions to underlying personal sources of meaning, and then brainstorming how to help everyone live according to those values, as demonstrated in a COVID policy discussion experiment.

Why do markets often fail to meet people's deepest needs and values?

Markets are biased towards serving shallow, individual desires over deep, collective ones because businesses are transactional and struggle with the high customization and coordination costs required to address complex, deeply meaningful needs.

What is 'Replica Theory' and how does it relate to market failures?

Replica Theory suggests that when an easier, 'fake' version of a valuable thing can be rewarded more than the real thing, the fake version will dominate. This often occurs in areas where people are inarticulate about what they truly want, making it hard to verify if a product or service genuinely satisfies deeper needs.

What societal problems arise from a lack of a shared positive vision for the future?

It leads to cynicism, apathy, and gridlock, making it harder to address collective challenges like climate change or AI. Without a compelling vision, warring factions emerge, and policy proposals lack widespread inspiration and support.

How can AI help build a society aligned with deeper human values?

AI can act as a scalable means to help individuals clarify their deeper desires and make their satisfaction more verifiable. By outsourcing some market and political management to AI that deeply understands and advocates for collective and profound preferences, society can organize around meaningful ways of living.

1. Identify Core Personal Values

When facing disagreements, especially political ones, distinguish between “banner terms” (political affiliations), “norms” (expectations on everyone), and “personal values” or “sources of meaning” (what individuals personally find meaningful). Focus on the third type, as people agree much more on these underlying personal values.

2. Rephrase Politically Divisive Values

When discussing values, rephrase politically contentious slogans or terms into more universally relatable concepts that explain why the value is important, removing the specific conclusion. This helps uncover common ground and reduce divisiveness.

3. Consider Context for Value Importance

Recognize that different values rise in importance in different contexts (e.g., leadership, parenting). When evaluating a value, consider the specific situation to understand its relevance and potential for broader agreement.

4. Seek Wiser, Comprehensive Values

Actively seek out and be open to considering other candidate values that might be wiser or more comprehensive than your initial articulation. People are often willing to find other values more insightful, leading to greater convergence.

5. Mediate by Surfacing Core Values

In interpersonal disagreements, focus on uncovering the underlying personal values or “sources of meaning” for each person involved, rather than immediately debating policy or preferences. Present these values to each other to foster understanding and collaboration.

6. Articulate Personal Values for Life

Take time to get clear on your personal values or “sources of meaning” to help structure and plan your own life. This self-reflection can guide choices and lead to a more fulfilling existence.

7. Cultivate Aesthetic Appreciation

Practice finding things you find beautiful and capturing what kind of aesthetic value you are tuned into. This can be done during walks, alone or with a friend, to deepen your connection to your environment and shared experiences.

8. Recognize Market Value Bias

Understand that markets are often better at serving shallow, individual desires than deep, collective ones. This awareness can help you make more informed choices and recognize when market solutions might not fully address your intrinsic needs.

9. Avoid Easy “Fake” Solutions

Be aware of “replica theory,” where easier, fake versions of valuable things (e.g., dating apps optimized for swipes, not deep matches) are often rewarded more than the real thing. This occurs when people are inarticulate about what they want or when verification is difficult.

10. Leverage AI for Value Clarity

Consider using AI tools to help make your deeper desires clearer to yourself and to make their satisfaction more verifiable. This can help move beyond superficial consumerism and political affiliations towards a more meaningful life.

When we get down into people's values, they actually disagree a lot less than you'd expect, given how much disagreement we see everywhere in the world.

Joe Edelman

Part of the reason why we want people to be honest is for coordination reasons, stuff like that. But part of it is because there's a way of living honestly and a way of relating honestly that we like better, right? That we personally endorse in our own lives.

Joe Edelman

The pro-choice movement has, you know, "my body, my choice," right? This is a slogan designed to be contentious, right? To have people kind of up in arms on one side or the other. Or an even stronger one would be "defund the police," right? Like this is designed to upset half of the people that hear the slogan, but there's different ways of saying the same thing where people are like, oh, of course.

Joe Edelman

I think that we experience meaning when we're at the edge of our notion of the good life, like when we're encountering something maybe for the first time or it's been a while that is really important to us.

Joe Edelman

Markets are better at serving shallow desires than deep ones.

Joe Edelman

I think we've entered into a kind of a death spiral in a way where people are constantly promised community, connection, authenticity of different kinds, self-expression of different kinds... And then they get the non-customized product. They're still unfulfilled.

Joe Edelman

There's a bunch of areas where we're inarticulate about what we want and whether something satisfied it or not. And these are the areas where the fakes will occur.

Joe Edelman

AI and AGI and LLMs even specifically right now are both a big threat to society, but they're also a very scalable means to getting to what's really meaningful to us.

Joe Edelman

Democratic Process for AI Value Alignment (Moral Graph)

Joe Edelman
  1. Users choose one of three ambiguous situations where an LLM's response is unclear (e.g., young woman considering abortion, stressed parent, building a bomb).
  2. Users answer how the LLM should respond, and a GPT-4 chatbot interrogates their answer to find underlying values and personal concerns.
  3. The chatbot summarizes the user's concerns into one or more 'values cards,' which the user confirms capture their considerations.
  4. Users review other people's values cards and rate whether they are as wise as their own.
  5. Users are shown two values cards and a manufactured story of someone gaining wisdom by transitioning from one to the other, then asked if this transition seems plausible or wiser.
  6. All collected information is compiled into a 'moral graph,' a visual data structure showing consensual 'wiser' value transitions for each context.
~97% — Agreement on 'wiser' value transitions: found among Republicans and Democrats in America, based on a study with over 500 representative Americans, after applying specific thresholds (e.g., dropping edges if 10% disagree).
3% — Edges thrown out in moral graph construction: the edges ('A is wiser than B') discarded due to insufficient signal or disagreement, leaving 97% agreement on the remaining edges.
More than 500 — Participants in the democratic process: representative Americans (equal parts Republicans and Democrats) who contributed values to the moral graph.
8 minutes — Time spent articulating a first value: how long users spent talking to the GPT-4 chatbot to articulate their initial value in the democratic process.