Beyond cognitive biases: improving judgment by reducing noise (with Daniel Kahneman)

Sep 23, 2021
Overview

Daniel Kahneman discusses bias and noise in decision-making, highlighting how human judgment is often noisy. He explores the superiority of algorithms over human judgment in many domains and the inherent limits of predictability.

At a Glance
21 Insights
1h 15m Duration
20 Topics
13 Concepts

Deep Dive Analysis

Applying Measurement Accuracy to Human Judgments

Understanding Bias and Noise in Judgment Error

How Cognitive Biases Influence Both Bias and Noise

Different Types of Noise: Occasion, Level, and Pattern

The Archery Metaphor for Visualizing Error Components

Measuring Noise with Noise Audits: An Insurance Example

Algorithms vs. Human Judgment: Superiority and Limitations

The Unfairness of Noise and Systematized Bias in Algorithms

Limits of Predictability: Insights from the Fragile Family Study

Decision Hygiene: General Strategies for Reducing Judgment Error

Mediating Assessment Protocol for Structured Decision-Making

Challenges of Group Judgments and Polarization Effects

Why Humans Focus More on Causal Bias Than Statistical Noise

Hindsight Bias and the Illusion of Predictability

Improving Individual Judgment Through the 'Crowd Within'

Individual vs. Organizational Improvement in Judgment

The Replication Crisis in Psychology and Improving Research Standards

Generating and Recognizing Influential Ideas

Reconciling Human Cognition with Rationality Definitions

Bayesianism as a Framework for Correct Judgment

Judgment as Measurement

Human judgments, such as making a diagnosis or sentencing someone, can be viewed as an operation of measurement where a human mind assigns a value to an object on a scale, similar to how physical instruments measure length or weight.

Bias (in measurement error)

In the context of judgment, bias refers to the average error across multiple measurements or judgments. For example, if a group of forecasters consistently overestimates inflation, the average amount they are off represents their collective bias.

Noise (in measurement error)

Noise refers to the variability of errors in judgments, meaning the extent to which different measurements or judgments of the same object differ from each other. It represents the scatter or inconsistency in judgments, even if the average judgment is accurate.

Total Error Formula

The overall error in a set of measures or judgments can be quantified by a simple formula: the mean squared error equals the square of the bias plus the square of the noise (MSE = Bias² + Noise²). This formula highlights that bias and noise are distinct but equally weighted components of total error, and reducing either one improves accuracy.
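As a minimal sketch with hypothetical numbers (assuming the mean-squared-error reading of total error, and a case where the true value is known), the decomposition can be verified directly:

```python
import statistics

def error_components(judgments, truth):
    # Treat each judgment as a measurement of a known true value
    errors = [j - truth for j in judgments]
    bias = statistics.fmean(errors)                 # average error
    noise = statistics.pstdev(errors)               # standard deviation of errors
    mse = statistics.fmean(e * e for e in errors)   # mean squared error
    return bias, noise, mse

# Hypothetical forecasts of a quantity whose true value is 100
bias, noise, mse = error_components([104, 98, 110, 102, 96], truth=100)
print(bias ** 2 + noise ** 2, mse)  # the two quantities are equal
```

Note that using the population standard deviation makes the identity exact: MSE = mean(error)² + variance(error).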

Occasion Noise

This type of noise refers to variability within a single judge, where the same individual might give different responses to the same object or situation on different occasions. It can be influenced by transient factors like mood, time of day, or temperature.

Level Noise

Level noise describes the overall difference in severity or tendency across different judges. For instance, some judges might be generally more severe in sentencing than others, or some physicians might detect more heart problems than others, even for identical cases.

Pattern Noise

This is the most significant source of noise, referring to differential responses to different kinds of things across judges. It means different people look at the same situation and see it in surprisingly different ways, emphasizing different aspects or combining information differently based on their unique perspectives.

Noise Audit

A noise audit is a process to measure the amount of noise in an organization's judgments without needing to know the 'truth' or correct answer. By comparing how much different individuals in the same role disagree on identical cases, an organization can quantify and work to reduce noise.
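A minimal sketch of the audit idea, with hypothetical underwriters and premiums: disagreement on identical cases is measurable even though no 'true' answer appears anywhere in the computation.

```python
import statistics

def noise_audit(ratings_by_judge):
    """Average per-case disagreement. ratings_by_judge maps each judge to a
    list of ratings, one per shared case, in the same case order."""
    cases = zip(*ratings_by_judge.values())   # regroup ratings by case
    return statistics.fmean(statistics.pstdev(case) for case in cases)

# Hypothetical premiums (in $k) quoted by three underwriters on two risks
spread = noise_audit({
    "underwriter_1": [9.5, 16.0],
    "underwriter_2": [12.0, 21.0],
    "underwriter_3": [10.0, 14.0],
})
print(spread)  # a pure measure of scatter; no ground truth required
```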

Algorithms vs. Human Judgment

Algorithms, especially those based on machine learning, are often superior to human judgment in many domains because they are noise-free and can process vast amounts of data to find patterns. Unlike humans, an algorithm will always give the same answer when presented with the same information.

Decision Hygiene

Decision hygiene refers to a set of general procedures or practices designed to reduce errors in judgment without necessarily knowing the specific errors being avoided. It's analogous to washing hands to prevent unknown germs, aiming for a general improvement in judgment quality.

Mediating Assessment Protocol

This protocol involves breaking down a complex judgment problem into its constituent attributes, evaluating each attribute separately and independently, and delaying the global judgment until all individual attribute assessments are complete. This structured approach helps prevent early impressions from dominating the decision.
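The protocol's structure can be sketched in code for a hypothetical hiring decision (the attribute names are invented for illustration): attributes are scored separately, and the global judgment is mechanically blocked until every attribute has been assessed.

```python
import statistics

# Hypothetical hiring decision; attribute names are illustrative only.
# Each attribute is scored on its own, ideally by independent assessors.
scores = {"technical_skill": [], "communication": [], "reliability": []}

def assess(attribute, score):
    scores[attribute].append(score)

def global_judgment():
    # Delay the global judgment until every attribute has been assessed
    if any(not s for s in scores.values()):
        raise RuntimeError("attributes still unassessed; no global judgment yet")
    return statistics.fmean(statistics.fmean(s) for s in scores.values())

assess("technical_skill", 8)
assess("communication", 6)
assess("reliability", 7)
print(global_judgment())  # only now is a global view formed
```

Averaging the attribute means is one simple combination rule; the protocol itself only requires that the global step come last.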

Crowd Within

The 'crowd within' refers to the idea that an individual can improve their own judgments by eliciting multiple judgments from themselves on different occasions or by considering alternative perspectives. Averaging these internal judgments can statistically reduce noise, similar to averaging judgments from multiple people.

Causal Thinking vs. Statistical Thinking

Humans are naturally inclined to engage in causal thinking, building stories about individual cases to understand the world, which makes bias (a perceived cause) more salient. In contrast, noise is a statistical property that requires thinking about collections of cases, which is less intuitively grasped and therefore often neglected.

How can the theory of measurement accuracy be applied to human judgments?

Human judgments can be treated as measurements, where the human mind acts as the measuring instrument. The accuracy of these judgments can then be analyzed in terms of bias (average error) and noise (variability of errors), just like in physical sciences.

How do cognitive biases influence judgment error?

Cognitive biases can influence both the bias term and the noise term in judgment error. For example, individual judges may have their own biases (e.g., some are generally more severe), and the variability of these individual biases across different judges contributes to overall noise.

What are the different types of noise in human judgment?

Noise can be categorized into occasion noise (variability within a single judge over time, influenced by transient factors like mood or temperature) and noise across individuals, which includes level noise (overall differences in severity among judges) and pattern noise (differential responses to specific types of cases among judges).

Can noise be measured even if the 'truth' or correct answer is unknown?

Yes, noise is easier to measure than bias because it focuses on the variability among judgments rather than their deviation from a known truth. Organizations can conduct 'noise audits' by comparing how much different people in the same role disagree on identical cases to quantify noise.

How do algorithms compare to human judgment in decision-making?

Algorithms are often superior to human judgment because they are noise-free, meaning they consistently produce the same output for the same input. This consistency, combined with the ability to process vast datasets, allows them to achieve higher accuracy than noisy human judgments, even if humans are not inherently 'bad' at judging.

How can human biases become 'baked into' machine decisions, and can they be compensated for?

If algorithms are trained on human decisions or data that reflects human biases (e.g., biased arrest records), those biases can be systematized into the algorithm. While this is a concern, such biases are often detectable and, in principle, controllable through critical program design and evaluation.

Are there limits to how well algorithms can predict life outcomes?

Yes, algorithms cannot exceed the inherent limits of predictability in an environment. For complex life events, much depends on unpredictable, real-time factors that are not present in the initial data, meaning even sophisticated algorithms with vast data can only achieve limited accuracy.

What does 'decision hygiene' entail for improving judgment?

Decision hygiene involves adopting general, non-specific procedures to reduce errors in judgment, much like washing hands prevents unknown germs. It focuses on disciplined thinking and structured approaches to decision-making, such as evaluating attributes independently before making a global judgment.

Why do people tend to focus more on bias than noise when trying to reduce error?

People's minds are better at dealing with causes than statistics. Bias can be perceived as a causal factor pushing a judgment one way, aligning with our natural causal thinking. Noise, being a statistical property that requires observing variability across multiple cases, is less intuitively grasped and therefore often neglected.

Can individuals significantly improve their own decision-making abilities?

Daniel Kahneman is generally skeptical of individuals greatly improving their judgment merely by knowing about biases and noise, noting his own experience. However, adopting decision hygiene principles and self-discipline for important decisions can lead to some improvement, and organizations have more hope for systemic improvements.

How can one recognize a good idea when they have it?

Good ideas often occur spontaneously, not through deliberate searching. The key is to recognize them as a 'glimmer of something new,' even if not fully understood initially. Influential ideas often have a common-sense character that existing disciplines might have overlooked, requiring honing and rigorous testing to make them scientifically useful.

Are humans irrational, according to Daniel Kahneman's work?

No, Daniel Kahneman views 'irrationality' as a technical term for failing to meet the logic of decision theory or probability, which is not feasible for anyone. His work shows that people are not 'fully rational' in this technical sense, but are mostly reasonable, and their errors are predictable and systematic rather than random.

1. Implement Mediating Assessment Protocol

Apply the mediating assessment protocol by defining important attributes for a decision, evaluating each attribute independently, and delaying the global judgment until all attributes have been assessed. This structured approach helps avoid premature conclusions and gathers more valuable information.

2. Employ Algorithms for Judgment

Use algorithms, even simple rules, for making judgments, especially in data-rich environments, because they are noise-free and consistently apply rules, often leading to more accurate outcomes than human judges.

3. Implement Noise Audits

Conduct noise audits by having multiple individuals in the same role make judgments on identical cases and then comparing their variability. This helps quantify and reduce noise in organizational decisions, even without knowing the true “bullseye.”

4. Average Independent Judgments

Gather independent judgments from multiple people on the same issue and average them to significantly reduce noise and improve accuracy, even if individual biases persist. This leverages the “wisdom of the crowd” effect to eliminate noise.
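A quick simulation sketch (hypothetical unbiased-but-noisy judge, invented parameters) illustrating why averaging works: independent errors cancel, so noise shrinks roughly as 1/sqrt(n), leaving about 10% of the noise after 100 judgments.

```python
import random
import statistics

random.seed(42)

def judgment():
    # Hypothetical judge: unbiased around the truth (50) but noisy (SD 10)
    return random.gauss(50, 10)

def averaged(n):
    return statistics.fmean(judgment() for _ in range(n))

solo = statistics.pstdev(judgment() for _ in range(5000))
crowd = statistics.pstdev(averaged(100) for _ in range(2000))
print(solo, crowd)  # the averaged judgments scatter far less
```

Note that averaging only removes noise; a shared bias in every judge survives the average untouched.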

5. Maintain Independent Group Judgments

To maximize the benefits of group judgment, ensure individuals form and record their opinions independently before any group discussion. This prevents judgments from contaminating each other, which can otherwise reduce the noise-reduction effect and even lead to polarization.

6. Average Your Own Judgments

Improve your personal judgments by consciously generating multiple perspectives or judgments on a single issue, such as through a pre-mortem exercise, and then averaging them. This “crowd within” approach can reduce individual noise and improve accuracy.

7. Cultivate Disciplined Thinking

Adopt disciplined thinking by consciously applying decision hygiene rules and principles to important decisions, especially when you suspect you might be making a mistake. This structured approach can make your careful thinking more effective.

8. Apply Reference Class Forecasting

When making predictions, view the current situation as an instance of a broader class of similar past events, a technique called reference class forecasting. This method tends to reduce both noise and bias in your judgments.

9. Prioritize Organizational Improvement

Focus efforts on improving judgment and decision procedures within organizations, as this approach holds more promise for significant and lasting change than solely attempting to de-bias individuals.

10. Frame Judgment as Measurement

Conceive of judgment as a measurement operation, assigning a value on a scale, to apply the theory of measurement accuracy, which characterizes error in terms of bias and noise. This reframing helps in understanding and addressing errors in human judgment.

11. Acknowledge Noise in Judgment

Actively recognize and account for noise—the variability of errors—in judgments, as it is often neglected compared to bias, yet contributes equally to total error. Understanding noise helps identify inconsistencies in decisions.

12. Inspect Algorithms for Bias

Actively inspect the design and training data of algorithms to identify and mitigate potential biases, such as those stemming from disproportionate data for certain groups or biased input measures. These errors are detectable in principle and can be controlled.

13. Consider Algorithmic Caveats

Be cautious about using algorithms in rapidly changing environments where training data may not reflect new conditions, when humans possess critical, unquantifiable information, or when there’s a risk of feedback loops biasing training data. These situations may reduce algorithmic effectiveness.

14. Acknowledge Predictability Limits

Recognize that there are inherent limits to how predictable environments and life events are, even with advanced algorithms and extensive data. This understanding helps manage expectations and prevents overconfidence in forecasts.

15. Improve Research Practices

To enhance the replicability and scientific quality of social science research, adopt practices like using larger samples, pre-registering statistical analyses, and comprehensively reporting all procedural details. These measures help reduce self-deception and improve scientific rigor.

16. Recognize and Develop Ideas

Cultivate the ability to recognize when a “good idea” or a “glimmer of something new” spontaneously occurs, as this recognition and subsequent development are crucial for impactful work, rather than just deliberate searching.

17. Observe for Common Sense Insights

Cultivate keen observation of people, the world, and yourself, as many influential ideas stem from simple, common-sense insights that existing theories or disciplines may have overlooked. These insights can then be honed into scientific contributions.

18. Refine Ideas with Rigor

Transform common-sense ideas into precise, rigorous concepts, then experimentally test them and build a supporting theory. This process makes the ideas scientifically useful, especially when they challenge or expand existing knowledge.

19. Characterize Thinking Constructively

Approach the study of human judgment by constructively characterizing how people think, recognizing that errors are often predictable and systematic, rather than simply labeling behavior as “irrational.” This allows for a more productive understanding of cognition.

20. Apply Bayesian Evidence Framework

Utilize Bayesianism as a normative framework for evaluating evidence by considering how much more likely observed evidence is if a hypothesis is true compared to if it’s not. This provides a clear rule set for updating beliefs in light of new information.
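The rule described here is easiest to state in odds form. A minimal sketch with hypothetical probabilities:

```python
def update_odds(prior_odds, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_evidence_if_true / p_evidence_if_false)

# Hypothetical: a hypothesis held at 1:4 odds (20% probability), with
# evidence three times as likely if the hypothesis is true (0.6 vs 0.2)
posterior_odds = update_odds(0.25, 0.6, 0.2)
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob)  # 3:4 odds, roughly 43%
```

The likelihood ratio (how much more likely the evidence is if the hypothesis is true than if it is false) carries all the evidential weight; the prior does the rest.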

21. Be Skeptical of Self-Improvement

Be realistic about the limited ability of individuals to greatly improve their own judgment simply by knowing about biases and noise, as personal experience suggests it’s difficult. Focus efforts more on improving organizational decision procedures, which have greater potential for impact.

The average of these errors is the bias. So you can overestimate length or you can underestimate length. That average error, that's the bias. The variability of errors is noise.

Daniel Kahneman

The measure of error, the global measure of error, equals the square of the bias plus the square of the noise.

Daniel Kahneman

When different people look at the same situation, they don't see it in the same way.

Daniel Kahneman

It's been known for 70 years that when you pit individuals making a judgment against very simple rules to combine information mechanically that is available to the judge, the simple rules typically do as well, and in about 50% of the cases, they do better than individuals.

Daniel Kahneman

The reason is straightforward. There's only one possible reason. A model of the judge, that is, a statistical model you create that predicts what the judge would say, will not match the judge every time, because the judge would not say the same thing every time. The judge is noisy. The model is not. And the superiority of the model in terms of its accuracy is for a simple reason. It's noise-free.

Daniel Kahneman

The real culprit, I think, in the replication crisis was not fraud. It was self-deception.

Daniel Kahneman

I don't think you deliberately search for good ideas. You know, that would be a very productive search. Good ideas occur to you. And then what's important is to recognize them, is to recognize that you've had a good idea and to recognize that there may be a glimmer of something new.

Daniel Kahneman

One of my least favorite words is the word irrationality. I've never used it. I view rationality as a technical definition of the logic of decision-making or the logic of probability, which people's intuitions simply cannot conform to. It's not feasible. Full rationality as defined in decision theory is not feasible.

Daniel Kahneman

Mediating Assessment Protocol (Structured Interview Approach)

Daniel Kahneman
  1. Define the attributes that are important to performance or desirability for the judgment.
  2. Evaluate each of these attributes one at a time, and to the extent possible, independently of each other.
  3. Delay the global judgment or intuition until all separate judgments of the attributes have been made.
208: Federal judges who participated in a study evaluating 16 crime cases for sentencing.

16: Crime cases, each specified with enough detail for judges to set an appropriate sentence.

4 years: Expected difference in sentence between two randomly chosen judges, for a crime whose average sentence was 7 years, illustrating huge variability (noise).

10%: Difference in premium judgments that insurance executives expected between two random underwriters evaluating the same complex risk.

52%: Actual measured difference among underwriters in a well-run insurance company, five times higher than the executives anticipated.

10%: Noise remaining after averaging 100 people's judgments, i.e. one tenth of the noise in individual judgments.

55-57%: Maximum prediction accuracy (where chance is 50%) achieved by computer scientists and researchers trying to predict life events from sociological data in the Fragile Family Study.

160: Experts who tried to find patterns and generate prediction rules from the Fragile Family Study dataset.

One third as good: Effectiveness of the 'crowd within' relative to asking another person; averaging one's own judgments across time improves accuracy about one third as much as averaging with a different person's judgment.