Career science, open science, and inspired science (with Alexa Tullett)

Sep 14, 2022 Episode Page ↗
Overview

Spencer Greenberg and Alexa Tullett discuss trust in science, compare meta-analyses with registered reports, and introduce "importance laundering." They explore how to improve scientific practices, distinguish between "open" and "inspired" science, and touch on topics like the justice system and college admissions.

At a Glance
30 Insights
1h 20m Duration
20 Topics
8 Concepts

Deep Dive Analysis

Trusting Science: Personal Frustrations and Challenges

Meta-Analyses vs. Registered Reports: Trustworthiness and Bias

Conceptual Replications and Heterogeneity in Research

The 'Great Reset' in Social Science and Skepticism

Open Science vs. Inspired Science: Motivations and Goals

The Role of Peer Persuasion and Self-Deception in Science

P-Values, Thresholds, and Truth-Seeking vs. Publication

Exploratory vs. Confirmatory Research in Science

Introduction to 'Importance Laundering' in Scientific Literature

Subtype 1: Conclusion Hacking and Causal Inference Problems

Subtype 2: Novelty Hacking and Redefining Constructs

Subtype 3: Usefulness Hacking and Small Effect Sizes

Subtype 4: Beauty Hacking and Storytelling in Research

Replicability vs. Generalizability in Social Science

Should Retribution Be Part of the Justice System?

Are We Asking Too Much of the Supreme Court?

Ideal College Admissions Process Considerations

Personal Views on Pets and Emotional Connection

Lessons Learned from Podcasting Interviews

Excitement for the Scientific Method and its Potential

Meta-analysis

A statistical analysis that combines the results of multiple scientific studies. While meta-analyses draw on more data than any single study, they are susceptible to publication bias because they often include only published studies, which tend to favor positive or significant findings.
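To make the publication-bias mechanism concrete, here is a minimal simulation, a sketch under hypothetical assumptions (a true effect of 0.2 and 20 participants per group are made-up numbers): when only "significant" studies enter the literature, a naive average of the published estimates badly overestimates the true effect.

```python
import random
import statistics

random.seed(0)
TRUE_EFFECT = 0.2   # hypothetical true standardized mean difference
N_PER_GROUP = 20
N_STUDIES = 2000

def run_study():
    """Simulate one two-group study; return the estimated effect and its z-score."""
    treat = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    ctrl = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    est = statistics.mean(treat) - statistics.mean(ctrl)
    se = (2 / N_PER_GROUP) ** 0.5   # standard error, assuming sd = 1 in each group
    return est, est / se

estimates = [run_study() for _ in range(N_STUDIES)]
all_mean = statistics.mean(e for e, _ in estimates)
# The "published" literature: only studies with a significant positive result survive
published = [e for e, z in estimates if z > 1.96]
pub_mean = statistics.mean(published)

print(f"true effect:                      {TRUE_EFFECT}")
print(f"average over all studies:         {all_mean:.2f}")
print(f"average over published studies:   {pub_mean:.2f}")
```

The average over all studies recovers the true effect, while the published-only average is inflated several-fold, which is exactly the distortion a meta-analysis of the published record inherits.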

Registered Report

A type of scientific publication where the methodology and research questions are peer-reviewed and accepted by a journal *before* the study is conducted and results are known. This commitment to publish regardless of outcome helps prevent publication bias and p-hacking, increasing trustworthiness.

Conceptual Replication

A study that attempts to replicate the core idea or construct of an original study but changes aspects of the methodology, such as using different measures or interventions. While necessary for generalizability, they can be problematic if used to test the strength of an original study, as inconsistencies can be attributed to methodological changes rather than a weak original effect.

Open Science

A movement in science advocating for practices like pre-registration, open data, open materials, and open access publishing, primarily as a reaction to the replication crisis. Its focus is on making research defensible and transparent, ensuring accountability and reducing bias by adhering to rigorous standards.

Inspired Science

An approach to science focused purely on discovering the truth about how things work in reality, without primary concern for proving findings to others or adhering to external conventions. It emphasizes rapid, iterative exploration to understand phenomena, with confirmatory steps serving to validate internal understanding rather than being the sole focus of discovery.

P-Hacking

The practice of selectively analyzing or reporting data in order to achieve a statistically significant p-value (typically less than 0.05), often by trying multiple analyses until a desired result is found. This compromises the integrity of results by capitalizing on chance and can lead to false positives.
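A quick simulation shows how this capitalizes on chance, a sketch with hypothetical numbers (five candidate analyses, a simple two-sided z-test under the null): even when there is no real effect, keeping the best of several analyses inflates the false-positive rate well beyond the nominal 5%.

```python
import random
from statistics import NormalDist, mean

random.seed(1)
N = 30        # hypothetical sample size per analysis
K = 5         # hypothetical number of analyses tried on the same question
TRIALS = 2000

def p_value(sample):
    """Two-sided z-test p-value against a true mean of 0 (sd known to be 1)."""
    z = mean(sample) / (1 / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive(n_analyses):
    """Under the null (no real effect), is at least one of n_analyses 'significant'?"""
    ps = [p_value([random.gauss(0, 1) for _ in range(N)]) for _ in range(n_analyses)]
    return min(ps) < 0.05

honest = sum(false_positive(1) for _ in range(TRIALS)) / TRIALS
hacked = sum(false_positive(K) for _ in range(TRIALS)) / TRIALS
print(f"false-positive rate, one pre-specified analysis: {honest:.0%}")
print(f"false-positive rate, best of {K} analyses:       {hacked:.0%}")
```

With independent analyses the theoretical hacked rate is 1 − 0.95^5 ≈ 23%, more than four times the advertised 5%.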

Importance Laundering

A phenomenon where a replicable scientific finding is made to seem more interesting or important than it actually is, allowing it to be published in top journals despite its limited practical or theoretical value. This can involve various deceptive tactics in how results are presented or interpreted.

Ego Depletion

The idea that self-control or willpower is a limited resource that gets 'used up' after exertion, making it harder to exercise self-control subsequently. Critics argue that in some studies, it's hard to distinguish this effect from simple tiredness, and the effect becomes harder to find when more rigorously distinguished from fatigue.

?
How much should we actually trust science?

It's surprisingly difficult to find clear, trustworthy answers to many empirical questions, even for scientists. While resources like the Cochrane Collaboration and Google Scholar can help, many areas lack sufficient evidence or present contradictory findings, making it hard to feel confident in scientific answers.

?
Are registered reports more trustworthy than meta-analyses?

Yes, registered reports are generally considered more trustworthy because they prevent publication bias by committing to publish results regardless of outcome, and they require authors to stick to their original plan, reducing the likelihood of false positives. Meta-analyses, while combining more data, are often biased by the selective publication of positive results.

?
What is the difference between 'open science' and 'inspired science'?

Open science focuses on making research defensible and unimpeachable through practices like pre-registration and open data, primarily as a reaction to flaws in traditional science. Inspired science, in contrast, is about the fundamental pursuit of truth and understanding how reality works, using rigorous methods to discover, with external validation being a secondary, later step.

?
Why is it so hard for individual scientists to be unbiased?

Social psychologists understand that humans are constantly subject to motivated reasoning, biases, and preconceived notions. It's easy to convince oneself of outcomes that align with expectations or career incentives, making external accountability and peer persuasion critical for producing trustworthy findings.

?
Do p-value thresholds (like p < 0.05) represent black-and-white thinking?

Yes, dichotomizing p-values into 'significant' or 'not significant' using a fixed threshold can be seen as black-and-white thinking that throws away information. While useful for publication standards, a p-value should ideally be interpreted as a continuous form of evidence about whether a result is due to random noise, especially when seeking the truth.

?
What is 'importance laundering' in scientific literature?

Importance laundering refers to the practice of making a replicable but uninteresting or unimportant scientific finding appear significant enough to be published in top journals. This can involve 'conclusion hacking' (misrepresenting what was shown), 'novelty hacking' (making common sense seem novel), 'usefulness hacking' (downplaying small effect sizes), and 'beauty hacking' (oversimplifying complex results into an elegant story).

?
Is generalizability more important than replicability in science?

While replicability is crucial, generalizability is also extremely important. A study can be highly replicable but completely uninteresting if its findings don't generalize beyond the specific experimental setup or if the measures lack validity, meaning they don't tell us truths that matter about human behavior.

?
Should retribution be a goal of our justice system?

No, retribution should not be a goal of the justice system because it relies on assessing blameworthiness, which is essentially impossible to do accurately. Given the high cost of being wrong, the system should abandon retribution and focus on consequentialist goals like public safety, rather than punishing individuals in proportion to perceived deservingness.

?
Are we asking too much of the US Supreme Court?

Yes, we are likely asking too much. Similar to the skepticism about scientists' objectivity, it's highly doubtful that Supreme Court justices can truly be objective and keep their personal values or politics from influencing their decisions, despite this being an ostensible requirement of their powerful positions.

?
What would an ideal college admissions process look like?

An ideal college admissions process would depend on the purpose of education and who we aim to select. Instead of solely identifying those most likely to succeed by standard metrics (grades, tests), it could focus on identifying those who would benefit most from education, or even employ a system of random selection after a basic threshold of college readiness is met.

1. Acknowledge Self-Deception

Understand that as individuals, we are highly prone to self-deception and biases, making external accountability and peer persuasion critical for producing trustworthy results, even for ourselves.

2. Embrace Rapid Iterative Exploration

To truly understand a phenomenon, engage in rapid, iterative exploration by testing it from many different angles, viewing this as the core process of figuring out the truth before conducting confirmatory studies.

3. Use Confirmatory Studies to Verify

After extensive exploration, conduct pre-registered confirmatory studies to ensure you haven’t bullshitted yourself and to provide strong, unimpeachable evidence to others about your findings.

4. Distinguish Exploration from Confirmation

When conducting research, clearly differentiate between exploratory analyses, which generate new hypotheses, and confirmatory analyses, which test pre-specified hypotheses, to maintain scientific integrity.

5. Beware “Importance Laundering”

Be vigilant against “importance laundering,” a practice where research findings that are replicable but lack genuine interest or importance are presented in a way that makes them seem significant.

6. Identify Conclusion Hacking

Watch out for “conclusion hacking,” where a study presents a specific finding (X) but subtly implies a more interesting, unproven finding (X prime), often by using vague language or overclaiming.

7. Recognize Novelty Hacking

Be aware of “novelty hacking,” which involves presenting a result as novel even if it’s common sense or an already established fact, often by renaming or repackaging existing constructs.

8. Detect Usefulness Hacking

Look for “usefulness hacking,” where researchers highlight the statistical significance of a finding with a very small effect size, while downplaying its lack of practical or clinical importance.

9. Critique Beauty Hacking

Critically evaluate “beauty hacking,” a practice where messy or contradictory research results are simplified and presented as a clean, elegant, and exciting story, often by omitting inconvenient findings.

10. Prioritize Generalizability

Beyond replicability, prioritize generalizability in research, as findings that are reliable but lack broader applicability may not contribute to understanding truths about human behavior that truly matter.

11. Be Skeptical of Small Effects

Maintain general skepticism towards small effects, as they are more likely to be accidental findings due to subtle experimental mistakes and tend to be less reliable than larger effects.

12. Contextualize Small Effects

When encountering small effects, assess their importance based on context; they can be significant if they are paradigm-shifting, relate to life-or-death outcomes, or have extremely low implementation costs.

13. Prioritize Registered Reports

When seeking scientific answers, prioritize registered reports over meta-analyses because they have more safeguards against bias, ensuring results are published regardless of outcome and authors stick to their original plan.

14. Understand Meta-Analysis Limitations

Be aware that meta-analyses can suffer from publication bias, where only studies with significant effects are published, and heterogeneity, where diverse study designs are combined, potentially obscuring true effects.

15. Expect Inconclusive Meta-Analyses

Be prepared for most meta-analyses to conclude that “more evidence is needed,” as this is a common outcome, suggesting that definitive answers are often still elusive.

16. View P-Values as Continuous Evidence

When seeking the truth, avoid dichotomizing p-values into “significant” or “not significant” thresholds; instead, interpret them as continuous evidence against the null hypothesis, as this is the correct mathematical interpretation.
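One way to see why the 0.05 cliff throws away information is to convert p-values to the z-scores they correspond to, a sketch using the standard normal quantile function: 0.04 and 0.06 represent nearly identical strength of evidence, while 0.001 is genuinely much stronger.

```python
from statistics import NormalDist

# Two-sided p-values and the z-scores (evidence strength) they correspond to
for p in (0.20, 0.06, 0.05, 0.04, 0.001):
    z = NormalDist().inv_cdf(1 - p / 2)
    print(f"p = {p:<5}  ->  z = {z:.2f}")
```

The gap between p = 0.04 and p = 0.06 is under 0.2 z-units, a trivial difference in evidence, yet the 0.05 threshold treats them as categorically different outcomes.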

17. Justify Alpha Based on Costs

When setting a p-value cutoff (alpha), justify it by considering the relative costs of making a false positive (Type 1 error) versus a false negative (Type 2 error) in your specific research context.

18. Perform Power Calculations

Before conducting a study, perform a power calculation to determine the necessary sample size, ensuring the study is adequately powered to detect the effects you are looking for.
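As an illustration, the standard normal-approximation formula for comparing two group means can be sketched as follows (the effect sizes in the loop are hypothetical examples; dedicated power-analysis software gives exact t-distribution answers):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison of means,
    using the normal approximation; d is the standardized effect size (Cohen's d)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for two-sided test
    z_power = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Smaller effects demand dramatically larger samples:
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {n_per_group(d)} participants per group")
```

Note how the required sample size grows with the inverse square of the effect size: detecting a small effect (d = 0.2) at 80% power takes hundreds of participants per group, which is one reason underpowered studies of small effects are so unreliable.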

19. Pre-Register Studies

Pre-register your study’s introduction and methods section before data collection to prevent p-hacking and demonstrate that your research plan was established prior to knowing the results.

20. Show All Studies, Even Imperfect Ones

Instead of hiding initial or “crappy” studies, make them available (e.g., online or in supplementary materials) so that others can see the full research process and evaluate the sum of all evidence, preventing capitalization on chance.

21. Consult Cochrane for Health Questions

When researching medical or health questions, begin by checking the Cochrane Collaboration, which provides in-depth meta-analyses on various topics, offering a trustworthy starting point for evidence.

22. Search Google Scholar for Research

If the Cochrane Collaboration doesn’t cover your topic, use Google Scholar and include phrases like “randomized control trial” or “systematic review” in your search to find empirical evidence.

23. Abandon Retribution in Justice

Eliminate retribution as a goal in the justice system, as it relies on an impossible assessment of blameworthiness, and the high cost of error necessitates focusing on consequentialist goals like public safety instead.

24. Stay Skeptical of Judicial Objectivity

Maintain skepticism regarding the objectivity of Supreme Court justices, recognizing that individuals are inherently biased, and the immense power of their positions, combined with an expectation of impartiality, is often problematic.

25. Define Educational Purpose First

Before designing a college admissions process, clearly define the fundamental purpose of education and the type of individuals you aim to select, as this will shape the entire selection strategy.

26. Admit for Benefit, Not Just Success

Explore college admissions models that prioritize identifying individuals who will benefit most from education, rather than solely selecting those predicted to achieve high grades or test scores.

27. Random Selection Post-Threshold

Implement a college admissions system where candidates must meet a basic threshold of readiness or potential benefit, after which selection is randomized to address the inherent difficulties of precise evaluation.

28. Listen Actively in Interviews

In interviews, practice active listening to identify truly important or interesting points made by the speaker, and be prepared to ask follow-up questions to delve deeper into those moments.

29. Record & Review Conversations

To enhance listening skills and self-awareness, record your conversations and listen back to them, noting points you missed or differences between your internal thoughts and what you actually verbalized.

30. Let Conversations Flow Organically

When facilitating discussions or interviews, resist the urge to heavily structure or control the conversation; instead, allow it to flow organically, as this often leads to more interesting and insightful exchanges.

I'm very, very cynical about our ability to be honest with ourselves.

Alexa Tullett

I think that the thing that is most motivating to me right now in social science is that while I think that it's very difficult to come to conclusive answers about important questions that we have, I do believe in the process that we're using.

Alexa Tullett

I think that the scientific method is actually so powerful, even though like our human implementation of it is often so flawed, but like if we can go back to basics and like leverage it properly, like there's actually tremendous potential.

Spencer Greenberg

Open science to me feels focused on making work defensible. In other words, like, you know, nobody can criticize my work because I like did all the right things.

Spencer Greenberg

The problem I have is, like, to me, that's not actually how you figure out the truth. The way you figure out the truth is, like, the fast, iterative, like, you're shining flashlights from lots of different angles, like, trying to understand this phenomenon from lots of different ways.

Spencer Greenberg

I don't really understand why people are so into dogs and stuff like that. And people have like a really negative reaction. They're like, wow, you are a completely different person than I thought you were. You might be a psychopath.

Alexa Tullett

2%
Likelihood of committing fraud in science: Estimated percentage of people willing to commit fraud, indicating it's not common but can be a big problem when it occurs.

0.05
P-value threshold for statistical significance: Common threshold used in science; results below this are often considered statistically significant and boost publication chances.

0.2
Example p-value for a non-significant result: Hypothetical p-value for an analysis that might lead a researcher to try different analytical approaches.

0.06
Example p-value just above the significance threshold: A p-value that, while not meeting the 0.05 threshold, still provides some evidence against random noise and should be treated much like 0.04 when seeking the truth.

0.04
Example p-value just below the significance threshold: A p-value that meets the 0.05 threshold but is essentially the same as 0.06 in terms of evidence against random noise.

0.001
Example p-value for a very strong result: A p-value indicating a result is very unlikely to be due to sampling error.

15
Number of studies conducted before reaching confident conclusions: Spencer's research team conducted this many studies on gender and personality before feeling confident in their conclusions.

18 out of 18
Hypotheses confirmed in a pre-registered study: Confirmed in a pre-registered confirmatory study after extensive exploratory work, demonstrating the robustness of findings once the phenomenon was understood.

3%
Chance of not dying that is considered valuable: Illustrates that even a small effect size can be highly significant if it pertains to life-and-death situations.