Should we trust papers published in top social science journals? (with Daniel Lakens)
In this episode, Spencer Greenberg and Daniel Lakens discuss the craft of science, issues with peer review and research incentives, and the concept of red-teaming scientific research. They also delve into the trustworthiness of social science findings and the nuances of various psychological effects.
Deep Dive Analysis
17 Topic Outline
Trustworthiness of Social Science Research Today
Indicators of Trustworthy Research and Large Datasets
Psychology's Challenges: Theory, Precision, and Accumulation
Making Psychology a Cumulative and Collaborative Science
Improving Research Practices Through Constructive Criticism
Addressing Limitations of Traditional Peer Review
Understanding and Preventing 'Importance Hacking'
Reducing Pressure and Incentivizing Better Science
Coordination and Consensus in Scientific Fields
The Role of Red Teaming in Research
Effectiveness and Failures of Adversarial Collaborations
Teaching the Skill of Admitting Error in Science
Evolving Norms and Flexibility in Research Practices
Revisiting Facial Feedback, Power Posing, and Ego Depletion
Revisiting Terror Management Theory, Grit, and Growth Mindset
Challenges and Criticisms of the Implicit Association Test
Future of Social Science: Increased Collaboration
8 Key Concepts
P-hacking
A set of techniques, either explicit or implicit, where researchers make choices in data analysis (e.g., discarding outliers, trying multiple analyses) to achieve statistically significant results, often leading to false positives.
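The inflation this produces can be seen in a minimal simulation (illustrative only, not from the episode; the setup and names are my own): every study below is a true null, yet letting the analyst report whichever of five outcome measures 'worked' multiplies the false-positive rate.

```python
import math
import random
import statistics

random.seed(0)

def p_value(sample):
    """Two-sided p for 'mean == 0', normal approximation (fine for n >= 30)."""
    n = len(sample)
    z = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

def run_null_study(n=50, n_outcomes=5):
    """One study with no true effect, measured on several outcome variables."""
    ps = [p_value([random.gauss(0, 1) for _ in range(n)])
          for _ in range(n_outcomes)]
    honest = ps[0] < 0.05    # pre-specified single outcome
    hacked = min(ps) < 0.05  # report whichever outcome came out significant
    return honest, hacked

trials = 5000
results = [run_null_study() for _ in range(trials)]
honest_fp = sum(h for h, _ in results) / trials
hacked_fp = sum(k for _, k in results) / trials
print(f"honest false-positive rate: {honest_fp:.3f}")  # near the nominal 0.05
print(f"hacked false-positive rate: {hacked_fp:.3f}")  # near 1 - 0.95**5, ~0.23
```

With five independent outcomes the chance of at least one spurious hit is 1 − 0.95⁵ ≈ 23%, more than four times the nominal error rate, which is why constraining analytic flexibility in advance matters.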
Importance Hacking
A phenomenon where research findings are replicable but their claimed value, significance, or real-world meaning is exaggerated. This can occur when large sample sizes yield statistically significant but negligible effects, or when messy results are presented as clear conclusions.
Red Teaming
A practice, borrowed from computer programming, where a dedicated team (the 'red team') actively attempts to criticize, break down, or find flaws in the work of another group (the 'blue team') in a collaborative effort to improve the final product.
Registered Reports
A publication format where a research proposal, including the introduction, methods, and analysis plan, is submitted for peer review and accepted *before* any data is collected. This allows for early criticism and refinement, preventing issues before they become entrenched.
Adversarial Collaborations
A research approach where scientists with strongly differing viewpoints on a topic agree to work together on a single paper. They jointly design the methodology, conduct the study, and co-publish the results, aiming to resolve disagreements or clarify points of contention in a constructive manner.
Stroop Effect
A classic psychological phenomenon where individuals experience interference when asked to name the color of a word that is itself a conflicting color name (e.g., the word 'red' printed in blue ink). It demonstrates the automaticity of reading and selective attention.
Ego Depletion
The theory that self-control or willpower is a limited mental resource that can be used up by demanding tasks, leading to a decrease in subsequent self-control efforts. While the general feeling of fatigue is real, the specific theoretical model involving a depletable 'glucose resource' was largely debunked.
Growth Mindset
The belief that one's abilities and intelligence are not fixed but can be developed and improved through dedication, hard work, and learning from challenges. This contrasts with a fixed mindset, where abilities are seen as inherent and unchangeable.
16 Questions Answered
The trustworthiness of new social science papers depends greatly on the specific paper, as practices have improved over the last decade, but some older, less rigorous practices still persist.
To assess trustworthiness, look for clear, falsifiable hypotheses, a well-defined data analysis plan, a sufficient amount of data for accurate estimates, and a solid, well-worked-out theoretical framework.
Not necessarily; while large datasets lead to higher accuracy, if researchers have significant flexibility in their statistical models, dependent variables, or covariates, they can still engage in p-hacking.
Many fields in psychology focus more on individual 'effects' rather than accumulating knowledge into overarching theories, partly because fixing research methods has been prioritized as an easier task than fixing theories.
While more difficult than in physics, it is feasible to refute psychological theories, especially if they are more precisely formulated mathematically and if researchers actively test competing theories against each other.
Making psychology cumulative requires increased collaboration across research fields and interdisciplinary research, as integration happens across areas, but this process is currently labor-intensive and not sufficiently rewarded in academia.
Researchers should focus on actively seeking and incorporating decent criticism on their ideas, creating an atmosphere where people feel comfortable criticizing each other, and being critical of their own work.
Peer review is limited by the expertise of the reviewers, who may miss subtle issues or specialized flaws, and it often occurs too late in the research cycle, making authors defensive about already completed work.
Importance hacking describes when a research finding is replicable, but its claimed value or significance is exaggerated; for example, a negligible effect found with a very large sample size is presented as a major innovation.
Academia can reduce pressure by changing tenure criteria, such as granting tenure much earlier (e.g., after one year), which allows researchers more freedom to pursue difficult and important work rather than focusing on publication quantity.
Adversarial collaborations often fail because it is challenging for top experts, who have dedicated decades to certain theories, to reach agreement, especially if there is a lack of good faith or willingness to genuinely change their viewpoints.
Yes, admitting error is a skill that should be explicitly taught and trained among scientists, as it is difficult for everyone, and the academic environment often does not provide support for dealing with criticism or mistakes.
Students should be taught the basic principles underlying why certain practices are right or wrong, rather than just strict rules, to understand when flexibility (like sequential analysis) is statistically valid and when it constitutes p-hacking.
The idea that intentionally smiling can have some effect on mood is likely true, but the mechanism might be more conscious (a 'demand effect') rather than an implicit biological process, and some original claims may not replicate.
While the feeling of being tired and giving in to impulses is universally understood, the specific theoretical model of ego depletion involving a depletable 'glucose resource' in the brain has been largely debunked, despite nearly 200 prior studies supporting it.
The IAT is methodologically interesting for measuring how people group things, but it faces significant criticism regarding what it truly measures, its susceptibility to various confounds, and its low test-retest reliability, making it difficult to interpret as a measure of deep, implicit bias.
28 Actionable Insights
1. Cultivate a Culture of Criticism
Actively seek and welcome criticism on your ideas from others, and foster an environment where people feel comfortable providing constructive feedback. This helps identify flaws and improve research, as self-criticism is often insufficient.
2. Implement Early-Stage Red Teaming
Shift the criticism process, like red teaming or peer review, to the early stages of research (e.g., via registered reports) before data collection. This allows researchers to fix fatal flaws in proposals and methods when it still matters, reducing defensiveness and waste.
3. Reduce Publication Pressure for Tenure
Implement tenure systems that grant early tenure (e.g., after one year) to assistant professors, significantly reducing the pressure to publish frequently. This frees researchers to pursue more difficult and important, rather than merely numerous, projects.
4. Increase Scientific Coordination, Collaboration
Actively foster coordination and collaboration among scientists to collectively address research challenges and improve practices. This is crucial for tackling large, complex problems and building cumulative knowledge.
5. Coordinate Replications, Measures, Challenges
Social scientists should coordinate to identify studies needing replication, standardize measurement tools, and commit to long-term, difficult research questions that require collective effort. This ensures a robust knowledge base and addresses critical, complex problems.
6. Promote Interdisciplinary Collaboration
Engage in more cross-field and interdisciplinary research, as integrating diverse expertise (e.g., sociologists, economists) is crucial for addressing complex, large-scale problems and building overarching theories. The academic system should reward this labor-intensive process.
7. Train Scientists to Admit Mistakes
Actively train scientists, especially early in their careers, on how to deal with criticism and admit when they are wrong or have made mistakes. This is a crucial skill that is often overlooked in academic training.
8. Broaden PhD Training to Practical Skills
Expand PhD programs to include training on practical skills essential for a scientific career, such as dealing with research roadblocks, developing new ideas, and managing criticism. This addresses common challenges students face but are not formally taught.
9. Discuss Principles vs. Rewards
Openly discuss the conflict between scientific principles and career reward structures with junior researchers. Encourage them to prioritize doing the right thing, even if it means a lower publication rate, as this aligns with long-term integrity and personal satisfaction.
10. Appoint a Chief Criticizer
For each research project, assign a dedicated “chief criticizer” who is responsible for identifying flaws and takes the blame if errors are found post-publication. This creates a strong incentive for thorough criticism and overcomes social biases.
11. Implement Red Teaming
Form a “red team” specifically tasked with actively trying to break down or criticize the work of another group (the “blue team”) in a collaborative environment. This method, borrowed from programming, helps identify weaknesses and improve outcomes.
12. Evaluate Research Trustworthiness
When assessing a new paper, check for falsifiable hypotheses, a clear data analysis plan, and sufficient data for accurate estimates, alongside a solid theoretical framework. This helps determine the reliability and validity of the findings.
13. Strengthen Theory to Prevent P-Hacking
Develop stronger, more constrained theoretical predictions in research, as this limits flexibility in data analysis and makes p-hacking more difficult. This theoretical component is often important to consider.
14. Increase Transparency by Sharing Code
Share research code publicly to make mistakes visible and normalize the process of finding and fixing errors. This transparency can benefit the entire field by fostering a more open and accountable environment.
15. Sign Peer Reviews for Accountability
Voluntarily sign your peer reviews to foster a sense of personal responsibility for the quality of the critique. This can motivate reviewers to be more thorough and improve the overall peer review system.
16. Diversify Reviewer Expertise
Select peer reviewers with varied and diverse expertise, including those from outside the immediate sub-field, to provide more comprehensive and useful input. This can help catch mistakes that specialized reviewers might miss.
17. Prioritize Direct Dialogue for Disagreement
Instead of formal commentary articles in journals, resolve scientific disagreements through direct, in-person conversations. This informal setting can foster more productive and less defensive conflict resolution.
18. Engage in Adversarial Collaborations
When strong disagreements exist, scientists should engage in adversarial collaborations where they jointly design and execute studies to resolve conflicts. This method aims to reach a shared conclusion or clearly delineate remaining disagreements.
19. Utilize Flexible Pre-registration Methods
Learn and apply advanced pre-registration methods that allow for flexibility in data analysis while maintaining rigor. This addresses the common issue of unforeseen data outcomes and reduces the need to deviate from pre-specified plans.
20. Combine Exploratory and Confirmatory Analysis
Conduct exploratory analysis on a subset of data, then test the most promising hypotheses on a separate, “held-in-vault” confirmatory dataset. This allows for broad exploration while maintaining statistical rigor.
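A minimal sketch of this split (illustrative, with synthetic data; the variable names are my own, not from the episode):

```python
import random

random.seed(1)

# Hypothetical dataset: 200 participants, each a (predictor, outcome) pair.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
random.shuffle(data)

# Split once, up front: explore on one half, lock the other half 'in the vault'.
exploratory, confirmatory = data[:100], data[100:]

def correlation(pairs):
    """Pearson correlation of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

# Phase 1: explore freely -- any pattern found here is only a hypothesis.
r_explore = correlation(exploratory)

# Phase 2: run the single most promising test once, on the untouched data.
r_confirm = correlation(confirmatory)
print(f"exploratory r = {r_explore:+.3f}, confirmatory r = {r_confirm:+.3f}")
```

Because only one pre-committed test is ever run on the held-out half, its p-value retains its nominal meaning no matter how many patterns were hunted for during exploration.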
21. Measure Multiple Outcomes with Caution
Collect data on multiple outcome measures to gain a comprehensive understanding, but interpret findings cautiously, especially when only some measures show significance. Use such discrepancies as opportunities for deeper exploration rather than selective reporting.
22. Understand Underlying Principles of Methods
Focus on understanding the fundamental principles behind research methods and ethical practices, rather than mindlessly following rules or avoiding practices that merely “look like” problematic ones. This enables informed decision-making and appropriate flexibility.
23. Share Null Results to Prevent Bias
Actively share null results to combat the “file drawer problem,” which otherwise leads to a literature filled with false positives and flukes. This transparency is crucial for an accurate scientific record.
24. Guard Against Importance Hacking
Be critical of research findings that replicate but whose significance or value is overstated or misinterpreted. Ensure that the claimed meaning and importance of results are genuinely supported by the data, not just statistical significance.
25. Adopt a Growth Mindset
View your own performance and abilities as something that can improve over time with effort and learning, rather than fixed traits. This perspective is crucial for continuous development and resilience, especially in challenging fields like science.
26. Continuously Teach Growth Mindset
Integrate the teaching of a growth mindset into education and training programs, reinforcing it consistently rather than as a one-off intervention. This sustained approach can lead to more significant and lasting positive effects.
27. Encourage Specialization in Science
Promote greater specialization within scientific fields, acknowledging that expertise in areas like programming, measurement, or statistics requires dedicated training and time. This can lead to higher quality work and fewer mistakes.
28. Interpret IAT with Caution
Be highly cautious and critical when interpreting results from the Implicit Association Test (IAT), acknowledging its methodological complexities and potential confounds. Do not assume it directly measures deep-seated implicit biases like racism without clear communication of its limitations.
6 Key Quotes
Studying physics is child's play, but studying child's play, on the other hand...
Daniel Lakens (attributing Einstein)
Criticizing somebody can be a great act of support. Because if somebody is going in the wrong path, you need to point it out.
Daniel Lakens
We create an environment where people have to look smart all the time.
Daniel Lakens
We have currently built in criticism in a very unfortunate point in the research cycle, namely where somebody already collected all the data, they wrote up their entire paper, then we send it off to some journal and the journal finds peer reviewers. Well, if you would think about how to best organize this system, this is not the moment to criticize people.
Daniel Lakens
The ego depletion effect is also an extremely important research finding for meta-scientific reasons, not for the finding it was supposed to be. It tells us that we can have almost 200 studies in the scientific literature, that are part of a meta-analysis. And we thought this is great, large effect, super relevant. And it turns out it was all nothing.
Daniel Lakens
I think the progress typically goes in one direction. So you rarely see a society fall back to a system where there's widespread corruption... And similarly, I don't really see us slip back into a system where people are p-hacking because the awareness is just there and you just feel bad about doing it now, whereas before you didn't realize.
Daniel Lakens
2 Protocols
Red Teaming Research
Daniel Lakens
- Create a 'blue team' responsible for developing the research or software.
- Form a dedicated 'red team' whose role is to actively criticize and attempt to break down the work of the blue team.
- Ensure the red team operates in a collaborative environment, with the goal of improving the blue team's final product.
- Staff the red team with diverse expertise, including perspectives not typically found in standard peer review.
- Implement the red teaming process much earlier in the research cycle, ideally before data collection, to prevent mistakes when they are easier and less costly to fix.
Improving Science Through Consensus Meetings
Daniel Lakens
- Convene experts from a specific scientific field (e.g., once a decade, as in physics).
- Discuss and establish consensus on what is currently known and what remains unknown within the field.
- Identify critical research questions or tasks that require collective effort, such as specific replication studies, the development of standardized measurement tools, or long-term, difficult research projects.
- Coordinate efforts by assigning different groups or regions to specific tasks, leveraging shared resources.
- Actively build in and coordinate for a modest amount of desired variability and disagreement, rather than solely striving for complete consensus.