Should we trust papers published in top social science journals? (with Daniel Lakens)
In this episode, Spencer Greenberg and Daniel Lakens discuss the craft of science, issues with peer review and research incentives, and the concept of red-teaming scientific research. They also delve into the trustworthiness of social science findings and the nuances of various psychological effects.
Deep Dive Analysis
17 Topic Outline
Trustworthiness of Social Science Research Today
Indicators of Trustworthy Research and Large Datasets
Psychology's Challenges: Theory, Precision, and Accumulation
Making Psychology a Cumulative and Collaborative Science
Improving Research Practices Through Constructive Criticism
Addressing Limitations of Traditional Peer Review
Understanding and Preventing 'Importance Hacking'
Reducing Pressure and Incentivizing Better Science
Coordination and Consensus in Scientific Fields
The Role of Red Teaming in Research
Effectiveness and Failures of Adversarial Collaborations
Teaching the Skill of Admitting Error in Science
Evolving Norms and Flexibility in Research Practices
Revisiting Facial Feedback, Power Posing, and Ego Depletion
Revisiting Terror Management Theory, Grit, and Growth Mindset
Challenges and Criticisms of the Implicit Association Test
Future of Social Science: Increased Collaboration
8 Key Concepts
P-hacking
A set of techniques, either explicit or implicit, where researchers make choices in data analysis (e.g., discarding outliers, trying multiple analyses) to achieve statistically significant results, often leading to false positives.
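The inflation this produces can be seen in a minimal simulation (illustrative only, not from the episode; the setup and names are my own): every study below is a true null, yet letting the analyst report whichever of five outcome measures 'worked' multiplies the false-positive rate.

```python
import math
import random
import statistics

random.seed(0)

def p_value(sample):
    """Two-sided p for 'mean == 0', normal approximation (fine for n >= 30)."""
    n = len(sample)
    z = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

def run_null_study(n=50, n_outcomes=5):
    """One study with no true effect, measured on several outcome variables."""
    ps = [p_value([random.gauss(0, 1) for _ in range(n)])
          for _ in range(n_outcomes)]
    honest = ps[0] < 0.05    # pre-specified single outcome
    hacked = min(ps) < 0.05  # report whichever outcome came out significant
    return honest, hacked

trials = 5000
results = [run_null_study() for _ in range(trials)]
honest_fp = sum(h for h, _ in results) / trials
hacked_fp = sum(k for _, k in results) / trials
print(f"honest false-positive rate: {honest_fp:.3f}")  # near the nominal 0.05
print(f"hacked false-positive rate: {hacked_fp:.3f}")  # near 1 - 0.95**5, ~0.23
```

With five independent outcomes the chance of at least one spurious hit is 1 − 0.95⁵ ≈ 23%, more than four times the nominal error rate, which is why constraining analytic flexibility in advance matters.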
Importance Hacking
A phenomenon where research findings are replicable but their claimed value, significance, or real-world meaning is exaggerated. This can occur when large sample sizes yield statistically significant but negligible effects, or when messy results are presented as clear conclusions.
Red Teaming
A practice, borrowed from computer programming, where a dedicated team (the 'red team') actively attempts to criticize, break down, or find flaws in the work of another group (the 'blue team') in a collaborative effort to improve the final product.
Registered Reports
A publication format where a research proposal, including the introduction, methods, and analysis plan, is submitted for peer review and accepted *before* any data is collected. This allows for early criticism and refinement, preventing issues before they become entrenched.
Adversarial Collaborations
A research approach where scientists with strongly differing viewpoints on a topic agree to work together on a single paper. They jointly design the methodology, conduct the study, and co-publish the results, aiming to resolve disagreements or clarify points of contention in a constructive manner.
Stroop Effect
A classic psychological phenomenon where individuals experience interference when asked to name the color of a word that is itself a conflicting color name (e.g., the word 'red' printed in blue ink). It demonstrates the automaticity of reading and selective attention.
Ego Depletion
The theory that self-control or willpower is a limited mental resource that can be used up by demanding tasks, leading to a decrease in subsequent self-control efforts. While the general feeling of fatigue is real, the specific theoretical model involving a depletable 'glucose resource' was largely debunked.
Growth Mindset
The belief that one's abilities and intelligence are not fixed but can be developed and improved through dedication, hard work, and learning from challenges. This contrasts with a fixed mindset, where abilities are seen as inherent and unchangeable.
16 Questions Answered
The trustworthiness of new social science papers depends greatly on the specific paper, as practices have improved over the last decade, but some older, less rigorous practices still persist.
To assess trustworthiness, look for clear, falsifiable hypotheses, a well-defined data analysis plan, a sufficient amount of data for accurate estimates, and a solid, well-worked-out theoretical framework.
Not necessarily; while large datasets lead to higher accuracy, if researchers have significant flexibility in their statistical models, dependent variables, or covariates, they can still engage in p-hacking.
Many fields in psychology focus more on individual 'effects' rather than accumulating knowledge into overarching theories, partly because fixing research methods has been prioritized as an easier task than fixing theories.
While more difficult than in physics, it is feasible to refute psychological theories, especially if they are more precisely formulated mathematically and if researchers actively test competing theories against each other.
Making psychology cumulative requires increased collaboration across research fields and interdisciplinary research, as integration happens across areas, but this process is currently labor-intensive and not sufficiently rewarded in academia.
Researchers should focus on actively seeking and incorporating decent criticism on their ideas, creating an atmosphere where people feel comfortable criticizing each other, and being critical of their own work.
Peer review is limited by the expertise of the reviewers, who may miss subtle issues or specialized flaws, and it often occurs too late in the research cycle, making authors defensive about already completed work.
Importance hacking describes when a research finding is replicable, but its claimed value or significance is exaggerated; for example, a negligible effect found with a very large sample size is presented as a major innovation.
Academia can reduce pressure by changing tenure criteria, such as granting tenure much earlier (e.g., after one year), which allows researchers more freedom to pursue difficult and important work rather than focusing on publication quantity.
Adversarial collaborations often fail because it is challenging for top experts, who have dedicated decades to certain theories, to reach agreement, especially if there is a lack of good faith or willingness to genuinely change their viewpoints.
Yes, admitting error is a skill that should be explicitly taught and trained among scientists, as it is difficult for everyone, and the academic environment often does not provide support for dealing with criticism or mistakes.
Students should be taught the basic principles underlying why certain practices are right or wrong, rather than just strict rules, to understand when flexibility (like sequential analysis) is statistically valid and when it constitutes p-hacking.
The idea that intentionally smiling can have some effect on mood is likely true, but the mechanism might be more conscious (a 'demand effect') rather than an implicit biological process, and some original claims may not replicate.
While the feeling of being tired and giving in to impulses is universally understood, the specific theoretical model of ego depletion involving a depletable 'glucose resource' in the brain has been largely debunked, despite nearly 200 prior studies supporting it.
The IAT is methodologically interesting for measuring how people group things, but it faces significant criticism regarding what it truly measures, its susceptibility to various confounds, and its low test-retest reliability, making it difficult to interpret as a measure of deep, implicit bias.
28 Actionable Insights
1. Cultivate a Culture of Criticism
Actively seek and welcome criticism on your ideas from others, and foster an environment where people feel comfortable providing constructive feedback. This helps identify flaws and improve research, as self-criticism is often insufficient.
2. Implement Early-Stage Red Teaming
Shift the criticism process, like red teaming or peer review, to the early stages of research (e.g., via registered reports) before data collection. This allows researchers to fix fatal flaws in proposals and methods when it still matters, reducing defensiveness and waste.
3. Reduce Publication Pressure for Tenure
Implement tenure systems that grant early tenure (e.g., after one year) to assistant professors, significantly reducing the pressure to publish frequently. This frees researchers to pursue more difficult and important, rather than merely numerous, projects.
4. Increase Scientific Coordination, Collaboration
Actively foster coordination and collaboration among scientists to collectively address research challenges and improve practices. This is crucial for tackling large, complex problems and building cumulative knowledge.
5. Coordinate Replications, Measures, Challenges
Social scientists should coordinate to identify studies needing replication, standardize measurement tools, and commit to long-term, difficult research questions that require collective effort. This ensures a robust knowledge base and addresses critical, complex problems.
6. Promote Interdisciplinary Collaboration
Engage in more cross-field and interdisciplinary research, as integrating diverse expertise (e.g., sociologists, economists) is crucial for addressing complex, large-scale problems and building overarching theories. The academic system should reward this labor-intensive process.
7. Train Scientists to Admit Mistakes
Actively train scientists, especially early in their careers, on how to deal with criticism and admit when they are wrong or have made mistakes. This is a crucial skill that is often overlooked in academic training.
8. Broaden PhD Training to Practical Skills
Expand PhD programs to include training on practical skills essential for a scientific career, such as dealing with research roadblocks, developing new ideas, and managing criticism. This addresses common challenges students face but are not formally taught.
9. Discuss Principles vs. Rewards
Openly discuss the conflict between scientific principles and career reward structures with junior researchers. Encourage them to prioritize doing the right thing, even if it means a lower publication rate, as this aligns with long-term integrity and personal satisfaction.
10. Appoint a Chief Criticizer
For each research project, assign a dedicated “chief criticizer” who is responsible for identifying flaws and takes the blame if errors are found post-publication. This creates a strong incentive for thorough criticism and overcomes social biases.
11. Implement Red Teaming
Form a “red team” specifically tasked with actively trying to break down or criticize the work of another group (the “blue team”) in a collaborative environment. This method, borrowed from programming, helps identify weaknesses and improve outcomes.
12. Evaluate Research Trustworthiness
When assessing a new paper, check for falsifiable hypotheses, a clear data analysis plan, and sufficient data for accurate estimates, alongside a solid theoretical framework. This helps determine the reliability and validity of the findings.
13. Strengthen Theory to Prevent P-Hacking
Develop stronger, more constrained theoretical predictions in research, as this limits flexibility in data analysis and makes p-hacking more difficult. This theoretical component is often important to consider.
14. Increase Transparency by Sharing Code
Share research code publicly to make mistakes visible and normalize the process of finding and fixing errors. This transparency can benefit the entire field by fostering a more open and accountable environment.
15. Sign Peer Reviews for Accountability
Voluntarily sign your peer reviews to foster a sense of personal responsibility for the quality of the critique. This can motivate reviewers to be more thorough and improve the overall peer review system.
16. Diversify Reviewer Expertise
Select peer reviewers with varied and diverse expertise, including those from outside the immediate sub-field, to provide more comprehensive and useful input. This can help catch mistakes that specialized reviewers might miss.
17. Prioritize Direct Dialogue for Disagreement
Instead of formal commentary articles in journals, resolve scientific disagreements through direct, in-person conversations. This informal setting can foster more productive and less defensive conflict resolution.
18. Engage in Adversarial Collaborations
When strong disagreements exist, scientists should engage in adversarial collaborations where they jointly design and execute studies to resolve conflicts. This method aims to reach a shared conclusion or clearly delineate remaining disagreements.
19. Utilize Flexible Pre-registration Methods
Learn and apply advanced pre-registration methods that allow for flexibility in data analysis while maintaining rigor. This addresses the common issue of unforeseen data outcomes and reduces the need to deviate from pre-specified plans.
20. Combine Exploratory and Confirmatory Analysis
Conduct exploratory analysis on a subset of data, then test the most promising hypotheses on a separate, “held-in-vault” confirmatory dataset. This allows for broad exploration while maintaining statistical rigor.
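A minimal sketch of this split (illustrative, with synthetic data; the variable names are my own, not from the episode):

```python
import random

random.seed(1)

# Hypothetical dataset: 200 participants, each a (predictor, outcome) pair.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
random.shuffle(data)

# Split once, up front: explore on one half, lock the other half 'in the vault'.
exploratory, confirmatory = data[:100], data[100:]

def correlation(pairs):
    """Pearson correlation of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

# Phase 1: explore freely -- any pattern found here is only a hypothesis.
r_explore = correlation(exploratory)

# Phase 2: run the single most promising test once, on the untouched data.
r_confirm = correlation(confirmatory)
print(f"exploratory r = {r_explore:+.3f}, confirmatory r = {r_confirm:+.3f}")
```

Because only one pre-committed test is ever run on the held-out half, its p-value retains its nominal meaning no matter how many patterns were hunted for during exploration.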
21. Measure Multiple Outcomes with Caution
Collect data on multiple outcome measures to gain a comprehensive understanding, but interpret findings cautiously, especially when only some measures show significance. Use such discrepancies as opportunities for deeper exploration rather than selective reporting.
22. Understand Underlying Principles of Methods
Focus on understanding the fundamental principles behind research methods and ethical practices, rather than mindlessly following rules or avoiding practices that merely “look like” problematic ones. This enables informed decision-making and appropriate flexibility.
23. Share Null Results to Prevent Bias
Actively share null results to combat the “file drawer problem,” which otherwise leads to a literature filled with false positives and flukes. This transparency is crucial for an accurate scientific record.
24. Guard Against Importance Hacking
Be critical of research findings that replicate but whose significance or value is overstated or misinterpreted. Ensure that the claimed meaning and importance of results are genuinely supported by the data, not just statistical significance.
25. Adopt a Growth Mindset
View your own performance and abilities as something that can improve over time with effort and learning, rather than fixed traits. This perspective is crucial for continuous development and resilience, especially in challenging fields like science.
26. Continuously Teach Growth Mindset
Integrate the teaching of a growth mindset into education and training programs, reinforcing it consistently rather than as a one-off intervention. This sustained approach can lead to more significant and lasting positive effects.
27. Encourage Specialization in Science
Promote greater specialization within scientific fields, acknowledging that expertise in areas like programming, measurement, or statistics requires dedicated training and time. This can lead to higher quality work and fewer mistakes.
28. Interpret IAT with Caution
Be highly cautious and critical when interpreting results from the Implicit Association Test (IAT), acknowledging its methodological complexities and potential confounds. Do not assume it directly measures deep-seated implicit biases like racism without clear communication of its limitations.
6 Key Quotes
Studying physics is child's play, but studying child's play, on the other hand...
Daniel Lakens (attributing Einstein)
Criticizing somebody can be a great act of support. Because if somebody is going in the wrong path, you need to point it out.
Daniel Lakens
We create an environment where people have to look smart all the time.
Daniel Lakens
We have currently built in criticism in a very unfortunate point in the research cycle, namely where somebody already collected all the data, they wrote up their entire paper, then we send it off to some journal and the journal finds peer reviewers. Well, if you would think about how to best organize this system, this is not the moment to criticize people.
Daniel Lakens
The ego depletion effect is also an extremely important research finding for meta-scientific reasons, not for the finding it was supposed to be. It tells us that we can have almost 200 studies in the scientific literature, that are part of a meta-analysis. And we thought this is great, large effect, super relevant. And it turns out it was all nothing.
Daniel Lakens
I think the progress typically goes in one direction. So you rarely see a society fall back to a system where there's widespread corruption... And similarly, I don't really see us slip back into a system where people are p-hacking because the awareness is just there and you just feel bad about doing it now, whereas before you didn't realize.
Daniel Lakens
2 Protocols
Red Teaming Research
Daniel Lakens
- Create a 'blue team' responsible for developing the research or software.
- Form a dedicated 'red team' whose role is to actively criticize and attempt to break down the work of the blue team.
- Ensure the red team operates in a collaborative environment, with the goal of improving the blue team's final product.
- Staff the red team with diverse expertise, including perspectives not typically found in standard peer review.
- Implement the red teaming process much earlier in the research cycle, ideally before data collection, to prevent mistakes when they are easier and less costly to fix.
Improving Science Through Consensus Meetings
Daniel Lakens
- Convene experts from a specific scientific field (e.g., once a decade, as in physics).
- Discuss and establish consensus on what is currently known and what remains unknown within the field.
- Identify critical research questions or tasks that require collective effort, such as specific replication studies, the development of standardized measurement tools, or long-term, difficult research projects.
- Coordinate efforts by assigning different groups or regions to specific tasks, leveraging shared resources.
- Actively build in and coordinate for a modest amount of desired variability and disagreement, rather than solely striving for complete consensus.