Statistics Intuitions and Social Science Reproducibility (with Stuart Buck)
Spencer Greenberg and Stuart Buck discuss improving math education by focusing on statistics and logic, the nuances of p-values and publishing null results, and the critical need for open science, reproducibility, and better research practices in various fields.
Deep Dive Analysis
12 Topic Outline
Rethinking High School Math Education
Critique of Traditional Math Curriculum Justifications
Ideal High School Math Curriculum: Focus on Statistics
Statistical Literacy and Understanding Science
Understanding P-values and Their Misinterpretations
The Debate Over P-value Cutoffs and Null Results
Introduction to Open Science and Research Reproducibility
Reproducibility Challenges: Examples from Economics and Biology
The Importance of Detailed Methods and Material Sharing
Generalizability of Research Findings and Continuous Experimentation
Challenges in Generalizing International Development Interventions
Reproducibility Crisis in Pharmaceutical Research
7 Key Concepts
P-value
A p-value indicates the probability of observing data as extreme or more extreme than what was found, *if* there is no effect (the null hypothesis is true). It does not directly tell you the probability that a result is true or that an effect exists.
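To make the definition concrete, here is a minimal Python sketch built on a hypothetical coin-flipping example that is not from the episode. It assumes the null hypothesis is a fair coin. It then adds up the probability of every outcome that is at least as far from an even split as the observed one. The function name `binomial_p_value` is illustrative only.

```python
from math import comb


def binomial_p_value(heads, flips):
    """Two-sided exact p-value for `heads` in `flips` tosses, assuming a fair coin.

    Sums the probability of every outcome at least as far from flips/2 as the
    observed count, i.e. data "as extreme or more extreme" if there is no effect.
    """
    expected = flips / 2
    observed_distance = abs(heads - expected)
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** flips


print(binomial_p_value(9, 10))  # 22/1024, about 0.021
print(binomial_p_value(5, 10))  # 1.0: every outcome is at least as extreme as a 5-5 split
```

Nine heads in ten flips gives p ≈ 0.021. That means such lopsided data would be rare if the coin were fair. It does not mean there is a 2% chance that the coin is fair.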
Correlation vs. Causation
This concept distinguishes between two events or variables that tend to occur together (correlation) and one event directly causing the other (causation). Confusing the two is a common error, both in public discourse and in everyday attempts to understand the world.
Base Rate Neglect
This is a cognitive bias where people tend to ignore the overall prevalence or 'base rate' of an event when estimating the probability of a specific outcome. It often leads to incorrect conclusions, such as misinterpreting medical test results without considering the disease's rarity.
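Bayes' rule shows why the base rate matters. The figures below are hypothetical and chosen only for illustration. They describe a disease with 1% prevalence, a test that detects 99% of true cases, and a 5% false-positive rate among healthy people.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity                 # sick and test positive
    false_pos = (1 - prevalence) * false_positive_rate  # healthy but test positive
    return true_pos / (true_pos + false_pos)


print(posterior_positive(0.01, 0.99, 0.05))  # about 0.167, roughly 1 in 6
```

The same result can be seen in natural frequencies. Of 10,000 people, 100 are sick, and 99 of them test positive. Meanwhile, 495 of the 9,900 healthy people also test positive. So only 99 of the 594 positives (about 1 in 6) actually have the disease, even though the test sounds highly accurate.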
Transfer of Learning
This educational concept suggests that skills learned in one domain can be applied or 'transferred' to improve performance in another, seemingly unrelated domain. However, rigorous studies often show that such transfer is limited and direct training is usually more effective.
Statistically Significant
A term used to describe a research result whose p-value falls below a predetermined threshold, typically 0.05. The phrase can be misleading because it suggests importance or truth. In reality, evidence lies on a continuum, and the difference between a result just below the cutoff and one just above it is often not itself statistically significant.
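The claim that small differences around the cutoff are not themselves significant can be checked with a normal-approximation sketch. The two study estimates and standard errors are hypothetical. The estimates are also assumed to be independent, so the standard error of their difference is the square root of the sum of their squared standard errors.

```python
import math


def two_sided_p(estimate, se):
    """Normal-approximation two-sided p-value for H0: true value = 0."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


# Hypothetical studies: A estimates 25 (SE 10), B estimates 10 (SE 10).
p_a = two_sided_p(25, 10)  # about 0.012: "significant"
p_b = two_sided_p(10, 10)  # about 0.32: "not significant"
# Directly test whether A and B differ; SE of a difference of independent estimates.
p_diff = two_sided_p(25 - 10, math.hypot(10, 10))  # about 0.29
print(round(p_a, 3), round(p_b, 3), round(p_diff, 3))
```

Study A clears the 0.05 line and study B does not. Yet a direct test of whether the two studies differ gives p ≈ 0.29. This is the point behind the line "the difference between statistically significant and statistically insignificant is statistically insignificant."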
Reproducibility (in science)
Reproducibility refers to the ability of independent researchers to obtain the same results. They may do so by rerunning the original analysis on the same data and code, or by repeating the experiment with the same methods. Failures of reproducibility point to problems with research reliability, methodology, or theoretical understanding.
Publication Bias
This is the tendency for scientific journals to preferentially publish studies that report positive or 'statistically significant' findings over those with null or negative results. This bias can distort the scientific literature, making certain effects appear more robust or prevalent than they truly are.
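The distortion described above can be illustrated with a simulation. This is a sketch under simplifying assumptions. Every study has the same true effect (0.2 standard deviations, a hypothetical value). Outcomes have unit standard deviation, so each study's standard error is known. Finally, a journal publishes only results with p < 0.05.

```python
import math
import random
import statistics


def simulate_publication_bias(true_effect=0.2, n_per_group=20, n_studies=2000,
                              alpha=0.05, seed=1):
    """Simulate many small two-group studies and 'publish' only the significant ones.

    Effects are in standard-deviation units, so the standard error of a
    difference in means with n_per_group people per arm is sqrt(2 / n_per_group).
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)
    all_estimates, published = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)             # one study's observed effect
        p = math.erfc(abs(estimate / se) / math.sqrt(2))  # two-sided normal p-value
        all_estimates.append(estimate)
        if p < alpha:
            published.append(estimate)                    # only "significant" results appear
    return statistics.mean(all_estimates), statistics.mean(published), len(published)


mean_all, mean_published, n_published = simulate_publication_bias()
print(f"all studies: {mean_all:.2f}; published ({n_published}): {mean_published:.2f}")
```

The average over all simulated studies lands near the true 0.2, but the published subset averages far higher. With 20 people per arm, an estimate must exceed roughly 0.62 in magnitude to reach significance. As a result, mostly the luckiest overestimates survive the filter.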
9 Questions Answered
Current high school math curricula, particularly geometry and more advanced topics, are often of little use to most students for understanding the world or daily life. Students see no practical application, which leads to frustration.
A more beneficial curriculum would focus on a conceptual understanding of statistics and data analysis. This would include probability, correlation vs. causation, Bayesian interpretation, and basic distributions and summary statistics such as the mean and median.
A p-value indicates the probability of observing data as extreme or more extreme than what was found, assuming there is no true effect (the null hypothesis is true).
People often incorrectly interpret a p-value as the probability that a result is true or that an effect exists. It is actually the probability of data at least this extreme given no effect, which is the inverse of what people typically want to know.
A strict cutoff creates an artificial dichotomy: results just below the line are called 'significant' and those just above it are not. Yet evidence lies on a continuum, and the difference between two such p-values is often not statistically significant itself.
Null results are valuable when they address a theoretically interesting question, especially if there's a strong prior belief in a positive effect, or if they challenge an intervention widely believed to work.
Reproducibility means that if an experiment is repeated, the same results should be obtained. It's crucial for building reliable scientific knowledge and ensuring that evidence-based decisions are made on trustworthy findings.
Factors include poor coding practices, lack of data/code sharing, subtle unstated methodological differences (e.g., wood shavings in mouse cages, stirring speed in cell cultures), publication bias towards positive results, and over-generalization of findings from specific populations.
One approach is to integrate lightweight randomized controlled trials (RCTs) directly into the deployment of interventions. This allows them to be continuously tested and improved on the specific target population, rather than relying on generalization from studies done elsewhere.
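A lightweight RCT built into a program's rollout needs two pieces: random assignment and a comparison of outcomes between arms. The sketch below is one possible design, not one described in the episode. The experiment label `cash-transfer-v1` and the function names are hypothetical. Hashing each participant's id gives a stable, auditable assignment. The effect is then estimated as a difference in means with its standard error.

```python
import hashlib
import math
import statistics


def assign_arm(participant_id, experiment="cash-transfer-v1"):
    """Stable 50/50 assignment: the same id always lands in the same arm.

    Hashing (rather than calling random() at enrollment) makes assignment
    reproducible and auditable; `experiment` is a hypothetical salt so that
    different trials randomize independently.
    """
    digest = hashlib.sha256(f"{experiment}:{participant_id}".encode()).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"


def estimate_effect(outcomes):
    """Difference in mean outcomes (treatment minus control) and its standard error.

    `outcomes` maps arm name -> list of outcome values (at least two per arm).
    """
    t, c = outcomes["treatment"], outcomes["control"]
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
    return diff, se


# Usage: assign a batch of hypothetical participants, then analyze their outcomes.
arms = [assign_arm(i) for i in range(10_000)]
print(arms.count("treatment"))  # close to half of the 10,000
print(estimate_effect({"treatment": [3, 5, 7], "control": [1, 2, 3]}))
```

Because assignment depends only on the id and the experiment label, each new cohort can be enrolled and analyzed as the program runs. The same code can also be rerun later to audit who received what.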
12 Actionable Insights
1. Integrate Continuous RCTs
Weave lightweight randomized controlled trials (RCTs) into the deployment of interventions (e.g., cash transfers, digital apps). Doing so continuously collects high-quality data on the target population and supports iterative improvement, rather than relying on generalization from other studies.
2. Question Your Successes
Adopt the practice of questioning your successes as much as your failures, like good poker players, to determine if your decision-making process was sound or if you merely got lucky.
3. Prioritize Core Math & Logic
Restructure high school math curriculum to prioritize basic statistics, data analysis, and direct logic instruction, as these concepts are more useful for understanding the world, news, and scientific claims than esoteric geometry or indirect logic training.
4. Master Core Statistical Concepts
Learn fundamental statistical concepts such as the mean and median, probability distributions and their spread (e.g., standard deviation), and the difference between correlation and causation. These are critical for making sense of the world and of scientific information.
5. P-values for Sampling Error
Use p-values as a tool to reasonably rule out sampling error. A very low p-value suggests the result is unlikely to be due to random noise from the particular participants or data points sampled, but it does not imply that the result is true or important.
6. Publish Informative Null Results
Prioritize publishing null results for theoretically interesting questions, especially those that contradict prior positive findings or widely held beliefs, to advance the field and avoid pursuing dead ends.
7. Fund High-Value Science
Guide science funding decisions towards research that yields fundamental theoretical results, applied results with immediate utility (e.g., curing diseases), or tools and methods that accelerate future scientific progress.
8. Rigorously Double-Check Work
When producing significant work, especially that which will be widely published or cited, rigorously double-check all calculations and data analysis to minimize errors, acknowledging the increased responsibility that comes with greater impact.
9. Adopt Unit Tests in Research
Scientists should implement unit tests (writing code specifically to test their analytical code) to catch bugs and improve the reliability of their computational work, a best practice from software engineering.
10. Detail Experimental Methods
Fully disclose experimental methods in scientific papers with as much detail as possible, recognizing that subtle, seemingly minute differences in procedures can dramatically affect results and are crucial for reproducibility.
11. Avoid Grandiose Generalizations
Scientists should approach findings with humility, avoiding grandiose pronouncements about general human behavior based on limited populations or contexts, and acknowledge the potential for results to be highly specific.
12. Assess Implementation Quality
When evaluating interventions, recognize that a poor or low-quality implementation of an otherwise effective program can lead to a failure to find an effect, making it crucial to account for implementation quality.
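The unit-testing practice in insight 9 can be illustrated with Python's built-in `unittest` module. The analysis helper `mean_ignoring_missing` is hypothetical. The tests encode expectations worked out by hand, including a check for one common bug: silently treating missing values as zero.

```python
import math
import unittest


def mean_ignoring_missing(values):
    """Mean of the non-missing entries; None and NaN are treated as missing."""
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    if not present:
        raise ValueError("no non-missing values")
    return sum(present) / len(present)


class TestMeanIgnoringMissing(unittest.TestCase):
    def test_known_answer(self):
        self.assertAlmostEqual(mean_ignoring_missing([1, 2, 3, 4]), 2.5)

    def test_missing_values_are_skipped_not_counted_as_zero(self):
        # Treating missing entries as 0 would give 1.5 here instead of 3.0.
        self.assertAlmostEqual(mean_ignoring_missing([2, None, 4, float("nan")]), 3.0)

    def test_all_missing_raises(self):
        with self.assertRaises(ValueError):
            mean_ignoring_missing([None, float("nan")])


if __name__ == "__main__":
    # exit=False lets the script keep running after the tests (e.g. in a notebook).
    unittest.main(argv=["ignored"], exit=False)
```

Each test pins down a behavior that a silent bug could break, so rerunning the suite after any change to the analysis code flags regressions before results are published.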
5 Key Quotes
If you want people to learn math, teach them math. If you want people to learn music, teach them music and justify it on its own terms, not because of some benefit to something else that you could have been teaching directly.
Stuart Buck
The difference between statistically significant and statistically insignificant is statistically insignificant.
Stuart Buck
If psychology is that brittle, we're screwed.
Spencer Greenberg
But the best poker players will also go take a successful hand where they won. And instead of just feeling good about it, they say, let's go back and look. Because maybe, maybe I made a decision that actually was not a good decision at the time. And I just got lucky.
Stuart Buck
What we want is not just more publications. What we want is more publications, particularly in that area, you know, cell biology, et cetera, or publications that lead to greater understanding of how the human body works and ultimately greater ability to prevent diseases or cure diseases or address aging and issues like that.
Stuart Buck