#269 - Good vs. bad science: how to read and understand scientific studies

Sep 4, 2023
Overview

This episode, a rebroadcast of AMA #30, features Peter Attia and Bob Kaplan discussing how to critically read and interpret scientific studies. They cover study types, clinical trials, common biases, statistical concepts, and Peter's personal method for analyzing research papers.

At a Glance
56 insights · 1h 50m · 13 topics · 18 concepts

Deep Dive Analysis

Process for a Study: From Idea to Design to Execution

Types of Studies: Observational vs. Experimental

Understanding Observational Studies: Case Reports, Case Series, and Cohort Studies

Phases of Human Clinical Trials: Safety, Efficacy, and Approval

Biases in Observational Studies: Healthy User, Recall, and Performance

Rigorous Experimental Studies: Randomization, Blinding, and Outcomes

Statistical Concepts: Power, P-values, and Significance

Measuring Effect Size: Relative vs. Absolute Risk, Hazard Ratios, and NNT

Interpreting Confidence Intervals

Reasons for Stopping a Study Prematurely: Safety, Benefit, or Futility

Publication Bias and Strategies to Combat It

Journal Prestige and the Impact Factor

Peter's Process for Reading Scientific Papers

Null Hypothesis

The default position in science, stating there is no relationship or difference between two phenomena being studied. The goal of an experiment is often to try to falsify this hypothesis.

Power Analysis

A crucial step in experimental design to determine the minimum number of subjects needed in a study to detect a statistically significant difference, if one truly exists, with a specified level of certainty.
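As a rough illustration (not from the episode), the standard normal-approximation for sizing a two-group comparison can be sketched in a few lines of Python. Here `effect_size` is the standardized difference (Cohen's d) the study hopes to detect, and the defaults mirror the alpha and power conventions discussed in this episode:

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a standardized
    mean difference (Cohen's d) with a two-sided test at the given
    alpha and power, using the normal approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z(power)           # quantile corresponding to desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)

# A medium effect (d = 0.5) at 80% power needs roughly 63 subjects per group;
# a large effect (d = 1.0) needs far fewer.
print(sample_size_per_group(0.5))
print(sample_size_per_group(1.0))
```

Note how the effect size sits in the denominator: bigger expected differences need fewer subjects, which is exactly the trade-off described later in the takeaways.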

Institutional Review Board (IRB)

An ethics committee that must approve studies involving human subjects to ensure ethical conduct and participant safety; approval is required before a study can begin. (Animal research is reviewed by an analogous committee.)

Observational Studies

Studies where researchers observe subjects and measure variables of interest without intervening or manipulating any variables. They can identify associations but cannot establish causality.

Experimental Studies

Studies where researchers actively intervene by manipulating one or more variables (treatments) and observing the effect on an outcome, allowing for the establishment of causality.

Meta-analysis

A statistical technique that combines data from multiple independent studies addressing the same question to derive a single, more precise estimate of an effect. Its quality depends entirely on the quality of the included studies, following the 'garbage in, garbage out' principle.

Healthy User Bias

A common bias in observational studies where individuals who engage in one healthy behavior (e.g., not eating meat) are also more likely to engage in other healthy behaviors (e.g., exercise, not smoking), making it difficult to isolate the effect of a single variable.

Recall Bias (Information Bias)

A bias in studies, particularly nutritional epidemiology, where subjects' ability to accurately remember past behaviors (e.g., food consumption) is poor, leading to inaccurate data. This makes it challenging to draw reliable conclusions.

Performance Bias (Hawthorne Effect)

A bias where subjects change their behavior simply because they know they are being observed or are part of a study, or when investigators' knowledge of treatment assignment influences their interaction with subjects. This can subtly alter study outcomes.

P-value (Alpha)

The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A p-value of 0.05 or less is typically taken as the threshold for statistical significance; the threshold itself (alpha) caps the false positive rate at 5%.

Statistical Power

The probability of correctly detecting a true effect if it exists (1 minus the false negative rate). Typically, studies aim for 80-90% power, meaning they have an 80-90% chance of finding a real effect if it's there.

Absolute Risk

The actual risk of an event occurring in a population or group (e.g., 5 heart attacks per 1,000 people). This provides a direct measure of how common an event is.

Relative Risk

The ratio of the risk of an event in an exposed group compared to an unexposed group, often expressed as a percentage increase or decrease. It can be misleading without knowing the absolute risk, as a large relative risk can still represent a small absolute change.
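To make the distinction concrete, here is a small Python sketch using the WHI-style numbers cited in this episode (4 vs. 5 breast cancer cases per 1,000 women):

```python
control_rate = 4 / 1000  # 4 cases per 1,000 women (control group)
treated_rate = 5 / 1000  # 5 cases per 1,000 women (hormone therapy group)

# Relative risk increase: headline-friendly and sounds alarming.
relative_increase = (treated_rate - control_rate) / control_rate
print(f"{relative_increase:.0%} relative increase")  # 25%

# Absolute risk increase: the actual change, 1 extra case per 1,000 women.
absolute_increase = treated_rate - control_rate
print(f"{absolute_increase:.1%} absolute increase")  # 0.1%
```

The same data yields "25% higher risk" and "one extra case per 1,000 women"; both are true, but only the absolute figure conveys the clinical scale.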

Hazard Ratio

A measure of the relative risk of an event occurring at any point in time during a study, capturing the temporal aspect of risk. A hazard ratio of 1 means no difference, less than 1 means reduced risk, and greater than 1 means increased risk.
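A hazard ratio converts to a percent change in risk as (HR − 1) × 100. A minimal helper (illustrative, not from the episode) using the example values discussed later in the takeaways:

```python
def hr_to_percent_change(hazard_ratio):
    """Express a hazard ratio as a percent change in risk relative
    to the comparison group (negative = reduced risk)."""
    return (hazard_ratio - 1) * 100

print(hr_to_percent_change(0.82))  # about -18: an 18% risk reduction
print(hr_to_percent_change(2.2))   # about 120: a 120% risk increase
```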

Number Needed to Treat (NNT)

The average number of patients who need to be treated to prevent one additional adverse event. It is calculated as 1 divided by the absolute risk reduction and helps assess the clinical significance of an intervention.
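The calculation is simple enough to sketch directly in Python (the event rates below are illustrative, matching the worked examples later in these notes):

```python
def nnt(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_risk_reduction = control_event_rate - treated_event_rate
    return 1 / absolute_risk_reduction

# Cutting event risk from 4% to 3% (a 25% relative reduction)
# still means treating about 100 people to prevent one event.
print(round(nnt(0.04, 0.03)))   # 100
print(round(nnt(0.04, 0.02)))   # 50
```

A tiny absolute risk reduction (say 0.5% to 0.4%) yields an NNT of 1,000, even though the relative reduction is a respectable-sounding 20%.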

Confidence Interval

A range of values within which the true population parameter (e.g., hazard ratio) is likely to lie. If the interval for a ratio (like hazard ratio) includes 1, or for a difference includes 0, the result is not statistically significant. A tighter interval indicates less uncertainty.
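The significance check described above is mechanical enough to express as a hypothetical helper for ratio-type estimates (hazard ratios, odds ratios):

```python
def ratio_is_significant(ci_lower, ci_upper):
    """A ratio estimate (hazard ratio, odds ratio) is statistically
    significant only if its confidence interval excludes 1."""
    return not (ci_lower <= 1 <= ci_upper)

print(ratio_is_significant(0.70, 0.95))  # True: whole interval below 1
print(ratio_is_significant(0.85, 1.10))  # False: interval crosses 1
```

For difference-type estimates (e.g., change in blood pressure), the same check applies with 0 in place of 1.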

Publication Bias

The tendency for studies with positive or statistically significant results to be more likely to be published than those with negative or null results, leading to a skewed representation of scientific evidence and an incomplete body of knowledge.

Impact Factor

A metric used to gauge the relative importance or influence of a scientific journal, calculated as the average number of citations received in a given year by articles the journal published in the preceding two years. Higher impact factors generally indicate more prestigious and selective journals.

What is the process for a study to go from an idea to design and execution?

It starts with a hypothesis (often a null hypothesis), followed by experimental design, power analysis to determine subject numbers, Institutional Review Board (IRB) approval, defining primary/secondary outcomes, developing a statistical plan, and pre-registering the study, all while securing funding.

What are the different types of studies and how do they differ?

Studies broadly fall into observational (case reports, case series, cohort studies) and experimental (randomized controlled trials, non-randomized trials), with meta-analyses and systematic reviews summarizing these. Observational studies identify associations, while experimental studies can establish causality through intervention.

How do the different phases of human clinical trials work for drugs?

Phase 1 focuses on dose escalation and safety in a small group; Phase 2 continues safety evaluation and looks for efficacy in a larger, often open-label group; Phase 3 is a large, rigorous, often randomized, blinded, placebo-controlled trial to confirm efficacy and safety for approval; Phase 4 involves post-marketing surveillance and exploring new indications.

What are common pitfalls or biases to watch out for in observational studies?

Key pitfalls include selection bias (e.g., healthy user bias), information or recall bias (especially in nutritional epidemiology), and confounding variables that can create spurious associations.

What factors contribute to the rigor of an experimental study?

Rigor is enhanced by proper randomization, blinding (single or double), having a control group, adequate sample size (power), clear primary and secondary outcomes, appropriate duration, generalizability to the target population, and transparent funding/conflict of interest declarations.

What does it mean when a study is statistically significant?

Statistical significance means that the observed result is unlikely to have occurred by random chance, typically indicated by a p-value of 0.05 or less, leading to the rejection of the null hypothesis. However, it does not necessarily mean the effect is clinically meaningful.

How do researchers measure effect size, and what's the difference between relative and absolute risk?

Effect size can be measured using relative risk, absolute risk, hazard ratios, and number needed to treat (NNT). Relative risk describes the proportional change in risk, while absolute risk is the actual difference in event rates between groups. Absolute risk is crucial for understanding clinical impact.

How should confidence intervals be interpreted?

A confidence interval provides a range within which the true effect size (e.g., hazard ratio) is likely to lie. If the interval for a ratio (like hazard ratio) includes 1, or for a difference includes 0, the result is not statistically significant. A tighter interval indicates less uncertainty.

Why might a study be stopped before its completion?

Studies can be stopped prematurely for three main reasons: safety concerns (the treatment is causing harm), overwhelming benefit (the treatment is so effective it's unethical to withhold it from the control group), or futility (it's clear no significant benefit will be found even if the study continues).

Why are only a fraction of studies ever published, and how can publication bias be combated?

Many studies, especially those with negative or null results, are not published due to publication bias, leading to an incomplete body of evidence. This can be combated through pre-registration of trials on public databases (like clinicaltrials.gov) and publishing formats like 'registered reports' where protocols are peer-reviewed and provisionally accepted before data collection.

Why are certain scientific journals more respected than others?

Journal prestige is often determined by its 'impact factor,' which is a measure of how frequently its articles are cited by other researchers. Journals with higher impact factors are generally considered more influential and selective.

1. Develop Scientific Literacy

Emphasize scientific literacy to better understand research, distinguish signal from noise, and critically evaluate studies for rigor and potential misrepresentation.

2. Commit to Rigorous Paper Reading

If you choose to engage with science, commit to rigorously reading scientific papers, understanding that doing so requires effort and attention to detail but improves with practice.

3. Critically Evaluate Media Science Reports

Exercise caution and critical thinking when consuming science information from social media or news, as reporters may lack the necessary analytical skills to accurately interpret studies.

4. Distinguish Primary from Secondary Outcomes

Pay close attention to a study’s pre-registered primary and secondary outcomes, understanding that failing the primary outcome typically renders a study null, regardless of secondary findings.

5. Clinical vs. Statistical Significance

Always differentiate between statistical significance (a study’s success in rejecting the null hypothesis) and clinical significance (whether the observed effect size is practically relevant or meaningful).

6. Demand Absolute Risk Data

When evaluating study results, always demand to know the absolute risk alongside the relative risk, as relative risk alone can be misleading and insufficient for understanding true impact.

7. Utilize Number Needed to Treat (NNT)

Calculate the Number Needed to Treat (NNT) by dividing one by the absolute risk reduction to understand how many people must be treated to prevent one event, informing the practical value of an intervention.

8. Prioritize Low NNT Interventions

When evaluating interventions, prioritize those with a low Number Needed to Treat (NNT), generally below 100, as this indicates a more impactful and efficient treatment.

9. Scrutinize Meta-Analysis Components

Do not accept a meta-analysis as gospel without examining each of its constituent studies, as a meta-analysis of poor-quality studies will yield poor results.

10. Evaluate Study Generalizability

Consider the size, duration, and patient population of a study to determine if its results are generalizable and relevant to your specific interests or patient context.

11. Check Funding & Conflicts of Interest

Always examine who funded a trial and the declared conflicts of interest of the authors, as these can subtly influence study design, reporting, or interpretation.

12. Recognize Publication Bias

Understand that publication bias exists, where studies with negative or null results are less likely to be published, potentially skewing the available scientific literature.

13. Value Negative Research Findings

Recognize that negative or null research findings are just as important as positive ones for the advancement of knowledge, as they prevent wasted effort and inform future research directions.

14. Support Study Pre-Registration

Advocate for and prioritize studies that are pre-registered on platforms like clinicaltrials.gov, as this practice makes it harder for investigators to withhold negative results and combats publication bias.

15. Use Registered Reports for Unbiased Publication

Consider publishing or seeking out “registered reports” where the study protocol is peer-reviewed and provisionally accepted before data collection, ensuring publication regardless of the outcome and combating bias.

16. Prioritize Peer-Reviewed Research

When seeking scientific information, prioritize peer-reviewed publications as they represent the highest standard of vetting by experts in the field, unlike non-peer-reviewed content.

17. Confirm Adequate Study Power

When a study, especially one with a null outcome, is presented, always question if it was adequately powered to detect a meaningful difference, as underpowered studies can miss real effects.

18. Understand What a P-Value Really Means

Understand that a p-value represents the probability of observing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true; the significance threshold (alpha) is what caps the false positive rate.

19. Understand Confidence Intervals as Uncertainty

View confidence intervals as “uncertainty intervals” that indicate the range where the true population statistic likely lies, rather than a strict probability of containing the true mean.

20. Check Confidence Interval for Unity

To quickly assess statistical significance, check if the confidence interval for a hazard or odds ratio crosses one; if it does, the result is not statistically significant.

21. Value Tighter Confidence Intervals

Recognize that tighter confidence intervals indicate less uncertainty and more precision in the estimated effect, increasing confidence in the study’s findings.

22. Interpret Hazard Ratios

Familiarize yourself with how to calculate and interpret hazard ratios (e.g., 0.82 means an 18% reduction, 2.2 means a 120% increase) to understand the temporal risk of events in clinical trials.

23. Beware Healthy User Bias

When evaluating observational studies, especially in health, be aware of the “healthy user bias” where people making one health-conscious choice often make many others, confounding results.

24. Distrust Food Frequency Questionnaires

Be highly skeptical of nutritional epidemiology studies that rely on food frequency questionnaires, given their imprecision, susceptibility to recall bias, and the near-impossibility of accurately remembering past food intake.

25. Be Cautious with Causality

When interpreting observational studies, recognize that while patterns will be seen, establishing causality from these patterns is difficult and requires careful consideration.

26. Acknowledge Hawthorne Effect

Understand that observation itself can change behavior (the Hawthorne effect), meaning people may alter their actions simply because they know they are being watched or recorded.

27. Account for Confounding Variables

When interpreting studies, especially observational ones, identify potential confounding variables (e.g., age, sex, smoking) that can affect results and obscure true causal relationships.

28. Caution with Homogeneous Extrapolation

Be cautious when extrapolating results from studies conducted in homogeneous populations (e.g., men only) to broader, more heterogeneous populations (e.g., women), as utility may differ.

29. Understand Multi-Site Study Trade-offs

Recognize that while multi-site studies offer heterogeneity and generalizability, they are harder to control and can introduce bias if sites are not run consistently.

30. Analyze Adverse Events Thoroughly

When evaluating a trial, pay close attention to the frequency, severity, and distribution of adverse events in all groups, not just the primary outcomes.

31. Know Reasons for Early Study Stops

Be aware that clinical trials can be stopped prematurely for three main reasons: safety concerns, overwhelming benefit, or futility (no chance of finding a significant effect).

32. Understand Journal Impact Factor

Recognize that journal prestige is often indicated by its impact factor, a yearly metric reflecting the average number of citations per article published in that journal over a given period.

33. Read Abstract First

When reading a scientific paper, start with the abstract to quickly determine if the paper’s content is relevant and interesting enough to warrant further reading.

34. Tailor Reading to Familiarity

Adjust your reading approach based on your familiarity with the subject matter; read the introduction if unfamiliar, but skip it if you already have a good grasp of the background.

35. Scrutinize Methods Section

After the abstract (and introduction if needed), go directly to the methods section to understand the study’s design, randomization, interventions, subject numbers, and specific procedures.

36. Start Results with Figures/Legends

When reviewing the results, begin by examining the figures and tables along with their legends, as well-designed figures should be standalone and convey key findings concisely.

37. Read Discussion Last

Read the discussion section last, after forming your own opinions on the study’s strengths, weaknesses, and remaining questions, to compare your thoughts with the authors’ interpretations.

38. Practice Regular Paper Reading

Improve your ability to understand scientific literature through consistent repetition, such as reading a scientific paper every week.

39. Start with Null Hypothesis

When approaching scientific inquiry, begin by assuming no relationship between two phenomena (the null hypothesis) to frame your investigation cleanly.

40. Use Randomized Controlled Experiments

To elegantly test a hypothesis, design your experiment as a randomized controlled trial, and blind it if possible, to minimize bias.

41. Implement Double Blinding for Rigor

To enhance study rigor and minimize bias, implement double blinding where neither subjects nor investigators know who is receiving treatment or placebo; single blinding (subjects don’t know) is a minimum.

42. Ensure Rigorous Randomization

For experimental studies, prioritize rigorous randomization, as it is crucial for making sense of results and minimizing bias, even if non-randomized studies aren’t useless.

43. Mitigate Performance Bias in RCTs

In lifestyle-based randomized controlled trials, ensure both treatment and control groups receive the exact same amount of attention, coaching, and advice to eliminate performance bias.

44. Aim to Falsify Hypotheses

When designing experiments, adopt a rigorous approach aimed at falsifying your hypothesis, rather than solely seeking to confirm it, to ensure robust scientific inquiry.

45. Perform Power Analysis

Before conducting a study, determine the necessary number of subjects by performing a power analysis to ensure the experiment is adequately powered to detect a true difference.

46. Pre-Register Study Protocols

Before conducting a study, define primary and secondary outcomes, get the protocol approved, develop a statistical plan, and pre-register the study to enhance transparency and rigor.

47. Secure IRB Approval

For any study involving human subjects, secure Institutional Review Board (IRB) approval (animal studies require approval from an analogous committee) to ensure the ethical conduct of the study.

48. Secure Research Funding

Ensure adequate funding is secured in parallel with study design and approval processes, as research requires financial resources.

49. Value Case Reports for Hypotheses

Understand that individual case reports, while not generalizable, serve as valuable hypothesis-generating observations that can kickstart larger trials or research careers.

50. Avoid Underpowered Studies

Do not conduct underpowered experiments, as they lack sufficient subjects to detect a real difference, often leading to null results and wasted effort.

51. Avoid Overpowered Studies

Be cautious of overpowered studies, which may enroll more subjects than necessary and detect statistically significant but clinically irrelevant effects.

52. Target 80-90% Study Power

In study design, aim for 80% to 90% power (meaning a 10-20% false negative rate) to ensure the study has a high probability of detecting a true effect if one exists.

53. Larger Effects Need Fewer Subjects

Recognize that studies designed to detect larger effect sizes (bigger differences between groups) require fewer subjects to achieve adequate statistical power.

54. Prioritize Figures in Scientific Writing

When writing a scientific paper, start by creating the figures, tables, and their legends first, as this helps clarify the core findings and structure the rest of the manuscript.

55. Support Ad-Free Content

If you value content provided without paid ads, consider becoming a member to support the creators, as their work is often made possible by members.

56. Deepen Knowledge with Premium Membership

To take your knowledge of health and wellness to the next level, consider a premium membership, which aims to provide members with much more value than the subscription price.

A thousand sow's ears make not a pearl necklace.

James Yang (quoted by Peter Attia)

garbage in, garbage out.

Peter Attia

We're number two. We try harder.

Bob Kaplan

The study didn't turn out the way we wanted it to.

Ivan Frantz (quoted by Peter Attia)

Peter Attia's Process for Reading a Scientific Paper

Peter Attia
  1. Read the abstract first to determine interest in the paper.
  2. If unfamiliar with the subject matter, read the introduction; otherwise, skip it.
  3. Go directly to the methods section to understand the experimental details (subjects, randomization, interventions, measurements, etc.).
  4. Examine the results section, starting with figures and tables and their legends, then read the prose for additional context.
  5. Finally, read the discussion section to compare personal thoughts on the study's strengths and weaknesses with the authors' perspectives.
0.05 or less
P-value threshold for statistical significance If the null hypothesis is true, a result this extreme would occur by chance 5% of the time or less.
80% to 90%
Typical statistical power for a study The probability of correctly detecting a true effect if it exists.
10% to 20%
Typical false negative rate (Beta) The probability of failing to detect a true effect when one exists.
25%
Relative risk increase of breast cancer in WHI study For women receiving estrogen and synthetic progesterone treatment.
0.1% (1 per 1,000 women)
Absolute risk increase of breast cancer in WHI study Increase from 4 cases per 1,000 to 5 cases per 1,000.
1,000
Number Needed to Treat (NNT) for a 20% relative risk reduction (0.5% to 0.4% absolute risk) Number of people to treat to prevent one additional event.
100
Number Needed to Treat (NNT) for a 1% absolute risk reduction (4% to 3%) Number of people to treat to prevent one additional event.
50
Number Needed to Treat (NNT) for a 2% absolute risk reduction (4% to 2%) Number of people to treat to prevent one additional event.
33
Number Needed to Treat (NNT) for a 3% absolute risk reduction (4% to 1%) Number of people to treat to prevent one additional event.
98%
Percentage of journals with an impact factor less than 10 Out of approximately 13,000 scientific journals.
95%
Percentage of journals with an impact factor less than 5 Out of approximately 13,000 scientific journals.
Approximately 50%
Percentage of journals with an impact factor less than 2 Out of approximately 13,000 scientific journals.
74.699
New England Journal of Medicine Impact Factor (2019) Based on nearly 350,000 citations.
60
The Lancet Impact Factor (2019) Based on approximately 250,000 citations.
292
CA: A Cancer Journal for Clinicians Impact Factor (2019) Skewed by heavy citation of a few articles, primarily a global cancer statistics report.