Correlation vs Causation: Your Guide to Interpreting Health and Wellness Studies
"Hold on to your morning cup of joe, folks. A recent study finds a link between drinking coffee and heart disease. Better skip your next Starbucks run."
This hypothetical headline sounds plausible, and for many of us, it would be sufficient to cause serious concern. But it's a classic example of a misleading reporting tactic that floods our media feeds. These headlines make seemingly plausible connections - between a food item and a health condition, or a lifestyle choice and longevity - only to commit one of the most egregious errors in statistics: mistaking correlation for causation.
So, before you dump your freshly brewed coffee down the sink, let's navigate the murky waters of correlation and causation. By the end of this article, you'll be well-equipped to find the pearls of truth within the sea of health and wellness misinformation.
Correlation and Causation: Close Cousins at Best
In statistics, a "correlation" is an association between two or more things: when one changes, the other tends to change too. "Causation," as the word implies, means that one thing actually brings about another. Most of the correlations behind health headlines come from observational epidemiology, a branch of research that observes large groups of people and identifies patterns among them. The most direct evidence of causation, on the other hand, comes from controlled experiments, which aim to hold all variables constant except the one being manipulated and analyzed.
To understand how correlations can be misleading, let’s start with an absurd, but real example. Imagine you hear that ice cream consumption is correlated with shark attacks. This is actually true. The more ice cream people consume in a given area, the more likely people in that area are to be attacked by sharks. This is a correlation. It would be an error – obviously – to conclude that ice cream consumption causes shark attacks.
Instead, as you’ve no doubt surmised, the correlation can be explained by a third factor – that both ice cream consumption and shark attacks go up in warm weather, when more people are at the beach. So, correlation? Yes. Causation? Not so much.
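To see how a third factor can manufacture a correlation, here is a minimal sketch in Python. Everything in it is made up for illustration: the temperature range, the effect sizes, and the noise levels are assumptions, not real beach data. Temperature drives both ice cream sales and shark attacks, and neither one affects the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

def simulate(days=2000, seed=0):
    """Hypothetical data: temperature drives both outcomes independently."""
    rng = random.Random(seed)
    temperature = [rng.uniform(5, 35) for _ in range(days)]
    # Assumed effect sizes and noise levels, chosen only for illustration.
    ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
    shark_risk = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]
    return temperature, ice_cream, shark_risk

temperature, ice_cream, shark_risk = simulate()

# Raw correlation: ice cream and shark risk rise and fall together.
raw = pearson(ice_cream, shark_risk)

# Adjusted correlation: compare only what temperature does not explain.
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(shark_risk, temperature))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

In this simulated world the raw correlation is strong, but it all but disappears once temperature is accounted for. Real studies rarely have it this easy, because the third factor is often unknown or unmeasured.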
Spurious Correlations Abound
There are dozens of comical examples of correlations which are statistically robust but make absolutely no sense. Here are two of my favorites:
- From 1999 to 2009, the number of people who drowned in swimming pools tracked closely with the number of Nicolas Cage movie appearances.
- Another widely shared chart shows a strong correlation between per capita cheese consumption and the number of people who died by becoming tangled in their bedsheets.
Source: https://www.tylervigen.com/spurious-correlations
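Part of the reason coincidences like these are easy to find is that short series that drift over time line up by chance far more often than intuition suggests. The sketch below, in Python, is a rough illustration under made-up assumptions: 11 data points per series (think one per year), a cutoff of 0.7 for a "strong" correlation, and purely random data. It compares pairs of unrelated noisy series with pairs of unrelated drifting series (random walks).

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def white_noise(rng, length):
    """A series with no memory: each point is an independent random draw."""
    return [rng.gauss(0, 1) for _ in range(length)]

def random_walk(rng, length):
    """A drifting series: each point is the previous one plus a random step."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk

def share_strongly_correlated(make_series, trials=2000, length=11,
                              threshold=0.7, seed=1):
    """Fraction of unrelated pairs whose correlation exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = make_series(rng, length)
        b = make_series(rng, length)  # generated independently of a
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials

noise_share = share_strongly_correlated(white_noise)
walk_share = share_strongly_correlated(random_walk)

print(f"unrelated noisy pairs with |r| > 0.7:    {noise_share:.1%}")
print(f"unrelated drifting pairs with |r| > 0.7: {walk_share:.1%}")
```

Every pair here is unrelated by construction, so every strong correlation the script finds is a coincidence. Search through enough charts and a few striking matches are bound to turn up.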
These nonsensical links are obviously not causal. But in health and wellness studies, where the link usually sounds plausible, it’s much easier to conflate correlation with causation without noticing.
The Problem with Epidemiology
Consider the following headline: “People Who Eat Avocados More Likely to Live to 100.” Sounds plausible, right? Avocados contain healthy fats, which seem to be good for the brain and the heart. Maybe that could prevent cardiovascular and neurological disease, and extend one’s life. While that might be true, here’s the rub: much like our shark attack scenario, this sort of study simply cannot tell us anything about causation.
Perhaps people who can afford avocados are simply more well-off financially, and thus have better access to healthcare, and thus live longer. Or perhaps avocado eaters are more likely to exercise regularly. Or perhaps avocados only grow in warmer climates, and people who live in warmer climates have higher vitamin D levels, and that is associated with longevity. While we can generate endless guesses as to what’s going on, we simply can't know for certain without doing a controlled experiment.
While they are prone to misinterpretation, observational studies can be worthwhile, particularly in scenarios where experimentation isn’t feasible or ethical. For instance, epidemiology is how we identified that smoking is strongly associated with lung cancer. It’s also how contaminated drinking water was linked to cholera. Epidemiology paints with a broad brush, hinting that one thing is linked with another thing. But to confirm whether epidemiological research represents signal, rather than noise, we need a special set of tools to guide us.
Understanding the Bradford Hill Criteria
To assess the legitimacy of epidemiological claims, we look to a set of criteria established by Sir Austin Bradford Hill, a renowned British medical statistician. These criteria are a guide to help determine if there is a causal link between a particular exposure (like smoking) and an outcome (like lung cancer). The criteria include strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy. Each element contributes to understanding whether a correlation is just a coincidence or if it is likely to be a cause.
For example, if multiple studies consistently show a strong correlation between smoking and lung cancer, this satisfies the strength and consistency criteria. Temporality requires that smoking come before the development of lung cancer, not after. The biological gradient requires that the more one smokes, the higher the risk of lung cancer. Plausibility and coherence require a reasonable biological explanation for why smoking could cause lung cancer. Taken together, these points lead us to the conclusion that smoking does indeed cause lung cancer.
The Bottom Line
Next time you come across a health headline that seems too good (or bad) to be true, remember to look for evidence of causation, not just correlation. Sometimes, it's as absurd as swimming pool drownings and Nicolas Cage films. Other times, particularly in the wellness domain, it might take a bit more critical thinking to separate fact from fiction.