The phrase “correlation does not imply causation” is often used in statistics to point out that correlation between two variables does not necessarily mean that one variable causes the other to occur.
To better understand this phrase, consider the following real-world examples.
Example 1: Ice Cream Sales vs. Shark Attacks
If we collect data for monthly ice cream sales and monthly shark attacks around the United States each year, we would find that the two variables are highly correlated.
Does this mean that consuming ice cream causes shark attacks?
Not quite. The more likely explanation is that more people consume ice cream and get in the ocean when it’s warmer outside, which explains why these two variables are so highly correlated.
Although ice cream sales and shark attacks are highly correlated, one does not cause the other.
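The role of warm weather as a shared cause can be illustrated with a small simulation. The sketch below uses entirely made-up numbers (the temperature curve, the coefficients, and the noise levels are all assumptions, not real sales or shark attack data). It generates two series that each depend on temperature but not on each other, then compares their raw correlation with the correlation that remains after temperature is accounted for.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
temperature, ice_cream, sharks = [], [], []
for month in range(240):  # 20 years of hypothetical monthly data
    # Seasonal temperature cycle plus some random variation (assumed values).
    t = 15 + 10 * math.sin(2 * math.pi * month / 12) + rng.gauss(0, 2)
    temperature.append(t)
    # Both series depend on temperature only, never on each other.
    ice_cream.append(100 + 5 * t + rng.gauss(0, 10))
    sharks.append(2 + 0.3 * t + rng.gauss(0, 1))

raw_r = pearson(ice_cream, sharks)
# Correlation left over once the effect of temperature is regressed out.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(sharks, temperature))

print(f"Raw correlation:                       {raw_r:.2f}")
print(f"Correlation controlling for temperature: {partial_r:.2f}")
```

In a run like this, the raw correlation should come out strongly positive while the correlation after controlling for temperature should sit close to zero, which is the pattern we would expect when a third variable drives both series.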
Example 2: Master’s Degrees vs. Box Office Revenue
If we collect data for the total number of Master’s degrees issued by universities each year and the total box office revenue generated by year, we would find that the two variables are highly correlated.
Does this mean that issuing more Master’s degrees is causing the box office revenue to increase each year?
Not quite. The more likely explanation is that the global population has been increasing each year, which means that both the number of Master’s degrees issued and the number of people attending movies have been increasing at roughly the same pace.
Although these two variables are correlated, one does not cause the other.
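One way to see how a growing population can produce this kind of correlation is to simulate it. The sketch below is built on assumed, synthetic numbers (the growth rate, the per-person rates, and the noise are hypothetical, not real degree or box office figures). Both totals are generated as a fixed per-person rate times the population, with independent random variation, and the correlation is computed once on the totals and once on the per-capita values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable
population, degrees, box_office = [], [], []
for year in range(60):
    # Hypothetical population growing 1% per year (arbitrary units).
    pop = 100.0 * 1.01 ** year
    population.append(pop)
    # Each total is a per-person rate times population, with independent noise.
    degrees.append(pop * 0.002 * (1 + rng.gauss(0, 0.05)))
    box_office.append(pop * 0.03 * (1 + rng.gauss(0, 0.05)))

raw_r = pearson(degrees, box_office)

# Dividing by population removes the shared growth trend.
degrees_per_capita = [d / p for d, p in zip(degrees, population)]
box_office_per_capita = [b / p for b, p in zip(box_office, population)]
per_capita_r = pearson(degrees_per_capita, box_office_per_capita)

print(f"Correlation of yearly totals:      {raw_r:.2f}")
print(f"Correlation of per-capita values:  {per_capita_r:.2f}")
```

The totals should be strongly correlated simply because both scale with population, while the per-capita values should show a much weaker correlation, since nothing in the simulation links the two rates.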
Example 3: Pool Drownings vs. Nuclear Energy Production
If we collect data for the total number of pool drownings each year and the total amount of energy produced by nuclear power plants each year, we would find that the two variables are highly correlated.
Does this mean that increased pool drownings are somehow causing more nuclear energy to be produced?
Not exactly. The more likely explanation is that the global population has been increasing, which means more people are drowning in pools, while nuclear energy has separately become more viable each year, which explains why its production has increased.
Although these two variables are highly correlated, one does not cause the other.
Example 4: Measles Cases vs. Marriage Rate
If we collect data for the total number of measles cases in the U.S. each year and the marriage rate each year, we would find that the two variables are highly correlated.
Does this mean that a reduction in measles cases is causing lower marriage rates?
Not exactly. Instead, the two trends have unrelated causes: modern medicine is causing measles cases to drop, and fewer people are getting married each year for various other reasons.
Although these two variables are highly correlated, one does not cause the other.
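Two series that simply trend in the same direction over time will be correlated even when nothing connects them. The sketch below uses invented numbers (the starting values, slopes, and noise are assumptions, not real measles or marriage statistics). It generates two unrelated downward trends, then compares the correlation of the yearly levels with the correlation of the year-over-year changes, which is one common way to check whether a shared trend is doing all the work.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def yearly_changes(series):
    """Difference between each value and the one before it."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


rng = random.Random(2)  # fixed seed so the run is repeatable
measles, marriage_rate = [], []
for year in range(50):
    # Two hypothetical declining series with independent random noise.
    measles.append(500 - 8 * year + rng.gauss(0, 20))
    marriage_rate.append(10 - 0.08 * year + rng.gauss(0, 0.3))

level_r = pearson(measles, marriage_rate)
change_r = pearson(yearly_changes(measles), yearly_changes(marriage_rate))

print(f"Correlation of yearly levels:         {level_r:.2f}")
print(f"Correlation of year-over-year changes: {change_r:.2f}")
```

The levels should be highly correlated because both series fall over time, while the year-over-year changes should show little correlation, which is consistent with two trends that happen to move together without influencing each other.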
Example 5: High School Graduates vs. Pizza Consumption
If we collect data for the total number of high school graduates and total pizza consumption in the U.S. each year, we would find that the two variables are highly correlated.
Does this mean that an increased number of high school graduates is leading to more pizza consumption in the United States?
Not quite. The more likely explanation is that the U.S. population has been increasing over time, which means that both the number of people receiving a high school degree and the total amount of pizza consumed are increasing as the population grows.
Although these two variables are correlated, one does not cause the other.