In statistics, a spurious correlation is a correlation between two variables that arises by chance or because both variables depend on some third factor, without either variable actually causing the other.
This type of correlation is dangerous because it can lead people to believe that one variable causes another when no causal link exists.
It turns out that this type of correlation between variables happens all the time in real life.
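To see how easily this can happen by chance alone, here is a minimal Python sketch. It generates a handful of unrelated random-walk series (a hypothetical stand-in for yearly statistics that drift over time) and reports the strongest correlation found between any pair. The number of series, their length, and the seed are arbitrary assumptions chosen for illustration, not real data.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

def random_walk(n_steps):
    """Cumulative sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

# 20 series that are generated independently of one another.
walks = [random_walk(50) for _ in range(20)]

# Check every pair and keep the largest absolute correlation.
strongest = max(abs(pearson(a, b)) for a, b in combinations(walks, 2))
print(f"strongest |r| among {len(walks)} unrelated series: {strongest:.2f}")
```

With 20 series there are 190 pairs to compare, so a strong correlation between at least one pair is likely even though no series influences any other.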
The following are five real-life examples of spurious correlation.
Example 1: Master’s Degrees vs. Box Office Revenue
If we collect data for the total number of Master’s degrees issued by universities each year and the total box office revenue generated by year, we would find that the two variables are highly correlated.
This doesn’t mean that issuing more Master’s degrees is causing the box office revenue to increase each year.
The more likely explanation is that the global population has been increasing each year, so the number of Master’s degrees issued and the number of people attending movies are both increasing at roughly similar rates.
The correlation between the two variables is spurious.
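A shared driver such as population can produce this pattern on its own, as the following minimal Python sketch suggests. All of the numbers are invented for illustration: the 2% growth rate, the per-person rates, and the 3% noise level are assumptions, not real data. Both simulated totals depend only on population, never on each other.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population growing 2% per year for 40 years.
population = [100.0 * 1.02 ** t for t in range(40)]

# Each total scales with population plus its own independent noise.
degrees = [0.002 * p * (1 + rng.gauss(0, 0.03)) for p in population]
box_office = [0.05 * p * (1 + rng.gauss(0, 0.03)) for p in population]

r_raw = pearson(degrees, box_office)

# Dividing by population removes the shared driver.
degrees_pc = [d / p for d, p in zip(degrees, population)]
box_pc = [b / p for b, p in zip(box_office, population)]
r_per_capita = pearson(degrees_pc, box_pc)

print(f"raw totals: r = {r_raw:.2f}")
print(f"per capita: r = {r_per_capita:.2f}")
```

In this simulation the raw totals are strongly correlated, while the per-capita values are left with only independent noise, so their correlation should be much weaker.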
Example 2: Measles Cases vs. Marriage Rate
If we collect data for the total number of measles cases in the U.S. each year and the marriage rate each year, we would find that the two variables are highly correlated.
This doesn’t mean that fewer measles cases are somehow causing lower marriage rates. The two variables are causally unrelated.
Modern medicine is simply causing measles cases to drop, while fewer people are getting married each year for various unrelated reasons.
The correlation between the two variables is spurious.
Example 3: High School Graduates vs. Donut Consumption
If we collect data for the total number of high school graduates and total donut consumption in the U.S. each year, we would find that the two variables are highly correlated.
This doesn’t mean that an increased number of high school graduates is leading to more donut consumption in the United States.
The more likely explanation is that the U.S. population has been increasing over time, which means that the number of people receiving a high school diploma and the total number of donuts consumed are both increasing as the population increases.
This is a spurious correlation.
Example 4: Video Game Sales vs. Nuclear Energy Production
If we collect data for the total video game sales each year around the world and the total energy produced by nuclear power plants, we would find that the two variables are highly correlated.
This doesn’t mean that somehow increased video game sales are leading to increased nuclear energy production.
Instead, more nuclear power plants are being built and more video games are being sold as the global population increases each year.
Although both variables increase steadily over time, one is not causing the other. The correlation between the two is spurious.
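One common way to check whether a shared time trend is behind a correlation like this is to correlate the year-over-year changes instead of the raw levels. The Python sketch below does this for two simulated series. The starting values, slopes, and noise levels are made-up assumptions, and the two series are generated independently of each other.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def diff(series):
    """Year-over-year changes: each value minus the previous one."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
years = range(30)

# Two independent series that both happen to trend upward.
games = [50 + 4.0 * t + rng.gauss(0, 5) for t in years]
nuclear = [200 + 9.0 * t + rng.gauss(0, 12) for t in years]

r_levels = pearson(games, nuclear)
r_changes = pearson(diff(games), diff(nuclear))

print(f"levels:  r = {r_levels:.2f}")
print(f"changes: r = {r_changes:.2f}")
```

Differencing strips out the steady upward trend, so if the correlation between the levels is driven only by that trend, the correlation between the changes should be much weaker.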
Example 5: Revenue from Arcades vs. Coal Mining Jobs
If we collect data for the total revenue generated from arcades in the U.S. and the total number of coal mining jobs in the U.S., we would find that the two variables are highly correlated.
This doesn’t mean that one variable is causing the other to decrease.
Instead, both arcades and coal mining have become less common over the years, which explains why both variables have decreased at roughly the same rate.
The correlation between the two variables is spurious.
Additional Resources
The following tutorials provide real-life examples of other statistical concepts:
Examples of Using Probability in Real Life
Examples of Using Correlation in Real Life
Examples of Using Mean, Median, and Mode in Real Life