In statistics, correlation refers to the strength and direction of the relationship between two variables. A correlation coefficient ranges from -1 to 1: -1 indicates a perfect negative relationship, 0 indicates no linear (or, for rank-based measures, no monotonic) relationship, and 1 indicates a perfect positive relationship.
There are three common ways to measure correlation:
Pearson Correlation: Measures the linear association between two continuous variables (e.g. height and weight).
Spearman Correlation: Measures the association between two ranked variables (e.g. the rank of a student’s math exam score vs. the rank of their science exam score in a class).
Kendall’s Correlation: An alternative to Spearman Correlation that is often preferred when the sample size is small and there are many tied ranks.
This tutorial explains how to find all three types of correlations in Stata.
Loading the Data
For each of the following examples we will use a dataset called auto. You can load this dataset by typing the following into the Command box:
use http://www.stata-press.com/data/r13/auto
We can get a quick look at the dataset by typing the following into the Command box:
summarize
The output lists the 12 variables in the dataset. Most have 74 observations, but rep78 has only 69 because five of its values are missing.
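As a quick check before computing any correlations, the sketch below inspects the dataset's structure and counts the missing values in rep78. It assumes an internet connection for the URL used in this tutorial; the comment notes sysuse as an offline alternative, since the auto data also ships with Stata.

```stata
* Load the data; clear drops any dataset already in memory
* (offline alternative: sysuse auto, clear)
use http://www.stata-press.com/data/r13/auto, clear

* List variable names, storage types, and labels
describe

* Count the observations where rep78 is missing
count if missing(rep78)
```

Knowing which variables have missing values explains why later commands report fewer than 74 observations.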
How to Find Pearson Correlation in Stata
We can find the Pearson Correlation Coefficient between the variables weight and length by using the pwcorr command:
pwcorr weight length
The Pearson Correlation Coefficient between these two variables is 0.9460. To determine whether this correlation is statistically significant, we can request the p-value by adding the sig option:
pwcorr weight length, sig
The p-value rounds to 0.000, meaning it is smaller than 0.0005. Since this is less than 0.05, the correlation between these two variables is statistically significant.
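To see where this p-value comes from, the following sketch recomputes it by hand from the correlation coefficient and the sample size, using a t statistic with n - 2 degrees of freedom. The local macro names (r, n, t, p) are chosen here for illustration only.

```stata
use http://www.stata-press.com/data/r13/auto, clear

* correlate leaves the coefficient in r(rho) and the sample size in r(N)
quietly correlate weight length
local r = r(rho)
local n = r(N)

* t statistic for testing that the correlation is zero
local t = (`r') * sqrt((`n' - 2) / (1 - (`r')^2))

* Two-sided p-value from the t distribution with n - 2 degrees of freedom
local p = 2 * ttail(`n' - 2, abs(`t'))

display "r = " %6.4f `r' "   t = " %6.2f `t' "   p = " %8.2e `p'
```

Displaying p in scientific notation shows how small it is, which the rounded figure hides.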
To find the Pearson Correlation Coefficient for multiple variables, simply type in a list of variables after the pwcorr command:
pwcorr weight length displacement, sig
Here is how to interpret the output:
- Pearson Correlation between weight and length = 0.9460 | p-value = 0.000
- Pearson Correlation between weight and displacement = 0.8949 | p-value = 0.000
- Pearson Correlation between displacement and length = 0.8351 | p-value = 0.000
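When several pairs are tested at once, a few extra pwcorr options can make the matrix easier to read. The sketch below is one possible combination, not the only sensible one: obs prints the number of observations behind each coefficient, star(0.05) marks coefficients significant at the 5% level, and bonferroni adjusts the significance levels for the number of comparisons.

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Coefficient, p-value, and number of observations for each pair,
* with a star next to coefficients significant at the 5% level
pwcorr weight length displacement, sig obs star(0.05)

* Same matrix with Bonferroni-adjusted significance levels
pwcorr weight length displacement, sig bonferroni
```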
How to Find Spearman Correlation in Stata
We can find the Spearman Correlation Coefficient between the variables trunk and rep78 by using the spearman command:
spearman trunk rep78
Here is how to interpret the output:
- Number of obs: This is the number of pairwise observations used to calculate the Spearman Correlation Coefficient. Because there were some missing values for the variable rep78, Stata used only 69 (rather than the full 74) pairwise observations.
- Spearman’s rho: This is the Spearman correlation coefficient. In this case, it’s -0.2235, indicating there is a negative correlation between the two variables. As one increases, the other tends to decrease.
- Prob > |t|: This is the p-value for the test of the null hypothesis that the two variables are independent. In this case, the p-value is 0.0649, so the correlation between the two variables is not statistically significant at α = 0.05.
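Spearman's rho is Pearson's correlation applied to ranks rather than raw values. The sketch below illustrates this by ranking both variables and running correlate on the ranks. The variable names rank_trunk and rank_rep78 are hypothetical, and the ranks are computed only over observations where both variables are present so that the sample matches the one spearman uses.

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Rank each variable within the sample where both are observed;
* egen's rank() gives tied values their average rank
egen rank_trunk = rank(trunk) if !missing(trunk, rep78)
egen rank_rep78 = rank(rep78) if !missing(trunk, rep78)

* Pearson correlation of the ranks
quietly correlate rank_trunk rank_rep78
local rho_from_ranks = r(rho)

* Built-in Spearman correlation for comparison
quietly spearman trunk rep78
display "Pearson on ranks: " %7.4f `rho_from_ranks'
display "spearman:         " %7.4f r(rho)
```

The two displayed values should agree, since both are computed from the same ranks.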
We can find the Spearman Correlation Coefficient for multiple variables by typing more variables after the spearman command. To display the correlation coefficient and the corresponding p-value for each pair, add the stats(rho p) option:
spearman trunk rep78 gear_ratio, stats(rho p)
Here is how to interpret the output:
- Spearman Correlation between trunk and rep78 = -0.2235 | p-value = 0.0649
- Spearman Correlation between trunk and gear_ratio = -0.5187 | p-value = 0.0000
- Spearman Correlation between gear_ratio and rep78 = 0.4275 | p-value = 0.0002
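With more than two variables, missing values need some care. By default spearman drops any observation that is missing on any listed variable, while the pw option lets each pair use every observation available for that pair. The sketch below runs both versions and adds obs to the stats() list so the sample size behind each coefficient is visible.

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Default (casewise deletion): every pair uses the same observations
spearman trunk rep78 gear_ratio, stats(rho obs p)

* Pairwise deletion: pairs that do not involve rep78 keep all observations
spearman trunk rep78 gear_ratio, stats(rho obs p) pw
```

The ktau command accepts the same pw option.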
How to Find Kendall’s Correlation in Stata
We can find Kendall’s Correlation Coefficient between the variables trunk and rep78 by using the ktau command:
ktau trunk rep78
Here is how to interpret the output:
- Number of obs: This is the number of pairwise observations used to calculate Kendall’s Correlation Coefficient. Because there were some missing values for the variable rep78, Stata used only 69 (rather than the full 74) pairwise observations.
- Kendall’s tau-b: This is Kendall’s correlation coefficient between the two variables. We typically use this value instead of tau-a because tau-b makes adjustments for ties. In this case, tau-b = -0.1752, indicating a negative correlation between the two variables.
- Prob > |z|: This is the p-value for the test of the null hypothesis that the two variables are independent. In this case, the p-value is 0.0662, so the correlation between the two variables is not statistically significant at α = 0.05.
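The values in the ktau output are also left behind as stored results, which is convenient when they are needed later in a do-file. A minimal sketch that displays tau-a next to tau-b:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Run ktau without printing its table, then read the stored results
quietly ktau trunk rep78

display "tau-a = " %7.4f r(tau_a)
display "tau-b = " %7.4f r(tau_b)
display "p     = " %7.4f r(p)
display "N     = " r(N)
```

Because tau-b divides by a tie-corrected denominator that cannot exceed the one used by tau-a, tau-b is never smaller in absolute value than tau-a.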
We can find Kendall’s Correlation Coefficient for multiple variables by typing more variables after the ktau command. To display the correlation coefficient and the corresponding p-value for each pair, add the stats(taub p) option:
ktau trunk rep78 gear_ratio, stats(taub p)
Here is how to interpret the output:
- Kendall’s Correlation between trunk and rep78 = -0.1752 | p-value = 0.0662
- Kendall’s Correlation between trunk and gear_ratio = -0.3753 | p-value = 0.0000
- Kendall’s Correlation between gear_ratio and rep78 = 0.3206 | p-value = 0.0006
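To compare the three measures side by side for one pair of variables, the stored results of each command can be collected into local macros. This is only a sketch, and the macro names (pearson, spearman, kendall) are illustrative.

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Pearson correlation
quietly correlate trunk rep78
local pearson = r(rho)

* Spearman's rho
quietly spearman trunk rep78
local spearman = r(rho)

* Kendall's tau-b
quietly ktau trunk rep78
local kendall = r(tau_b)

display "Pearson  = " %7.4f `pearson'
display "Spearman = " %7.4f `spearman'
display "Kendall  = " %7.4f `kendall'
```

All three commands use the same 69 complete observations here, so differences between the values reflect the measures themselves rather than the sample.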