In statistics, correlation refers to the strength and direction of a relationship between two variables. The value of a correlation coefficient can range from -1 to 1, with the following interpretations:
- -1: a perfect negative relationship between two variables
- 0: no relationship between two variables
- 1: a perfect positive relationship between two variables
One special type of correlation is the Spearman rank correlation, which measures the strength and direction of the monotonic relationship between two ranked variables (e.g. the rank of a student’s math exam score vs. the rank of their science exam score in a class).
This tutorial explains how to calculate the Spearman rank correlation between two variables in Python.
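To make the idea of "ranked variables" concrete, here is a minimal sketch of turning raw scores into ranks. The scores are hypothetical and not taken from the class example; `rankdata` from scipy.stats is used with its default `average` method, under which tied values share the mean of the ranks they would otherwise occupy.

```python
from scipy.stats import rankdata

# Hypothetical exam scores (illustration only); two students tie at 85
scores = [70, 85, 85, 90]

# method='average' gives tied values the mean of the ranks they span
ranks = rankdata(scores, method='average')

print(ranks.tolist())  # [1.0, 2.5, 2.5, 4.0]
```

The Spearman rank correlation is then computed on ranks like these rather than on the raw scores.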
Example: Spearman Rank Correlation in Python
Suppose we have the following pandas DataFrame that contains the math exam score and science exam score of 10 students in a particular class:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})
To calculate the Spearman Rank correlation between the math and science scores, we can use the spearmanr() function from scipy.stats:
from scipy.stats import spearmanr
#calculate Spearman Rank correlation and corresponding p-value
rho, p = spearmanr(df['math'], df['science'])
#print Spearman rank correlation and p-value
print(rho)
-0.41818181818181815
print(p)
0.22911284098281892
From the output we can see that the Spearman rank correlation is -0.41818 and the corresponding p-value is 0.22911.
This indicates a negative correlation between the science and math exam scores: students who ranked higher in math tended to rank lower in science.
However, since the p-value is not less than 0.05, the correlation is not statistically significant at the 0.05 significance level.
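As a sanity check, the coefficient and p-value can be reproduced by hand. The sketch below assumes the textbook no-ties formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a student's two ranks, and assumes a two-sided test based on a t distribution with n - 2 degrees of freedom. The variable names are illustrative, and this is not presented as the exact internals of spearmanr().

```python
import pandas as pd
from scipy.stats import t

# Same scores as the DataFrame in this example
math_scores = pd.Series([70, 78, 90, 87, 84, 86, 91, 74, 83, 85])
science_scores = pd.Series([90, 94, 79, 86, 84, 83, 88, 92, 76, 75])
n = len(math_scores)

# Rank each variable (1 = lowest score); this data contains no ties
d = math_scores.rank() - science_scores.rank()

# No-ties formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho_manual = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Assumed two-sided test: t statistic with n - 2 degrees of freedom
t_stat = rho_manual * ((n - 2) / (1 - rho_manual ** 2)) ** 0.5
p_manual = 2 * t.sf(abs(t_stat), df=n - 2)

print(round(rho_manual, 5))  # -0.41818
print(round(p_manual, 5))    # 0.22911
```

Here the squared rank differences sum to 234, so rho = 1 - 6(234) / (10 * 99), which agrees with the value reported by spearmanr(). When the data contain ties, the shortcut formula no longer holds exactly and the library function should be preferred.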
Note that we could also use the following syntax to just extract the correlation coefficient or the p-value:
#extract Spearman Rank correlation coefficient
spearmanr(df['math'], df['science'])[0]
-0.41818181818181815
#extract p-value of Spearman Rank correlation coefficient
spearmanr(df['math'], df['science'])[1]
0.22911284098281892
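If only the coefficient is needed, pandas offers another route through the `method='spearman'` option of `corr()`. The following is a minimal sketch that rebuilds the same DataFrame so it can run on its own; note that this approach returns no p-value.

```python
import pandas as pd

# Same data as the example DataFrame
df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})

# Coefficient between two columns (no p-value is returned)
rho_pandas = df['math'].corr(df['science'], method='spearman')
print(round(rho_pandas, 5))  # -0.41818

# Full Spearman correlation matrix for the numeric columns
corr_matrix = df[['math', 'science']].corr(method='spearman')
print(corr_matrix)
```

The matrix form is convenient when a DataFrame has more than two numeric columns and all pairwise rank correlations are wanted at once.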