Two terms that students often confuse in statistics are R and R-squared, often written R².
In the context of simple linear regression:
- R: The correlation between the predictor variable, x, and the response variable, y.
- R²: The proportion of the variance in the response variable that can be explained by the predictor variable in the regression model.
And in the context of multiple linear regression:
- R: The correlation between the observed values of the response variable and the predicted values of the response variable made by the model.
- R²: The proportion of the variance in the response variable that can be explained by the predictor variables in the regression model.
Note that the value for R² ranges from 0 to 1 when the model is fit by least squares and includes an intercept. The closer the value is to 1, the more of the variation in the response variable is explained by the predictor variable(s).
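For reference, both quantities can be written in terms of the observed values y_i, the predicted values ŷ_i, and the mean ȳ of the observed values. This is the standard least squares formulation, offered as a sketch; the symbols are introduced here only for illustration and do not come from any software output in this article:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R = \operatorname{corr}(y, \hat{y})
$$

When the model is fit by least squares with an intercept, the numerator of the fraction can never exceed the denominator, which is why R-squared stays between 0 and 1. In simple linear regression, the correlation between y and ŷ equals the absolute value of the correlation between x and y.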
The following examples show how to interpret the R and R-squared values in both simple linear regression and multiple linear regression models.
Example 1: Simple Linear Regression
Suppose we have the following dataset that shows the hours studied and exam score received by 12 students in a certain math class:
Using statistical software (such as Excel, R, Python, or SPSS), we can fit a simple linear regression model using “study hours” as the predictor variable and “exam score” as the response variable.
We can find the following output for this model:
Here’s how to interpret the R and R-squared values of this model:
- R: The correlation between hours studied and exam score is 0.959.
- R²: The R-squared for this regression model is 0.920. This tells us that 92.0% of the variation in the exam scores can be explained by the number of hours studied.
Also note that the R² value is simply equal to the R value, squared:
R² = R * R = 0.959 * 0.959 = 0.920
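To check this relationship ourselves, here is a minimal Python sketch using NumPy. The `hours` and `score` arrays are hypothetical values made up for illustration, not the dataset from this example, so the numbers it prints will not match the 0.959 and 0.920 shown above. It computes R as the correlation between the two variables, computes R-squared from the residual and total sums of squares, and compares the two.

```python
import numpy as np

# Hypothetical data for 12 students (NOT the dataset used in this article).
hours = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10], dtype=float)
score = np.array([62, 66, 64, 70, 73, 75, 79, 80, 84, 88, 91, 93], dtype=float)

# R: Pearson correlation between the predictor and the response.
r = np.corrcoef(hours, score)[0, 1]

# Fit score = intercept + slope * hours by ordinary least squares.
slope, intercept = np.polyfit(hours, score, deg=1)
predicted = intercept + slope * hours

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((score - predicted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"R = {r:.3f}, R-squared = {r_squared:.3f}, R * R = {r * r:.3f}")
```

The last two printed values should agree, since in simple linear regression with an intercept R-squared equals the square of the correlation between x and y.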
Example 2: Multiple Linear Regression
Suppose we have the following dataset that shows the hours studied, current student grade, and exam score received by 12 students in a certain math class:
Using statistical software, we can fit a multiple linear regression model using “study hours” and “current grade” as the predictor variables and “exam score” as the response variable.
We can find the following output for this model:
Here’s how to interpret the R and R-squared values of this model:
- R: The correlation between the actual exam scores and the predicted exam scores made by the model is 0.978.
- R²: The R-squared for this regression model is 0.956. This tells us that 95.6% of the variation in the exam scores can be explained by the number of hours studied and the student’s current grade in the class.
Also note that the R² value is simply equal to the R value, squared:
R² = R * R = 0.978 * 0.978 = 0.956
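The same check works with more than one predictor. The sketch below again uses NumPy and hypothetical values (the `study_hours`, `current_grade`, and `exam_score` arrays are invented for illustration and are not this example's dataset). Here R is computed as the correlation between the observed and predicted exam scores, which is the multiple regression definition given earlier.

```python
import numpy as np

# Hypothetical data for 12 students (NOT the dataset used in this article).
study_hours = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10], dtype=float)
current_grade = np.array([70, 75, 68, 80, 72, 85, 78, 88, 82, 90, 86, 94], dtype=float)
exam_score = np.array([62, 66, 64, 70, 73, 75, 79, 80, 84, 88, 91, 93], dtype=float)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(study_hours), study_hours, current_grade])

# Ordinary least squares fit: coefficients are [intercept, b_hours, b_grade].
coefficients, *_ = np.linalg.lstsq(X, exam_score, rcond=None)
predicted_score = X @ coefficients

# R: correlation between observed and predicted values of the response.
multiple_r = np.corrcoef(exam_score, predicted_score)[0, 1]

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_residual = np.sum((exam_score - predicted_score) ** 2)
ss_total = np.sum((exam_score - exam_score.mean()) ** 2)
multiple_r_squared = 1 - ss_residual / ss_total

print(f"R = {multiple_r:.3f}, R-squared = {multiple_r_squared:.3f}, "
      f"R * R = {multiple_r * multiple_r:.3f}")
```

As in the simple case, the squared correlation and the sum-of-squares calculation should give the same R-squared, provided the model is fit by least squares with an intercept.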