Regression models are used to quantify the relationship between one or more predictor variables and a response variable.
Whenever we fit a regression model, we want to understand how well the model can use the predictor variables to predict the response variable.
Two metrics we often use to quantify how well a model fits a dataset are the mean squared error (MSE) and the root mean squared error (RMSE), which are calculated as follows:
MSE: A metric that tells us the average squared difference between the predicted values and the actual values in a dataset. The lower the MSE, the better a model fits a dataset.
MSE = Σ(ŷi – yi)² / n
where:
- Σ is a symbol that means “sum”
- ŷi is the predicted value for the ith observation
- yi is the observed value for the ith observation
- n is the sample size
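To see the formula in action, here is a minimal Python sketch that computes the MSE from two lists of values. The names y_pred and y_actual and the numbers in them are placeholders, not data from this article:

```python
# Minimal sketch: MSE is the average of the squared prediction errors.
# y_pred and y_actual are placeholder lists; substitute your own values.
y_pred   = [3.1, 4.8, 6.2]
y_actual = [3.0, 5.0, 6.0]

n = len(y_actual)
mse = sum((yhat - y) ** 2 for yhat, y in zip(y_pred, y_actual)) / n
print(mse)  # average squared difference between predictions and observations
```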
RMSE: A metric that tells us the square root of the average squared difference between the predicted values and the actual values in a dataset. The lower the RMSE, the better a model fits a dataset.
It is calculated as:
RMSE = √(Σ(ŷi – yi)² / n)
where:
- Σ is a symbol that means “sum”
- ŷi is the predicted value for the ith observation
- yi is the observed value for the ith observation
- n is the sample size
Notice that the formulas are nearly identical. In fact, the root mean squared error is just the square root of the mean squared error.
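To make that relationship concrete, here is a minimal Python sketch (NumPy assumed available, with the same kind of placeholder data as above) that computes both metrics and confirms that the RMSE equals the square root of the MSE:

```python
import numpy as np

# Placeholder arrays; substitute your own predictions and observations.
y_pred   = np.array([3.1, 4.8, 6.2])
y_actual = np.array([3.0, 5.0, 6.0])

mse  = np.mean((y_pred - y_actual) ** 2)  # mean squared error
rmse = np.sqrt(mse)                       # root mean squared error

print(mse, rmse)  # rmse is the square root of mse by definition
```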
RMSE vs. MSE: Which Metric Should You Use?
When assessing how well a model fits a dataset, we use the RMSE more often because it is measured in the same units as the response variable.
Conversely, the MSE is measured in squared units of the response variable.
To illustrate this, suppose we use a regression model to predict the number of points that 10 players will score in a basketball game.
The following table shows the predicted points from the model vs. the actual points the players scored:

| Player | Predicted Points | Actual Points |
| --- | --- | --- |
| 1 | 14 | 12 |
| 2 | 15 | 15 |
| 3 | 18 | 20 |
| 4 | 19 | 16 |
| 5 | 25 | 20 |
| 6 | 18 | 19 |
| 7 | 12 | 16 |
| 8 | 12 | 20 |
| 9 | 15 | 16 |
| 10 | 22 | 16 |
We would calculate the mean squared error (MSE) as:
- MSE = Σ(ŷi – yi)² / n
- MSE = ((14-12)² + (15-15)² + (18-20)² + (19-16)² + (25-20)² + (18-19)² + (12-16)² + (12-20)² + (15-16)² + (22-16)²) / 10
- MSE = 160 / 10
- MSE = 16
The mean squared error is 16. This tells us that the average squared difference between the predicted values made by the model and the actual values is 16.
The root mean squared error (RMSE) would simply be the square root of the MSE:
- RMSE = √MSE
- RMSE = √16
- RMSE = 4
The root mean squared error is 4. This tells us that the typical deviation between the predicted points scored and the actual points scored is about 4 points.
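For reference, a short Python sketch (NumPy assumed available) reproduces this calculation from the ten predicted/actual pairs in the table above:

```python
import numpy as np

# Predicted and actual points for the 10 players in the example above.
predicted = np.array([14, 15, 18, 19, 25, 18, 12, 12, 15, 22])
actual    = np.array([12, 15, 20, 16, 20, 19, 16, 20, 16, 16])

mse  = np.mean((predicted - actual) ** 2)  # 16.0
rmse = np.sqrt(mse)                        # 4.0

print(mse, rmse)
```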
Notice that the interpretation of the root mean squared error is much more straightforward than the mean squared error because we’re talking about ‘points scored’ as opposed to ‘squared points scored.’
How to Use RMSE in Practice
In practice, we typically fit several regression models to a dataset and calculate the root mean squared error (RMSE) of each model.
We then select the model with the lowest RMSE value as the “best” model because it is the one that makes predictions that are closest to the actual values from the dataset.
Note that we can also compare the MSE values of each model, but RMSE is more straightforward to interpret so it’s used more often.
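As a rough sketch of this workflow, the example below fits two models (a linear fit and a quadratic fit, chosen purely for illustration) to a small hypothetical dataset with NumPy and compares their RMSE values; the data and the model choices are assumptions, not part of the example above:

```python
import numpy as np

# Hypothetical dataset (purely illustrative).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 4.3, 9.2, 16.8, 24.9, 36.5, 48.7, 63.9, 80.5, 99.8])

def rmse(y_true, y_hat):
    """Root mean squared error between observed and predicted values."""
    return np.sqrt(np.mean((y_hat - y_true) ** 2))

# Model 1: simple linear fit (degree 1).
linear_coefs = np.polyfit(x, y, deg=1)
linear_preds = np.polyval(linear_coefs, x)

# Model 2: quadratic fit (degree 2).
quad_coefs = np.polyfit(x, y, deg=2)
quad_preds = np.polyval(quad_coefs, x)

print("Linear RMSE:   ", rmse(y, linear_preds))
print("Quadratic RMSE:", rmse(y, quad_preds))
# The model with the lower RMSE makes predictions closest to the observed values.
```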
Additional Resources
Introduction to Multiple Linear Regression
RMSE vs. R-Squared: Which Metric Should You Use?
RMSE Calculator